How to transform this grammar to LR(1)?

https://stackoverflow.com/questions/17386765

parsing
lr

02-06-2022
|

Question

I have a grammar with an LR(1) conflict which I cannot resolve; yet, the grammar should be unambiguous. I'll first demonstrate the problem on a simplified grammar with five tokens: (, ), {}, , and id.

The EBNF would look like this:

      args = ( id ',' )*

expression = id
           | '(' expression ')'
           | '(' args ')' '{}'

The grammar is unambiguous and requires at most two tokens of lookahead. When ( is shifted, there are only five possibilities:

( → Recur.
) → Reduce as '(' args ')'.
id ) not {} → Reduce as '(' expression ')'.
id ) {} → Reduce as '(' args ')' '{}'
id , → Reduce as '(' args ')' '{}' (eventually).

A naive translation yields the following result (and conflicts):

   formal_arg: Ident
               {}

  formal_args: formal_arg Comma formal_args
             | formal_arg
             | /* nothing */
               {}

      primary: Ident
             | LParen formal_args Curly
             | LParen primary RParen
               {}

So, the grammar requires at most three tokens of lookahead to decide. I know that an LR(3) grammar can be transformed to LR(1) grammar.

However, I don't quite understand how to do the transformation in this particular case. Note that the simplified grammar above is an extraction from a larger body of code; in particular, is it possible to transform primary without touching expr and everything above?

Solution

I provided a solution to a problem very similar to this one here: Is C#'s lambda expression grammar LALR(1)?. The basic idea was to separate out the ( id ) case from the other two possibilities ( ( expr_not_id ) and ( list_at_least_2_ids ) ). Then the decision about how to reduce ( id ) can be deferred until the lookahead token is available (in your case, {, assuming that that is sufficient).

Unfortunately, while the transformation of expr into expr_not_id is pretty straightforward and almost mechanical, it definitely touches a lot of productions. Also, it's somewhat ugly. So it fails to solve the problem you present in the last sentence. I don't actually think that it is possible to transform primary without touching expr, but I've been surprised before.

(The other obvious solution, since the grammar is in fact unambiguous, is to use a GLR parser-generator, but I don't believe the parser-generator you are using has that feature.)

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow