Try the first rule if the second rule fails

https://stackoverflow.com/questions/21384853

03-10-2022
|

Question

I have defined two sets of identifiers IDENTIFIER_ONE and IDENTIFIER_TWO which are both exculsive subsets of IDENTIFIER. I would like to write a parser such that:

"i1(arg) EOS" can't be parsed (1)
"i2(arg) EOS" can be parsed (2)
"i1(arg) = value EOS" can be parsed (3)
"i2(arg) = value EOS" can be parsed (4)

where i1(resp., i2) belongs to IDENTIFIER_ONE (resp., IDENTIFIER_TWO); arg and value belong to IDENTIFIER. The following parser.mly has already realized all the points I am after, except (4):

identifier:
| IDENTIFIER_ONE { $1 }
| IDENTIFIER_TWO { $1 } 

block_statement_EOS:
| identifier LPAREN identifier RPAREN EQUAL identifier EOS { BSE_Let ($1, $3, $6) }
| IDENTIFIER_TWO LPAREN identifier RPAREN EOS { BSE_I_I ($1, $3) }

Given i1(arg) = value EOS as input, as goal (3) it is correctly read as BSE_Let (i1, arg, value). However, given i2(arg) = value EOS as input, it stops the parsing after reading EQUAL. I guess it is because once the parse meets i2(arg), it goes to the 2nd rule of block_statement_EOS, and later EQUAL can't be parsed.

Ideally, I would hope the parser could try the 1st rule of block_statement_EOS if the 2nd rule fails. Could anyone help me to make this possible?

PS: If I write the parser.mly as follows, all the goals can be achieved. Does anyone know why? Additionally I really don't like this workaround, because I do need to write identifier instead of two subsets in many other rules, I would hope a more elegant solution...

block_statement_EOS:
| IDENTIFIER_ONE LPAREN identifier RPAREN EQUAL identifier EOS { BSE_Let ($1, $3, $6) }
| IDENTIFIER_TWO LPAREN identifier RPAREN EQUAL identifier EOS { BSE_Let ($1, $3, $6) }
| IDENTIFIER_TWO LPAREN identifier RPAREN EOS { BSE_I_I ($1, $3) }

Solution

When your parser encounters an LPAREN after an IDENTIFIER_TWO, it has to decide whether to shift or to reduce:

shift: put LPAREN on the stack;
reduce: replace IDENTIFIER_TWO, which is on top of the stack, by identifier.

If your parser chooses to shift, it will never reduce this particular IDENTIFIER_TWO into identifier (because this particular IDENTIFIER_TWO will never be on top of the stack again), meaning that it will always reduce the second rule of block_statement_EOS.

If your parser chooses to reduce, it will never reduce the second rule of block_statement_EOS, as this rule starts with IDENTIFIER_TWO and not identifier.

This is why your second version works, because there is no need to choose between shifting and reducing after IDENTIFIER_TWO. The choice is made later, if you wish.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow