Question

I have to write some sort of parser and it's quite easy with tools like yacc and bison. But I have the following question:

NotRec: /*empty*/
| NotRec T_NOT
| NotRec T_MINUS
;

Expr:
| Term /*1*/
| NotRec Term /*2*/
;

What is the difference between rule 1 and 2?

In my opinion NotRec can be empty (because it has an empty branch) and therefore Term should be the same as NotRec Term. But if I remove the first rule I get different results!

Was it helpful?

Solution

As written, the grammar is ambiguous, as NotRec will match 0 or more T_NOT or T_MINUS tokens, so if you have an Expr which has no such tokens before the Term, it can be matched by either rule 1 or rule 2.

If you remove the NotRec: /*empty*/ rule, then NotRec becomes useless as it won't match any finite sequence of tokens. This changes the language, removing any finite strings of T_NOT/T_MINUS.

If you remove the Expr: Term rule, that gets rid of the ambiguity without changing the langauge.

If you use this grammar as-is with yacc or bison, you'll get a shift/reduce conflict due to the ambiguity. The default conflict resolution of using the shift will resolve the ambiguity without changing the language -- as long as there are no other conflicting uses of those tokens in the part of the grammar you left out. It will use rule 1 for any Expr with no T_NOT/T_MINUS instructions and rule 2 for any Expr with one or more such tokens. This is equivalent to changing the NotRec rules to

NotRec: T_NOT
      | T_MINUS
      | NotRec T_NOT
      | NotRec T_MINUS
      ;

This makes NotRec match one or more tokens instead of zero or more.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top