Implicit precedence

https://stackoverflow.com//questions/9649834

11-12-2019
|

Question

I'm reading book — "Flex and Bison" to understand how parser generators work and there is example:

calclist: /* nothing */
        | calclist exp EOL { printf("= %d\n", $1); }
        ;

exp: factor
   | exp ADD factor { $$ = $1 + $3; } 
   | exp SUB factor { $$ = $1 - $3; }
   ;

factor: term
      | factor MUL term { $$ = $1 * $3; } 
      | factor DIV term { $$ = $1 / $3; }
      ;

term: NUMBER
    | ABS term { $$ = $2 >= 0? $2 : - $2; }
    ;

and in book says that grammar above has implicit precedence by using separate nonterminal symbols. But how does it work? Assume we have following example: 1 + 3 * 2 (whitespaces just skips) we read first token 1 it will be pushed to stack as NUMBER or as term or as factor for how long it will be "bubbling" through grammar? From which grammar rule next token will be checked? Why with this grammar multiplication have higher precedence than addition?

Solution

The reason this has an "implicit" precedence (rather than explicit) is indeed just as the text says, due to the factored grammar (separate nonterminals).

Work through your example of 1 + 3 * 2, imagining yourself as the computer doing the parsing, following each instruction "to the letter" as it were. In order to find an "exp" (expression) you must first find a factor. (Your other options are to start by finding an "exp" but that has to find a "factor".) So you must find a factor ... but to do so you must find a "term" because a factor is either a term, or a factor that itself starts with a term. So now you must find a term, which is either a NUMBER or the keyword ABS. So you can "accept" (in grammar terms) the 1, which in fact is a NUMBER, and you have succeeded at the first part of the parse -- finding a term. (You now remove the 1 from the token stream, leaving you with + as the next token.)

Now that you have a term, you also have a factor (by definition), but in order to "complete the action of having a factor", as it were, you are required to try for the longer match: a factor followed by MUL or DIV, followed by something. Your next token is +: it's not a MUL and it's not a DIV. Thus, you are forced to stop parsing the factor and return the entire parse-chain-so-far as your factor: 1 is a factor, and the next token is still +.

Now that you have a factor, you have an exp (by definition), but in order to "complete the action of having an exp", you are again required to try for the longer match: an exp followed by ADD or SUB, followed by something. The next token is still + which is in fact an ADD ... so you must proceed with the exp ADD factor { $$ = $1 + $3 }; rule.

At this point, you (as the parser) push the whole thing-so-far onto a stack and get to work again looking for the appropriate nonterminal -- in this case, another "factor". So you now start at the rule for a factor. You must look for a "term", and if you find one, try to do the longer version of the rule that includes a MUL or DIV. When you work through this part, you will see that the * token is indeed a MUL and you will have to take the longer rule, making the "factor" result use the factor MUL term { $$ = $1 * $3; } part of the rule. This will accept, aka eat / use up, the 3 * 2 sequence and return the value 6 for the "factor" that lets you complete the rule you pushed onto your parse stack.

Having returned to your pushed state, you complete the parsing of "1 + " by adding 1 and accepting (eating) the complete expression. And of course 1 + 6 is 7, so that your grammar returns the correct value.

OTHER TIPS

The precedence is the result of the operands of ADD and SUB being factors, and only factors contain MUL and DIV operators. ADD does not compete with MUL, because the MUL is encapsulated in a term.

Thinking about this from the point of view of the parser: the term expression must be reduced before the parser considers its relation to the ADD operator, and that reduction obliterates the MUL operator.

Given A + X * Y, the X * Y expression is reduced to term which is a single grammar symbol that no longer expresses the * operator, and so there is nothing for the + operator to have a precedence issue against; it's now just A + term.

By the way, the grammar uses unconventional reversed terminology.

These are terms: A + B + C

These are factors: A*B*C

Terms are added ("terms of a series"), factors are multiplied ("factors of an integer or polynomial").

Is that really from this textbook? In any case, try Compilers: Principles, Techniques and Tools by Aho, Sethi, Ullman. (1988 but classic).

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow