Precedenza implicita

https://stackoverflow.com//questions/9649834

11-12-2019
|

Domanda

Sto leggendo il libro - "Flex e Bison" per capire come funzionano i generatori del parser e c'è un esempio:

calclist: /* nothing */
        | calclist exp EOL { printf("= %d\n", $1); }
        ;

exp: factor
   | exp ADD factor { $$ = $1 + $3; } 
   | exp SUB factor { $$ = $1 - $3; }
   ;

factor: term
      | factor MUL term { $$ = $1 * $3; } 
      | factor DIV term { $$ = $1 / $3; }
      ;

term: NUMBER
    | ABS term { $$ = $2 >= 0? $2 : - $2; }
    ;

E nel libro afferma che la grammatica sopra ha una precedenza implicita utilizzando simboli non più moderni separati.Ma come funziona?Supponiamo che abbiamo un esempio seguente: 1 + 3 * 2 (spazi di spazi bianchi) Leggiamo il primo token 1, verrà premuto per impilare come NUMBER o come term o come factor per quanto tempo sarà "gorgogliante" attraverso la grammatica?Da quale regola grammaticale successivamente il token sarà controllato?Perché con questa moltiplicazione grammaticale ha una precedenza più alta dell'aggiunta?

Soluzione

The reason this has an "implicit" precedence (rather than explicit) is indeed just as the text says, due to the factored grammar (separate nonterminals).

Work through your example of 1 + 3 * 2, imagining yourself as the computer doing the parsing, following each instruction "to the letter" as it were. In order to find an "exp" (expression) you must first find a factor. (Your other options are to start by finding an "exp" but that has to find a "factor".) So you must find a factor ... but to do so you must find a "term" because a factor is either a term, or a factor that itself starts with a term. So now you must find a term, which is either a NUMBER or the keyword ABS. So you can "accept" (in grammar terms) the 1, which in fact is a NUMBER, and you have succeeded at the first part of the parse -- finding a term. (You now remove the 1 from the token stream, leaving you with + as the next token.)

Now that you have a term, you also have a factor (by definition), but in order to "complete the action of having a factor", as it were, you are required to try for the longer match: a factor followed by MUL or DIV, followed by something. Your next token is +: it's not a MUL and it's not a DIV. Thus, you are forced to stop parsing the factor and return the entire parse-chain-so-far as your factor: 1 is a factor, and the next token is still +.

Now that you have a factor, you have an exp (by definition), but in order to "complete the action of having an exp", you are again required to try for the longer match: an exp followed by ADD or SUB, followed by something. The next token is still + which is in fact an ADD ... so you must proceed with the exp ADD factor { $$ = $1 + $3 }; rule.

At this point, you (as the parser) push the whole thing-so-far onto a stack and get to work again looking for the appropriate nonterminal -- in this case, another "factor". So you now start at the rule for a factor. You must look for a "term", and if you find one, try to do the longer version of the rule that includes a MUL or DIV. When you work through this part, you will see that the * token is indeed a MUL and you will have to take the longer rule, making the "factor" result use the factor MUL term { $$ = $1 * $3; } part of the rule. This will accept, aka eat / use up, the 3 * 2 sequence and return the value 6 for the "factor" that lets you complete the rule you pushed onto your parse stack.

Having returned to your pushed state, you complete the parsing of "1 + " by adding 1 and accepting (eating) the complete expression. And of course 1 + 6 is 7, so that your grammar returns the correct value.

Altri suggerimenti

The precedence is the result of the operands of ADD and SUB being factors, and only factors contain MUL and DIV operators. ADD does not compete with MUL, because the MUL is encapsulated in a term.

Thinking about this from the point of view of the parser: the term expression must be reduced before the parser considers its relation to the ADD operator, and that reduction obliterates the MUL operator.

Given A + X * Y, the X * Y expression is reduced to term which is a single grammar symbol that no longer expresses the * operator, and so there is nothing for the + operator to have a precedence issue against; it's now just A + term.

By the way, the grammar uses unconventional reversed terminology.

These are terms: A + B + C

These are factors: A*B*C

Terms are added ("terms of a series"), factors are multiplied ("factors of an integer or polynomial").

Is that really from this textbook? In any case, try Compilers: Principles, Techniques and Tools by Aho, Sethi, Ullman. (1988 but classic).

Autorizzato sotto: CC-BY-SA insieme a attribuzione

Non affiliato a StackOverflow