Question

I'm trying to read Compiler Construction by Niklaus Wirth. On page 23 he starts to describe how LALR would parse the expression x*(y+z) given the following grammar:

E  = T  | E "+" T. expression 
T  = F  | T "*" F. term 
F  = id | "(" E ")". factor

He goes on to show the reduction as:

     Action   Stack     Remaining
1             x         * (y + z) 
2    S        x         * (y + z) 
3    R        F         * (y + z) 
4    R        T         * (y + z) 
6    S        T*          (y + z) 
7    S        T*(          y + z) 
8    S        T*(y           + z) 
9    R        T*(F           + z) 
10   R        T*(T           + z) 
11   R        T*(E           + z) 
12   S        T*(E+            z) 
13   S        T*(E + z          ) 
14   R        T*(E + F          ) 
15   R        T*(E + T          ) 
16   R        T*(E              ) 
17   S        T*(E) 
18   R        T*F 
19   R        T 
20   R        E

Where the action is S (for shift) or R (for reduction ... I added the line numbers for clarity). So I think I understand how to go from steps 1 to 4 and from 4 to 20, but I don't understand 4 itself. For example, step 1 pushes x onto the stack. x represents the RHS of rule 'F', so a reduction occurs -> F. F represents the 1st "OR" of rule 'T' and so another reduction can occur -> T. If that is right (which I'm not sure it is?), then why wouldn't T also be replaced with E as T represents the first "OR" of the RHS of rule 'E'. Is it because rule E has an implied "EOF" so to speak (and since we haven't reached EOF it can't reduce)? Or is it because it's ambiguous at this point (T also represents the 1st part of the 2nd "OR" of the RHS of rule T ... ie, T "*" F)? Or is it something else entirely?

Thanks!

Was it helpful?

Solution

The parser uses two criteria to decide what action (shift or reduce) to take next. The first is when the tokens on the stack match the right hand side of a production. After step 4, the T on the stack matches the E = T production, so if that were the only criterion, it could reduce at that point. However, the parser is also looking at the lookahead (ie, first token in "remaining") to decide what action to take. There's no rule which has E "*" as a prefix, so that reduction is not valid, and the only action is a shift. After step 20, the E = T production is valid, because (as you guessed) the lookahead token there is effectively an EOF, which does match.

Note that some ambiguous grammars can result in both a shift and a reduce action being valid. In these cases, Bison resolves in favor of "shift". See the Bison docs for more detail. However, the grammar given above is not ambiguous; one token of lookahead is enough to make it unambiguous.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top