Bison shift/reduce conflict in A ::= AA rule
-
28-04-2021 - |
Question
I've written following bison grammar file:
%left '+' '-'
%left '*' '/'
%token NUMBER
%%
expr
: NUMBER
| expr '+' expr
| expr '-' expr
| expr '*' expr
| expr '/' expr
| expr expr %prec '*' /* implicit multiplication */
;
Now bison
reports shift/reduce conflict regarding expr : expr expr
. I've extracted problem to following minimal set:
%left OP
%%
expr
: 'a'
| expr expr %prec OP
;
I can't get why is bison
still complains about shift/reduce conflict. I've found some old mail archive: Re: bison/yacc: shift/reduce conflict using %prec for composition, but I don't understand author's explanations either.
Could someone clarify why this grammar is ambiguous, and how to resolve conflict?
EDIT: by NUMBER NUMBER
I mean NUMBER * NUMBER
, i.e. product of this two numbers.
Solution
The problem here is that the token NUMBER
doesn't have a precedence. So when there's a state that can either shift a NUMBER
or reduce a rule (regardless of whether that rule has a precedence) it can't decide which to do.
Now you can fix it for this grammar by adding a precedence for NUMBER
(make it the same as *
), but it will come back if you add any rules for expression that start with any token other than NUMBER
-- for example if you add expr: '(' expr ')'
, you'll get shift/reduce conflicts on '('
.
An even bigger problem is if you add unary prefix operators, such as expr: '-' expr
. In this case, you won't get a conflict as '-' already has a precedence, but an input like NUMBER - NUMBER
will get parsed as NUMBER ( - NUMBER )
, which is probably not at all what you want. There's no good way of dealing with this issue with precedence rules.
The underlying cause of this confusion is that bison's precedence rules don't resolve precedence by comparing two rules as you might naively expect based on using the word "precedence". Instead they work by comparing the "precedence" of a rule to be reduced with that of a token to be shifted and base the decision to shift or reduce based on that. At the point in the parse where this happens, the second rule hasn't yet been recognized; instead bison is just guessing what it might be based on the token.
OTHER TIPS
Part of the answer is to look hard at the output file from bison -v
. For your first grammar, I got these extracts:
State 8 conflicts: 1 shift/reduce
State 9 conflicts: 1 shift/reduce
State 10 conflicts: 1 shift/reduce
State 11 conflicts: 1 shift/reduce
State 12 conflicts: 1 shift/reduce
Grammar
0 $accept: expr $end
1 expr: NUMBER
2 | expr '+' expr
3 | expr '-' expr
4 | expr '*' expr
5 | expr '/' expr
6 | expr expr
So there are 5 shift/reduce conflicts in the grammar. These are the less serious type of conflict; you can state that you expect them with %expect 5
in the grammar, if you're convinced that what the grammar is doing is correct.
state 0
0 $accept: . expr $end
NUMBER shift, and go to state 1
expr go to state 2
state 1
1 expr: NUMBER .
$default reduce using rule 1 (expr)
state 2
0 $accept: expr . $end
2 expr: expr . '+' expr
3 | expr . '-' expr
4 | expr . '*' expr
5 | expr . '/' expr
6 | expr . expr
$end shift, and go to state 3
'+' shift, and go to state 4
'-' shift, and go to state 5
'*' shift, and go to state 6
'/' shift, and go to state 7
NUMBER shift, and go to state 1
expr go to state 8
state 3
0 $accept: expr $end .
$default accept
state 4
2 expr: expr '+' . expr
NUMBER shift, and go to state 1
expr go to state 9
States 5, 6, 7 mimic state 4 but for the other operators. State 8 is the first of the states with a shift/reduce conflict. Remember the .
(dot) in the rules indicates where the parser is when it reaches this state.
state 8
2 expr: expr . '+' expr
3 | expr . '-' expr
4 | expr . '*' expr
5 | expr . '/' expr
6 | expr . expr
6 | expr expr .
NUMBER shift, and go to state 1
NUMBER [reduce using rule 6 (expr)]
$default reduce using rule 6 (expr)
expr go to state 8
state 9
2 expr: expr . '+' expr
2 | expr '+' expr .
3 | expr . '-' expr
4 | expr . '*' expr
5 | expr . '/' expr
6 | expr . expr
'*' shift, and go to state 6
'/' shift, and go to state 7
NUMBER shift, and go to state 1
NUMBER [reduce using rule 2 (expr)]
$default reduce using rule 2 (expr)
expr go to state 8
There are differences and similarities between these two states, but states 10, 11, 12 match state 9 except for the different point of ambiguity.
The trouble is that when the grammar sees:
NUMBER OP NUMBER NUMBER
it cannot tell whether to parse that as:
( NUMBER OP NUMBER ) NUMBER expr expr
or as:
NUMBER OP ( NUMBER NUMBER )
expr OP expr
Given that it is a shift/reduce conflict in each case, it chooses to shift. If that's what you want, then add the %expect 5
and get on with life. If that's not what you want, then you need to rethink your grammar. What does a pair of adjacent numbers indicate, and are you sure you don't need some operator character (perhaps a comma or colon) to separate them?
I tried raising the precedence of the missing operator by using:
%left MISSING
after the other precedence declarations, and then using:
expr expr %prec MISSING
This did not change anything. Neither does making the precedence of MISSING very low by listing it before the other operators.
You get an inkling of the problems if you consider how an expression like this should be parsed:
NUMBER OP NUMBER NUMBER NUMBER OP NUMBER NUMBER OP NUMBER
Where the OP is the same in each appearance. My brain is hurting! So is bison
's!