happy: reduce/reduce conflict

https://stackoverflow.com/questions/10765307

11-06-2021
|

题

Why this raises a warning about reduce/reduce conflict

root : set1 'X'     
     | set2 'X' 'X' 

set1 : 'A'          
     | 'B'             

set2 : 'B'          
     | 'C'

but the next is ok?

root : 'A' 'X'     
     | 'B' 'X'     
     | 'B' 'X' 'X'  
     | 'C' 'X' 'X'

解决方案

The difference is because, in the first case, the parser has to make a choice about the reduction before it gets to see whether one 'X' or two will follow.

In the second, the parser can use the same state, let's call it BX, when it's seen a B and an X—both shifted—and then depending on the next token, can shift (if an X), and then reduce the 'B' 'X' 'X' rule, or otherwise reduce 'B' 'X' immediately.

Notice that if they didn't have identical tokens immediately following—e.g. you had set1 'X' but set2 'Y'—then there would be no problem, because the lookahead would be able to kick in and pick which reduction to take.

Here are the relevant sections from the output from bison -v for exposing this issue:

Case one

state 0

$accept: . root $end

'A'  shift, and go to state 1
'B'  shift, and go to state 2
'C'  shift, and go to state 3

root  go to state 4
set1  go to state 5
set2  go to state 6

Assuming we get a 'B', we go to state 2:

state 2

set1: 'B' .
set2: 'B' .

'X'       reduce using rule 4 (set1)
'X'       [reduce using rule 5 (set2)]
$default  reduce using rule 4 (set1)

Note that there are two possible reductions we can make: to set1 or set2, both with the same input token. Hence a reduce/reduce; we only have one token of lookahead, and with this grammar, the only token could be a 'X'—in either case!

Case two

state 0

$accept: . root $end

'A'  shift, and go to state 1
'B'  shift, and go to state 2
'C'  shift, and go to state 3

root  go to state 4

Assuming we get a 'B', we go to state 2:

state 2

root: 'B' . 'X'
    | 'B' . 'X' 'X'

'X'  shift, and go to state 6

While we only have one token of lookahead, the parser generator can produce a state where it's seen 'B' 'X' due to the accommodating structure of the input. Hence we go to state 6 in any event (or error otherwise ;-)):

state 6

root: 'B' 'X' .
    | 'B' 'X' . 'X'

'X'  shift, and go to state 9

$default  reduce using rule 2 (root)

Here's where the magic happens: if we see an 'X', we shift and go to state 9 (where we reduce), otherwise we reduce 'B' 'X' immediately.

For completeness, here's state 9:

state 9

root: 'B' 'X' 'X' .

$default  reduce using rule 3 (root)

If we had a token of lookahead to disambiguate

With this sample grammar:

root: set1 'X'
    | set2 'Y'

set1: 'A'          
    | 'B'             

set2: 'B'          
    | 'C'

Then, we start:

state 0

$accept: . root $end

'A'  shift, and go to state 1
'B'  shift, and go to state 2
'C'  shift, and go to state 3

root  go to state 4
set1  go to state 5
set2  go to state 6

We shift 'B' and go to state 2:

state 2

set1: 'B' .
set2: 'B' .

'Y'       reduce using rule 5 (set2)
$default  reduce using rule 4 (set1)

So, this state is reached in both rules for set1 and set2, where we have a single 'B' token on the stack. In this case, if we next see a 'Y', we reduce to set2—or in any other case, reduce to set1.

The fact that it's picked set1 as the "default" reduction could have implications for error handling.

Appendix on GLR

Happy (and bison or yacc) produce LALR(1) parsers by default, though you can produce a GLR parser with --glr (or %glr-parser in your bison declarations file). This can resolve ambiguities by concurrently trying both "possibilities"; the parser will fork and see how far it gets in either case.

This is probably unwise unless you really need it, know you need it, and know what could happen if things go wrong, though. I'm not sure what happens if both forks terminate successfully; with my non-scientific testing, it seemed to always end up picking the longer parse.

Lexer hacks

If you don't want to use GLR, but don't want to restructure your parser significantly, you could consider using a lexer hack to overcome this issue.

Right now, you have this:

root : set1 'X'     
     | set2 'X' 'X'

You could instead emit a token for a single 'X' character, and a different token for two:

root : set1 ONE_X
     | set2 TWO_XS

This resolves ambiguity within a single token, and is an unambiguous LALR(1) grammar as a result.

许可以下： CC-BY-SA 和归因

不隶属于 StackOverflow