Question

I'm working on a parser and I'm really frustrated. In the language, we can have an expression like:

new int[3][][]

or

new int[3]

Most of it parses correctly, except for the empty arrays at the end. In my parser I have:

Expression : int
             char
             null
             (...many others...)
             new NewExpression

and then a NewExpression is:

NewExpression : NonArrayType '[' Expression ']' EmptyArrays
              | NonArrayType '[' Expression ']' 

and then EmptyArrays is one or more empty bracket pairs. (If I instead let EmptyArrays derive the empty string, it adds 20 shift/reduce conflicts.)

EmptyArrays : EmptyArrays EmptyArray
            | EmptyArray
EmptyArray  : '[' ']'

However, when I look in the .info file for the parser, I get this:

State 214

    NewExpression -> NonArrayType lbrace Expression rbrace . EmptyArrays    (rule 80)
    NewExpression -> NonArrayType lbrace Expression rbrace .    (rule 81)

    dot            reduce using rule 81
    ';'            reduce using rule 81
    ','            reduce using rule 81
    '+'            reduce using rule 81
    '-'            reduce using rule 81
    '*'            reduce using rule 81
    '/'            reduce using rule 81
    '<'            reduce using rule 81
    '>'            reduce using rule 81
    '<='           reduce using rule 81
    '>='           reduce using rule 81
    '=='           reduce using rule 81
    '!='           reduce using rule 81
    ')'            reduce using rule 81
    '['            reduce using rule 81    --I expect this should shift
    ']'            reduce using rule 81
    '?'            reduce using rule 81
    ':'            reduce using rule 81
    '&&'           reduce using rule 81
    '||'           reduce using rule 81

I would expect, though, that if we're in state 214 and we see a left bracket ('['), we should shift it onto the stack and continue to parse EmptyArrays.

I'm not exactly sure what is going on, because when I strip away everything else (e.g. by starting the parse at NewExpression), the additional brackets parse correctly. No Expression, Statement, or other non-terminal in the grammar can start with a left bracket. This is especially puzzling because I have a similar rule for if/else statements, which generates a shift/reduce conflict but resolves it by shifting when the next token is an else (the well-documented dangling-else problem).

Can you help me figure out what is going wrong? I really appreciate your help, I am really tilting at windmills trying to figure out the problem.


Solution

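You probably have a precedence declared for '[' and/or ']', with something like %left '[', which causes this behavior. Precedence declarations typically resolve shift/reduce conflicts silently, so the generator picks the reduce without reporting anything. Remove that declaration and the underlying shift/reduce conflict in this state will be reported. As for why it's a shift/reduce conflict, you probably also have a rule: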

Expression: Expression '[' Expression ']'

for an array access. Since a NewExpression is an Expression, it may be followed by an index like this. With '[' as the lookahead, the parser can't tell whether that's the beginning of an index expression (reduce by rule 81 first) or the beginning of an EmptyArray (shift). For example, new int[3][] and new int[3][0] are identical up to and including the second '[', so telling them apart would require 2-token lookahead.

One thing you could try for this specific case would be to have your lexer do the extra lookahead needed here and recognize [] as a single token.
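The lexer-side fix can be sketched independently of the parser generator. The following Python tokenizer is only an illustration of the idea, not the asker's lexer: the token names (EMPTY_BRACKETS, LBRACKET, and so on) are hypothetical, keywords such as new are not distinguished from identifiers, and comments between '[' and ']' are not handled.

```python
import re

# Hypothetical token names; a real lexer would use the ones its grammar declares.
# EMPTY_BRACKETS is listed before LBRACKET so that "[]" wins over a lone "[".
TOKEN_RE = re.compile(r"""
    (?P<EMPTY_BRACKETS>\[\s*\])   # an empty pair, optional whitespace inside
  | (?P<LBRACKET>\[)
  | (?P<RBRACKET>\])
  | (?P<NUMBER>\d+)
  | (?P<IDENT>[A-Za-z_]\w*)
  | (?P<SKIP>\s+)
""", re.VERBOSE)


def tokenize(source):
    """Return a list of (kind, text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for kind, text in tokenize("new int[3][][]"):
        print(kind, text)
```

With a token like this, the grammar rule would become EmptyArray : EMPTY_BRACKETS, and in the state shown in the question a lookahead of '[' can only start an index expression while EMPTY_BRACKETS can only start EmptyArrays, so one token of lookahead is enough.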

Licensed under: CC-BY-SA with attribution