Question

I have the following bison grammar (as part of a more complex grammar):

classDeclaration : CLASS ID EXTENDS ID LBRACE variableDeclarationList methodDeclarationList RBRACE
                 ;

variableDeclarationList : variableDeclarationList variableDeclaration 
                        | /* empty */
                        ;

variableDeclaration : type ID SEMICOLON
                    ;

type : NATTYPE | ID
     ;

methodDeclarationList : methodDeclarationList methodDeclaration
                      | /* empty */
                      ;

methodDeclaration : type ID LPAREN parameterDeclarationList RPAREN variableExpressionBlock
          ;

which is supposed to describe class declarations which look like this:

class foo extends object
{
    nat number;

    nat divide(nat aNumber)
    {
        0;
    }
}

or this:

class foo extends object
{
    nat divide(nat aNumber)
    {
        0;
    }
}

or this:

class foo extends object
{
}

Problem is that there is ambiguity where variable declarations end and method declarations begin (2 shift/reduce conflicts). For example, the method declaration looks like a variable declaration until it sees the parenthesis.

How can I rewrite this grammar to eliminate this ambiguity?

To clarify: the class body can be empty, the only constraint is that variable declarations come before method declarations if there are any.

Was it helpful?

Solution 2

Another way to do it is to not even have empty rules, and instead use multiple options, one with the nonterminal and one without.

classDeclaration : CLASS ID EXTENDS ID LBRACE RBRACE
                 | CLASS ID EXTENDS ID LBRACE methodDeclarationList RBRACE
                 | CLASS ID EXTENDS ID LBRACE variableDeclarationList RBRACE
                 | CLASS ID EXTENDS ID LBRACE variableDeclarationList methodDeclarationList RBRACE           
                 ;

variableDeclarationList : variableDeclaration 
                        | variableDeclarationList variableDeclaration
                        ;

variableDeclaration : type ID SEMICOLON
                    ;

type : NATTYPE 
     | ID
     ;

methodDeclarationList : methodDeclaration 
                      | methodDeclarationList methodDeclaration
                      ;

methodDeclaration :  type ID LPAREN RPAREN variableExpressionBlock
                  |  type ID LPAREN parameterList RPAREN variableExpressionBlock
                  ;

OTHER TIPS

This isn't an ambiguity, its a lookahead problem. The problem is that you need 3 tokens of lookahead (up to the SEMICOLON or LPAREN of the next declaration) for the parser to figure out where the end of the variableDeclarationList is, as it needs to reduce an empty methodDeclarationList before it starts parsing more methodDeclarations.

The way to fix this is to remove the need for an empty reduction at the start of a method declaration list:

methodDeclarationList : nonEmptyMethodDeclarationList | /*empty */ ;

nonEmptyMethodDeclarationList : nonEmptyMethodDeclarationList methodDeclaration
                              | methodDeclaration
                              ;

With this, the parser does not need to reduce an empty methodDeclarationList UNLESS there are no methods at all -- and in that case, only one token of lookahead is needed to see the RBRACE

I am not familiar with bison, but have you tried making a rule for the common prefix of both rules? the "type ID" is present in both the variable and method patterns.

So if you had say:

typedId : type ID
        ; 

and then

variableDeclaration : typedId SEMICOLON
                    ;

methodDeclaration   : typedId LPAREN parameterDeclarationList RPAREN variableExpressionBlock
                    ;

this way the variable rule and method rule would not be looked at untill the common prefix is already pushed, and the next token whould be unambiguous.

I have not played with this sort of thing in years I hope this helps.

This slightly modified grammar works:

%token CLASS EXTENDS ID LBRACE RBRACE SEMICOLON NATTYPE LPAREN RPAREN DIGIT COMMA 
%%
classDeclaration : CLASS ID EXTENDS ID LBRACE declarationList RBRACE
        ;

declarationList : /* Empty */
        | declarationList declaration
        ;

declaration :   variableDeclaration
        |       methodDeclaration
        ;

variableDeclaration : parameterDeclaration SEMICOLON
        ;

type : NATTYPE | ID
        ;

methodDeclaration : parameterDeclaration LPAREN parameterDeclarationList RPAREN
                    variableExpressionBlock
        ;

variableExpressionBlock : LBRACE DIGIT RBRACE
        ;

parameterDeclarationList : /* empty */
        |   parameterDeclarationList COMMA parameterDeclaration
        ;

parameterDeclaration : type ID
        ;

You probably want to rename the non-terminal 'parameterDeclaration' to something like 'singleVariableDeclaration', but by avoiding having two possibly empty rules in a row (the original 'variableDeclarationList' and 'methodDeclarationList', you avoid the ambiguity.

This does allow, syntactically, methods interleaved with variables in the class's declarationList. If that isn't acceptable for some reason, consider making that a semantic error rather than a syntactic error. If it must be a syntax error, then someone is going to have to do some thinking; I vote to make you do the thinking.


If you insist on at least one method declaration, then the grammar is unambiguous:

methodDeclarationList : methodDeclarationList methodDeclaration
        | methodDeclaration /* empty */
        ;

If you try the same with a variable declaration list, the grammar still has two S/R conflicts.

One possibility, not to be completely ignored, is to use the Bison feature, %expect 2, to indicate that 2 shift/reduce conflicts are expected.

%expect 2
%token CLASS EXTENDS ID LBRACE RBRACE SEMICOLON NATTYPE LPAREN RPAREN DIGIT COMMA
%%
classDeclaration : CLASS ID EXTENDS ID LBRACE variableDeclarationList methodDeclarationList RBRACE
        ;

variableDeclarationList : variableDeclarationList variableDeclaration 
        | /* empty */
        ;

variableDeclaration : singleVariableDeclaration SEMICOLON
        ;

type : NATTYPE | ID
        ;

methodDeclarationList : methodDeclarationList methodDeclaration
        | /* empty */
        ;

methodDeclaration : singleVariableDeclaration LPAREN parameterDeclarationList RPAREN variableExpressionBlock
        ;

variableExpressionBlock : LBRACE DIGIT RBRACE
        ;

parameterDeclarationList : /* empty */
        |   parameterDeclarationList COMMA parameterDeclaration
        ;

parameterDeclaration : singleVariableDeclaration
        ;

singleVariableDeclaration : type ID
        ;
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top