I've created a compiler for a language which has the following grammer, as defined by ML-Yacc (Starting symbol is "program" which is defined at the bottom):

%nonassoc       FUN VAR ASSIGN PLUSASSIGN MINUSASSIGN TIMESASSIGN DIVIDEASSIGN
%right          ELSE
%left           OR
%left           AND
%nonassoc       EQ NEQ GT LT GE LE
%left           PLUS MINUS
%left           TIMES DIVIDE
%left           UNARY
%left           LPAREN

%%

const: INT                                                              
     | FLOAT                                                            
     | BOOL                                                         
     | STRING                                                       

ty: ID                                                          
  | FUN LPAREN typeList RPAREN ARROW ty                         

typeList: typeList'                                                     
        |                                                               

typeList': ty COMMA typeList'                                               
         | ty                                                           

exp: primaryExp                                                     
   | callExp                                                        
   | boolExp                                                        
   | opExp                                                          
   | assignExp                                                      

assignExp: ID ASSIGN exp                                                    
         | ID PLUSASSIGN exp                                                
         | ID MINUSASSIGN exp                                           
         | ID TIMESASSIGN exp                                           
         | ID DIVIDEASSIGN exp                                          

tyargs: LT typeList' GT                                                 

callExp: exp LPAREN expList RPAREN                                      

boolExp: exp AND exp                                                        
       | exp OR exp                                                 

opExp: ID PLUSPLUS                                                      
     | ID MINUSMINUS                                                    
     | PLUSPLUS ID                                                  
     | MINUSMINUS ID                                                    
     | exp PLUS exp                                                 
     | exp MINUS exp                                                    
     | MINUS exp %prec UNARY                                            
     | BANG exp %prec UNARY                                         
     | exp TIMES exp                                                    
     | exp DIVIDE exp                                               
     | exp EQ exp                                                   
     | exp NEQ exp                                                  
     | exp GT exp                                                   
     | exp LT exp                                                   
     | exp GE exp                                                   
     | exp LE exp                                                   

expList: expList'                                                       
       |                                                                


expList': exp COMMA expList'                                                
        | exp                                                           

primaryExp: ID                                                              
          | const                                                           
          | lambdaExp                                                       
          | LPAREN exp RPAREN                                                                                                   

varDecl: ty ID ASSIGN exp                                               
       | ty ID                                                          
       | VAR ID ASSIGN exp                                              


expStat: exp SEMICOLON                                                  
       | SEMICOLON                                                      


statList: stat statList                                                 
        |                                                               

compoundStat: LBRACE statList RBRACE                                            

selectionStat: IF LPAREN exp RPAREN stat ELSE stat                              
             | IF LPAREN exp RPAREN stat                                        


jumpStat: RETURN exp                                                        
        | RETURN                                                        
        | BREAK                                                         

iterationStat: WHILE LPAREN exp RPAREN stat                                 
             | FOR LPAREN SEMICOLON SEMICOLON RPAREN stat                   
             | FOR LPAREN SEMICOLON SEMICOLON exp RPAREN stat               
             | FOR LPAREN SEMICOLON exp SEMICOLON RPAREN stat               
             | FOR LPAREN SEMICOLON exp SEMICOLON exp RPAREN stat           
             | FOR LPAREN varDecl SEMICOLON SEMICOLON RPAREN stat           
             | FOR LPAREN varDecl SEMICOLON SEMICOLON exp RPAREN stat       
             | FOR LPAREN varDecl SEMICOLON exp SEMICOLON RPAREN stat       
             | FOR LPAREN varDecl SEMICOLON exp SEMICOLON exp RPAREN stat   
             | FOR LPAREN exp SEMICOLON SEMICOLON RPAREN stat               
             | FOR LPAREN exp SEMICOLON SEMICOLON exp RPAREN stat           
             | FOR LPAREN exp SEMICOLON exp SEMICOLON RPAREN stat           
             | FOR LPAREN exp SEMICOLON exp SEMICOLON exp RPAREN stat       

stat: expStat                                                           
    | compoundStat                                                  
    | selectionStat                                                 
    | iterationStat                                                 
    | jumpStat SEMICOLON                                            
    | varDecl SEMICOLON                                                                 

declList: declList'                                                     
        |                                                               

declList': varDecl COMMA declList'                                          
         | varDecl                                                                                                          

functionDecl: FUN ID LPAREN declList RPAREN ARROW ty compoundStat           

lambdaExp: LPAREN declList RPAREN ARROW ty compoundStat                 

declarations: varDecl SEMICOLON declarations                                    
            | functionDecl declarations                                     
            |                                                               

program: declarations                                                   

and this grammar is fine, but now I want to introduce parametric polymorphism and thus added the following productions to the grammar:

tyargs: LT typeList' GT

ty: ID tyargs

callExp: exp tyargs LPAREN expList RPAREN

idList: ID COMMA idList                                                 
      | ID

tyvars: LT idList GT

functionDecl: FUN ID tyvars LPAREN declList RPAREN ARROW ty compoundStat

And now I'm getting the following 2 reduce/reduce conflicts, which I'm not sure how to solve

error:  state 75: reduce/reduce conflict between rule 46 and rule 5 on GT
error:  state 75: reduce/reduce conflict between rule 46 and rule 5 on COMMA

Can anyone tell me how I would remove these 2 conflicts?

EDIT: Here's the full .desc output from mlyacc http://pastebin.com/2w26ytuV. Not that this one also shows the 2 benign shift/reduce errors

有帮助吗?

解决方案

The problem is that with the new rules, the grammar requires arbitrary lookahead to tell the difference between a varDecl and an expStmt. This comes from LT being both a binary operator for expressions, and indicating the start of a tyargs list for a parameterized type.

One possible fix would be to introduce a new keyword to denote a parameterized type or function (like the FUN keyword currently used to introduce a function type), which would allow the parser to know in advance whether to treat a LT as an operator or a type parameter list. So you would instead add new rules like:

ty: TYPE ID tyargs

callExpr: CALL ID tyargs LPAREN expList RPAREN

Another possibility is to use lexer feedback via the symbol table -- have the lexer recognize identifiers that require type parameters (by looking up the names in the symbol table) and return a different token for them.

A third possibility is to use a stronger parser generator that can deal with more lookahead, such as bison's %glr-parser option, or btyacc

其他提示

Well, no one has answered, but you have made the grammar gloriously ambiguous. There are two possible productions for:

LT ID GT
ID LT ID GT

take some examples:

<a>
b<a>

are these to be tyargs or tyvars or the beginning of a callExp? Your grammar says they can be both. It is therefore quite difficult to parse with tools like ml-yacc without making some changes to the language or the rules you use.

You'll have trouble compiling it with ml-yacc without explaining the language structure a bit more. Someone might be able to show you a better way of structuring the grammar rules to stay within the constraints of such tools.

许可以下: CC-BY-SA归因
不隶属于 StackOverflow
scroll top