Question

I inherited an ANTLR grammar and now implement it without any change with Python Lex Yacc. My Problem is, that ANTLR normally uses very high level EBNF to define a grammar, whereas Yacc uses simple context free grammar.

I struggle with the AST definition of binary expressions. Given from ANTLR, they look like

disjunction
: conjunction ('||' conjunction)*

This definition exists for precedence reasons, do not wonder about the names.

In my CFG, it looks like

def p_sum(self, p):
""" sum : product product_list """
???

def p_product_list(self, p):
""" product_list : PLUS product product_list
| MINUS product product_list
| epsilon
"""
???

Some other expressions look like

def p_disjunction_1(self, p):
""" disjunction : conjunction """
    p[0] = p[1]

def p_disjunction_2(self, p):
""" disjunction : conjunction OR disjunction """
    p[0] = BinaryOp(p[2], p[1], p[3], p[1].coord)

What I want is that my AST consists only of BinaryExpression Nodes, as given in the second example:

BinaryExpression('+', lhs, rhs)

But this grammar gives me a list of factors. Is there any way I can write it in my way? The ??? are the places where I need to but an expression to build my AST, but I cannot think of a nice way to do it. What I see is to define a node with a list of unary expressions, but that is ugly for me. Or is there another way to write the grammar? I do not dare to touch it, since I fear I will alter it in a way which parses different stuff.

Was it helpful?

Solution

The "original" form of the disjunction production (before it had left-recursion eliminated to make it appropriate for LL parsing) was

disjunction: conjunction
           | disjunction '|' conjunction
           ;

That will left-associate disjunction; if you want right-association -- for example for assignment operators -- use right-recursion:

assignment: expression
          | expression '=' assignment
          ;

(That might not be precisely what you want on the left-hand side of an assignment operator, but it shows a general pattern for right-associative rules.)

Similarly:

sum: product
   | sum '+' product
   | sum '-' product
   ;

The reconstruction of the grammar corresponding to a sensible parse tree should be fairly mechanical, but you probably should write enough test cases, parsing them with both grammars, to assure yourself that you got it correct.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top