ANTLR Grammar for expressions
-
12-09-2019 - |
Question
I'm trying to implement a expression handling grammar (that deals with nested parenthesis and stuff). I have the following so far, but they can't deal with some cases (successful/failure cases appear after the following code block). Anyone know what's going on?
Note: The varname += and varname = stuff are just some additional AST generation helper stuff in XText. Don't worry about them for now.
...
NilExpression returns Expression:
'nil';
FalseExpression returns Expression:
'false';
TrueExpression returns Expression:
'true';
NumberExpression returns Expression:
value=Number;
StringExpression returns Expression:
value=STRING; //EllipsesExpression: '...';
//FunctionExpression: function=function; //don't allow random functions
UnaryExpression:
op=unop ('(' expr=Expression ')')|expr=Expression;
BinaryExpression:
'or'? AndOp; //or op
AndOp:
'and'? ComparisonOp;
ComparisonOp:
('>'|'<'|'>='|'<='|'=='|'~=')? ConcatOp;
ConcatOp:
'..'? AddSubOp;
AddSubOp:
('+' '-')? MultDivOp;
MultDivOp:
('*' '/')? ExpOp;
ExpOp:
'^'? (('(' expr=Expression ')')|expr=Expression);
ExprSideOne : Variable|NilExpression|FalseExpression|TrueExpression|
NumberExpression|StringExpression|UnaryExpression;
Expression:
(
'('
expression1=ExprSideOne expression2+=BinaryExpression*
')'
)
|
( expression1=ExprSideOne expression2+=BinaryExpression* )
;
...
And here's the list of parses/fails:
c = ((b)); //fails
c = ((a not b)); //fails
c = b; //parses
d = (b); //parses
Solution
What's going on is that your Expression/Expressions support single parentheses but not multiple parentheses (as you concluded). I don't have ANTLR specific experience but I've worked with Javacc which shares many similar concepts (I wrote a grammar for Prolog... don't ask).
To handle nested parentheses, you typically have something similar to:
ParenthesisExpression: '(' (ParenthesisExpression | Expression) ')';
This would mean that the expression is either wrapped in parentheses or it's just a raw expression. As for how the AST deals with this, a ParenthesisExpression 'is a' Expression, so it can be represented as a subclass or an implementation of (if Expression is an interface/abstract class of sorts).
OTHER TIPS
Make use of the ^
placed after the token/rule name is very useful for defining expressions.
expression : e1 (OR^ e1)* ;
e1 : e2 (AND^ e2)*;
e2 : e3 (PIPE^ e3)*;
e3 : e4 (ANDSYMB^ e4)*;
e4 : e5 ((EQUAL^|NOTEQUAL^) e5)*;
e5 : e6 ((LESS^|GREATER^) e6)*;
e6 : e7 ((PLUS^|MINUS^) e7)* ;
e7 : e8 ((STAR^|SLASH^) e8)* ;
e8 : e9 (NEW^ ID LPAREN RPAREN)*;
e9 : (NOT^)? e10;
e10 : e11 | call_def;
e11 : constant
| '(' expression ')' -> expression;
I've used this grammar for simple expressions: http://fisheye2.atlassian.com/browse/~raw,r=5175/antlr-examples/C/polydiff/Poly.g
I've also used the grammar contain in this project for more complex expressions: http://www.codeproject.com/KB/recipes/sota_expression_evaluator.aspx