Question

I've been tasked with writing a prototype of my team's DSL in Java, so I thought I would try it out using ANTLR. However, I'm having problems with the 'expression' and 'condition' rules.

The DSL is already well defined so I would like to keep as close to the current spec as possible.

grammar MyDSL;
// Obviously this is just a snippet of the whole language, but it should give a 
// decent view of the issue.

entry
    : condition EOF
    ;

condition
    : LPAREN condition RPAREN
    | atomic_condition
    | NOT condition
    | condition AND condition
    | condition OR condition
    ;

atomic_condition
    : expression compare_operator expression
    | expression (IS NULL | IS NOT NULL)
    | identifier
    | BOOLEAN
    ;

compare_operator
    : EQUALS
    | NEQUALS
    | GT | LT
    | GTEQUALS | LTEQUALS
    ;

expression
    : LPAREN expression RPAREN
    | atomic_expression
    | PREFIX expression
    | expression (MULTIPLY | DIVIDE) expression 
    | expression (ADD | SUBTRACT) expression
    | expression CONCATENATE expression
    ;

atomic_expression
    :  SUBSTR LPAREN expression COMMA expression (COMMA expression)? RPAREN
    | identifier
    | INTEGER
    ;

identifier
    : WORD
    ;

// Function Names
SUBSTR: 'SUBSTR';
// Control Chars
LPAREN : '(';
RPAREN : ')';
COMMA  : ',';
// Literals and Identifiers
fragment DIGIT : [0-9] ;
INTEGER: DIGIT+;
fragment LETTER : [A-Za-z@$#];
fragment CHARACTER : DIGIT | LETTER | '_';
WORD: LETTER CHARACTER*;
BOOLEAN: 'TRUE' | 'FALSE';
// Arithmetic Operators
MULTIPLY : '*';
DIVIDE   : '/';
ADD      : '+';
SUBTRACT : '-';
PREFIX: ADD| SUBTRACT ;
// String Operators
CONCATENATE : '||';
// Comparison Operators
EQUALS   : '==';
NEQUALS  : '<>';
GTEQUALS : '>=';
LTEQUALS : '<=';
GT       : '>';
LT       : '<';
// Logical Operators
NOT : 'NOT';
AND : 'AND';
OR  : 'OR';
// Keywords
IS  : 'IS';
NULL: 'NULL';
// Whitespace
BLANK: [ \t\n\r]+ -> channel(HIDDEN) ;

The phrase I'm testing with is

(FOO == 115 AND (SUBSTR(BAR,2,1) == 1 OR SUBSTR(BAR,4,1) == 1))

However, it breaks on the nested parentheses, matching the first ( with the first ) instead of the outermost one (see below). In ANTLR3 I solved this with semantic predicates, but ANTLR4 is supposed to have removed the need for those.

[Image: ANTLRWorks parse tree]

I'd really like to keep the condition and the expression rules separate if at all possible. I have been able to get it to work when merged together in a single expression rule (based on examples here and elsewhere) but the current DSL spec has them as different and I'm trying to reduce any possible differences in behaviour.
Can anyone point out how I can get this all working while maintaining separate rules for conditions and expressions? Many thanks!

Solution

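The parser rules look fine to me.

The problem is in the lexer: WORD is defined before several keywords and logical operators (NOT, AND, OR, IS, NULL, and the BOOLEAN literals TRUE and FALSE). When two lexer rules match the same length of input, ANTLR picks the one defined first, so each of these is tokenized as WORD and the parser never sees an AND or OR token. Place your WORD rule at the very end of your lexer rules (or at least after the last keyword that WORD could also match).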
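
As a rough sketch of that reordering (only the affected lexer rules are shown, and the grouping and comments are mine, not part of the original spec), the keyword section could look something like this:

```
// Function names, logical operators and keywords come first,
// so they win ties against WORD.
SUBSTR  : 'SUBSTR';
NOT     : 'NOT';
AND     : 'AND';
OR      : 'OR';
IS      : 'IS';
NULL    : 'NULL';
BOOLEAN : 'TRUE' | 'FALSE';

// Literals and identifiers
fragment DIGIT     : [0-9];
fragment LETTER    : [A-Za-z@$#];
fragment CHARACTER : DIGIT | LETTER | '_';
INTEGER : DIGIT+;

// WORD goes last: when two rules match the same number of
// characters, the lexer picks the rule that is defined first.
WORD    : LETTER CHARACTER*;
```

Longer identifiers that merely start with a keyword (ANDROID, say) should still come out as WORD, because the lexer prefers the longest match before it looks at rule order. The same tie-breaking appears to affect PREFIX as well: since ADD and SUBTRACT are defined above it and match the same single character, PREFIX would never be produced as a token, so turning it into a parser rule may be worth considering.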

Licensed under: CC-BY-SA with attribution