ANTLR left recursion

https://stackoverflow.com/questions/8101312

27-02-2021

Question

I'm trying to convert the postfix, infix and prefix rules from Scala in EBNF form to ANTLR, but I am seeing an error relating to left-recursion on the infixExpression rule.

The rules in question are:

public symbolOrID
:   ID
|   Symbol
;

public postfixExpression
:   infixExpression symbolOrID? -> ^(R__PostfixExpression infixExpression symbolOrID?)
;

public infixExpression
:   prefixExpression
|   infixExpression (symbolOrID infixExpression)? -> ^(R__InfixExpression infixExpression symbolOrID? infixExpression?)
;

public prefixExpression
:   prefixCharacter? simpleExpression -> ^(R__PrefixExpression prefixCharacter? simpleExpression)
;

public prefixCharacter
:   '-' | '+' | '~' | '!' | '#'
;

public simpleExpression
:   constant
;

If I change the infixExpression rule to:

public infixExpression
:   prefixExpression (symbolOrID infixExpression)? -> ^(R__InfixExpression prefixExpression symbolOrID? infixExpression?)
;

Then it instead complains:

warning(200): Hydra.g3:108:26: Decision can match input such as "{ID, Symbol} {'!'..'#', '+', '-', '~'} String" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): Hydra.g3:108:26: Decision can match input such as "{ID, Symbol} {'!'..'#', '+', '-', '~'} Number" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): Hydra.g3:108:26: Decision can match input such as "{ID, Symbol} {'!'..'#', '+', '-', '~'} Boolean" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): Hydra.g3:108:26: Decision can match input such as "{ID, Symbol} {'!'..'#', '+', '-', '~'} Regex" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): Hydra.g3:108:26: Decision can match input such as "{ID, Symbol} {'!'..'#', '+', '-', '~'} Null" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input

Lastly, is there a way to conditionally create the nodes in the AST so that, if only the left part of the rule matches, that level is not added to the tree? E.g.:

conditional_or_expression:
    conditional_and_expression  ('||' conditional_or_expression)?
;

As in, let's say I create a grammar which follows a hierarchy like:

conditional_and_expression
  conditional_or_expression
    null_coalescing_expression

If the expression that is parsed is a || b, the AST that is currently created for this expression is:

conditional_and_expression
  conditional_or_expression

How could I get it so it just gets the conditional_or_expression part?

In JavaCC, you could just set the node arity, e.g.: #ConditionalOrExpression(>1)

EDIT: it was a bit late last night, infix expression is now properly modified!

Final edit: In the end, I got it to work with the following rules:

public symbolOrID
:   ID
|   Symbol
;

public postfixExpression
:   infixExpression (symbolOrID^)?
;

public infixExpression
:   (prefixExpression symbolOrID)=> prefixExpression symbolOrID^ infixExpression
|   prefixExpression
;

public prefixExpression
:   prefixCharacter^ simpleExpression
|   simpleExpression
;

public prefixCharacter
:   '-' | '+' | '~' | '!' | '#'
;

public simpleExpression
:   constant
;

Solution

Darkzaelus wrote:

I'm trying to convert the postfix, infix and prefix rules from scala in EBNF form to ANTLR but am seeing an error relating to left-recursion

As I said in my comment: there's no left recursion in the rules you posted.
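Whether a set of rules is left-recursive can also be checked mechanically. The following is a rough sketch in Python, not anything ANTLR provides: the function name, the dictionary encoding of the rules and the trailing `?` convention for optional symbols are all assumptions made for illustration. It flags any rule that can reach itself as the first symbol of one of its alternatives, and is applied here to the two forms of infixExpression shown in the question.

```python
def left_recursive_rules(grammar):
    """Return the names of rules that can derive themselves in leftmost position.

    grammar maps a rule name to a list of alternatives; each alternative is a
    list of symbols, and a trailing '?' marks a symbol as optional.
    (This encoding is an assumption for the sketch, not ANTLR syntax.)
    """
    def leading(rule):
        # Symbols that can appear first in some alternative of `rule`.
        first = set()
        for alt in grammar.get(rule, []):  # terminals have no alternatives
            for sym in alt:
                first.add(sym.rstrip("?"))
                if not sym.endswith("?"):
                    break  # a required symbol ends the scan
        return first

    result = set()
    for rule in grammar:
        seen, todo = set(), list(leading(rule))
        while todo:
            sym = todo.pop()
            if sym in seen:
                continue
            seen.add(sym)
            todo.extend(leading(sym))
        if rule in seen:
            result.add(rule)
    return result


# First form of infixExpression from the question: alternative 2 starts with itself.
FIRST_FORM = {
    "postfixExpression": [["infixExpression", "symbolOrID?"]],
    "infixExpression": [
        ["prefixExpression"],
        ["infixExpression", "symbolOrID?", "infixExpression?"],
    ],
    "prefixExpression": [["prefixCharacter?", "simpleExpression"]],
    "simpleExpression": [["constant"]],
}

# Second form: the leading infixExpression is replaced by prefixExpression.
SECOND_FORM = dict(FIRST_FORM)
SECOND_FORM["infixExpression"] = [
    ["prefixExpression", "symbolOrID?", "infixExpression?"],
]

print(left_recursive_rules(FIRST_FORM))   # {'infixExpression'}
print(left_recursive_rules(SECOND_FORM))  # set()
```

The sketch only looks at which symbol can come first; it ignores the optional group structure of the real rules, so treat it as a sanity check rather than a substitute for ANTLR's own analysis.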

Darkzaelus wrote:

How could I get it so it just gets the conditional_or_expression part?

I'm assuming you're using ANTLRWorks' interpreter or debugger, in which case the tree:

conditional_and_expression
            \
  conditional_or_expression

is only being displayed like that because ANTLRWorks shows the parse tree, not the AST. If you properly transform your orExpression into an AST, the expression a || b will become:

  ||
 /  \
a    b

(i.e. || as root, and a and b as child nodes)

For example, take the following grammar:

grammar T;

options {
  output=AST;
}

parse
  :  expr EOF -> expr
  ;

expr
  :  or_expr
  ;

or_expr
  :  and_expr ('||'^ and_expr)*
  ;

and_expr
  :  add_expr ('&&'^ add_expr)*
  ;

add_expr
  :  atom (('+' | '-')^ atom)*
  ;

atom
  :  NUMBER
  |  '(' expr ')' -> expr
  ;

NUMBER : '0'..'9'+;

If you now parse 12+34 with a parser generated from the grammar above, ANTLRWorks (or the Eclipse ANTLR IDE) will show the following parse tree:

[image: parse tree for 12+34]

but this is not the AST the parser creates. The AST actually looks like:

[image: AST for 12+34]

(i.e. the or_expr, and_expr "layers" are not in there)
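The same idea can be shown outside ANTLR. The sketch below is a small hand-written parser in Python, not code generated by ANTLR; `tokenize`, `parse` and the tuple-based tree are names and choices assumed for illustration. It mirrors the or_expr, and_expr, add_expr and atom layers of grammar T, but only creates a node when an operator is actually matched, which is roughly what the `^` operator does in the grammar above.

```python
import re

# Hypothetical tokenizer for the toy grammar T: numbers, ||, &&, +, -, parentheses.
TOKEN = re.compile(r"\s*(\d+|\|\||&&|[-+()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tok

    def binary(operand, operators):
        # operand (op operand)* : a node is built only when an operator matches,
        # so a lone operand passes through without an extra layer.
        left = operand()
        while peek() in operators:
            op = take()
            left = (op, left, operand())
        return left

    def atom():
        tok = take()
        if tok == "(":
            inner = or_expr()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return inner  # parentheses leave no node behind
        if not tok.isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

    def add_expr():
        return binary(atom, ("+", "-"))

    def and_expr():
        return binary(add_expr, ("&&",))

    def or_expr():
        return binary(and_expr, ("||",))

    tree = or_expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse("12+34"))  # ('+', '12', '34')
print(parse("12"))     # 12
```

Parsing 12+34 passes through all three layers, yet the result contains only the + node and its two operands.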

Darkzaelus wrote:

Unfortunately, this is a fairly critical but early stage for the language, so I'm forced to keep full details of the grammar secret.

No problem, but you must realize that people can't answer your questions properly if you withhold crucial information. You don't need to post the entire grammar, but if you want help with the left-recursion, you must post a (partial) grammar that actually causes the error(s) you mention. If I can't reproduce it, it doesn't exist! :)

OTHER TIPS

This production:

infixExpr ::= PrefixExpr
            | InfixExpr id [nl] InfixExpr

Can be rewritten as

infixExpr ::= PrefixExpr
            | PrefixExpr id [nl] InfixExpr

In fact, I bet the left-recursive form is just an error in the grammar. Here is an example showing that the rewrite is fine: derive a sequence with the first grammar, then derive the same sequence with the second one.

InfixExpr id [nl] InfixExpr
// Apply the second alternative to the first InfixExpr
InfixExpr id [nl] InfixExpr id [nl] InfixExpr
// Apply the first alternative to the (new) first InfixExpr
PrefixExpr id [nl] InfixExpr id [nl] InfixExpr
// Apply the first alternative to the new first InfixExpr
PrefixExpr id [nl] PrefixExpr id [nl] InfixExpr
// Apply the first alternative to the new first InfixExpr
PrefixExpr id [nl] PrefixExpr id [nl] PrefixExpr

Now derive it with the second grammar:

PrefixExpr id [nl] InfixExpr
// Apply the second alternative to the first InfixExpr
PrefixExpr id [nl] PrefixExpr id [nl] InfixExpr
// Apply the first alternative to the new first InfixExpr
PrefixExpr id [nl] PrefixExpr id [nl] PrefixExpr

As you can see, you end up with the same sequence in both cases, so both grammars accept the same input. Only the way the operands nest in the tree differs: the second grammar always nests to the right.
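For short inputs, the comparison can also be done by brute force. The following Python sketch is an illustration only: it abbreviates PrefixExpr to the terminal `p`, drops the optional [nl], and enumerates every sentence each production can derive up to a length bound. The function name, the encoding of the alternatives and the bound are assumptions.

```python
def sentences(rule, max_len):
    """All terminal strings of at most max_len symbols derivable from 'I'.

    rule lists the alternatives of the nonterminal 'I' (InfixExpr); 'p' stands
    for PrefixExpr and 'id' for the operator, both treated as terminals.
    """
    done, seen = set(), set()
    todo = [("I",)]
    while todo:
        form = todo.pop()
        # No alternative shrinks a form, so over-long forms can be dropped.
        if form in seen or len(form) > max_len:
            continue
        seen.add(form)
        if "I" not in form:
            done.add(" ".join(form))
            continue
        i = form.index("I")  # expand the leftmost nonterminal
        for alt in rule:
            todo.append(form[:i] + alt + form[i + 1:])
    return done


LEFT_RECURSIVE = [("p",), ("I", "id", "I")]   # InfixExpr id InfixExpr
RIGHT_RECURSIVE = [("p",), ("p", "id", "I")]  # PrefixExpr id InfixExpr

print(sorted(sentences(LEFT_RECURSIVE, 5)))
print(sentences(LEFT_RECURSIVE, 9) == sentences(RIGHT_RECURSIVE, 9))  # True
```

The check compares only the accepted symbol sequences up to the chosen bound; it says nothing about how the operands are grouped in the resulting tree.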

Licensed under: CC-BY-SA with attribution