Why does the parser split a command name into different nodes?

https://stackoverflow.com/questions/19132114

antlr
antlr3

30-06-2022

Question

I have the statement:

=MYFUNCTION_NAME(1,2,3)

My grammar is:

grammar Expression;
options
{  
    language=CSharp3;
    output=AST;
    backtrack=true;
} 
tokens 
{
  FUNC;
  PARAMS;
}
@parser::namespace { Expression }
@lexer::namespace  { Expression }

public 
parse     :   ('=' func )*;
func      :  funcId  '(' formalPar* ')' -> ^(FUNC funcId formalPar);
formalPar :  (par ',')* par  -> ^(PARAMS par+);
par       :  INT;
funcId    :  complexId+ ('_'? complexId+)*;
complexId  
  : ID+
  | ID+DIGIT+      ;
ID        :  ('a'..'z'|'A'..'Z'|'а'..'я'|'А'..'Я')+;
DIGIT     : ('0'..'9')+;
INT       : '-'? ('0'..'9')+;

In the tree I get:

        [**FUNC**]
             |
 [MYFUNCTION] [_] [NAME] [**PARAMS**]

Why does the parser split the function's name into 3 nodes: "MYFUNCTION", "_", "NAME"? How can I fix it?

Solution

The parser only ever sees the tokens produced by the lexer, so the tree is split along token boundaries. Because the ID rule cannot match an _ character, the lexer turns MYFUNCTION_NAME into three tokens (MYFUNCTION, _ and NAME). The funcId parser rule then matches them one after another, and with output=AST each matched token becomes its own node. To get a single node for the function name, you need a lexer rule that matches the input MYFUNCTION_NAME as a single token.
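
A minimal sketch of such a lexer rule, assuming function names start with a letter and may continue with letters, digits, and underscores. The names FUNC_ID and LETTER are hypothetical and not part of the original grammar. The rule is also looser than the original funcId: it accepts, for example, a trailing or doubled underscore. It is meant to replace the funcId, complexId, and ID rules:

```
// Parser rule: the whole function name is now a single token.
funcId  : FUNC_ID ;

// Lexer rule (hypothetical name): a letter followed by any mix of
// letters, digits, and underscores, e.g. MYFUNCTION_NAME or SUM2.
FUNC_ID : LETTER (LETTER | '0'..'9' | '_')* ;

// Fragment rules never produce tokens on their own; LETTER only
// factors out the character ranges of the original ID rule.
fragment
LETTER  : 'a'..'z' | 'A'..'Z' | 'а'..'я' | 'А'..'Я' ;
```

With a rule along these lines, the lexer should hand MYFUNCTION_NAME to the parser as one FUNC_ID token, so the FUNC subtree gets a single name node instead of three. If the stricter shape of the original funcId matters (letter groups, optional digits, single underscores between groups), the body of FUNC_ID can be tightened accordingly.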

Licensed under: CC-BY-SA with attribution
