Distinguish Optional Tokens While Visiting an Antlr Rule
21-12-2019
Question
This question is about how to distinguish optional tokens while visiting an ANTLR rule.
I have defined a parser rule called 'assign' in an ANTLR4 grammar. It assigns the result of an expression to a tag represented by an INT, for example 215="FOO". It also allows assignment to a tag index, for example 215[2]="FOO". How can I distinguish INTs of the form 215 from those of the form 215[2] using the objects ANTLR provides while the assign rule is being visited?
assign : INT '=' expr ;
INT : '-'? DIGIT+ ('[' DIGIT+ ']')?;
DIGIT : [0-9] ;
Given the input
215[2]="FOO"
I have defined a visitor method to capture the parser's evaluation of the "assign" rule:
@Override
public String visitAssign(@NotNull FixRulesParser.AssignContext ctx) {
    String left = ctx.getStart().getText();
    String right = ctx.getStop().getText();
    ...
At this point left = "215[2]" and right = "FOO"
Does the ctx object offer a way to determine if the left side of the assignment (215[2]) actually contains the optional '[2]' defined by INT? I want to distinguish INTs of the form 215[2] vs. 215. I use a Java regex to parse 'left' (see below) to make the determination but I'm wondering if I can get the answer directly from antlr.
Pattern p = Pattern.compile("(-?\\d+)((\\[)(\\d+)(\\]))?");
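For reference, here is a minimal, self-contained sketch of how that regex workaround can make the determination. The class name TagText and its two helper methods are hypothetical and not part of my actual code; only the pattern itself comes from the line above. Group 4 holds the digits inside the brackets and is null when the optional part is absent.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: inspects the text of the left-hand side of an assignment.
final class TagText {
    // Same pattern as above: group 1 is the tag, group 4 the digits inside "[...]".
    private static final Pattern TAG =
        Pattern.compile("(-?\\d+)((\\[)(\\d+)(\\]))?");

    /** Returns the index inside the brackets, or -1 when there is no "[n]" suffix. */
    static int indexOf(String left) {
        Matcher m = TAG.matcher(left);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a tag: " + left);
        }
        String idx = m.group(4); // null when the optional group did not participate
        return idx == null ? -1 : Integer.parseInt(idx);
    }

    /** True for text like "215[2]", false for text like "215". */
    static boolean hasIndex(String left) {
        return indexOf(left) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(hasIndex("215[2]")); // true
        System.out.println(hasIndex("215"));    // false
    }
}
```

This works, but it re-parses text that the lexer has already tokenized, which is why I was looking for an answer from antlr itself.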
Solution
One solution I found after reading "The Antlr Reference - Chapter 12" was to define the index "[2]" as a lexer token of its own and place it on a separate channel.
assign : INT '=' expr ;
INT : '-'? DIGIT+ ;
IDX : '[' DIGIT+ ']' -> channel(TAG_INDEX);
DIGIT : [0-9] ;
Then I can make the determination in visitAssign():
@Override
public String visitAssign(@NotNull FixRulesParser.AssignContext ctx) {
    BufferedTokenStream tokens = tokenStream; // passed in to the constructor as arg
    // Off-channel tokens stay in the stream, so inspect the token right after the INT.
    Token t = tokens.get(ctx.getStart().getTokenIndex() + 1);
    int type = t.getType();
    if (type == FixRulesParser.IDX) {
        System.out.println("YES");
    } else {
        System.out.println("NO");
    }
I think the lesson here is that IDX shouldn't be combined with any other tokens if you need to refer to it individually.
One thing that still confuses me is if I remove the TAG_INDEX channel:
IDX : '[' DIGIT+ ']';
I get: line 1:4 no viable alternative at input '215[2]'
Would be good to know why but at least I have a solution.
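The most likely explanation is that the parser only consumes tokens on the default channel. With IDX on TAG_INDEX the parser is offered INT '=' expr, which matches the assign rule. With IDX on the default channel it is offered INT IDX '=' expr, which the rule as written cannot match. The sketch below is a toy model of that filtering in plain Java, not ANTLR itself; the class ChannelSketch, its Tok type, the STRING token name and the channel number 2 are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of lexer channels (hypothetical names, not the ANTLR API).
final class ChannelSketch {
    static final int DEFAULT_CHANNEL = 0;
    static final int TAG_INDEX = 2; // arbitrary number chosen for this sketch

    // A token type name paired with the channel the token was emitted on.
    static final class Tok {
        final String type;
        final int channel;
        Tok(String type, int channel) { this.type = type; this.channel = channel; }
    }

    private static final Pattern LEXEME =
        Pattern.compile("(-?\\d+)|(\\[\\d+\\])|(=)|(\"[^\"]*\")");

    // Toy lexer: emits INT, IDX, '=' and STRING tokens; IDX goes on idxChannel.
    static List<Tok> lex(String input, int idxChannel) {
        List<Tok> out = new ArrayList<>();
        Matcher m = LEXEME.matcher(input);
        while (m.find()) {
            if (m.group(1) != null)      out.add(new Tok("INT", DEFAULT_CHANNEL));
            else if (m.group(2) != null) out.add(new Tok("IDX", idxChannel));
            else if (m.group(3) != null) out.add(new Tok("'='", DEFAULT_CHANNEL));
            else                         out.add(new Tok("STRING", DEFAULT_CHANNEL));
        }
        return out;
    }

    // The token types a parser reading only the default channel would be offered.
    static String parserView(List<Tok> tokens) {
        List<String> seen = new ArrayList<>();
        for (Tok t : tokens) {
            if (t.channel == DEFAULT_CHANNEL) seen.add(t.type);
        }
        return String.join(" ", seen);
    }

    public static void main(String[] args) {
        String input = "215[2]=\"FOO\"";
        System.out.println(parserView(lex(input, TAG_INDEX)));       // INT '=' STRING
        System.out.println(parserView(lex(input, DEFAULT_CHANNEL))); // INT IDX '=' STRING
    }
}
```

If that explanation is right, an alternative to the separate channel would be to keep IDX on the default channel and make it optional in the parser rule, assign : INT IDX? '=' expr ; so that visitAssign() can simply test whether ctx.IDX() is null. I have not tried this variant against my grammar.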