Question

This question is about how to distinguish optional parts of a token while visiting an ANTLR rule.

I have a parser rule called 'assign' in an ANTLR4 grammar. It assigns the result of an expression to a tag represented by an INT, for example 215="FOO". It also allows assignment to a tag index, for example 215[2]="FOO". How can I distinguish INTs of the form 215 from INTs of the form 215[2] by looking at the objects ANTLR provides while the assign rule is being visited?

assign  : INT '=' expr  ;
INT     : '-'? DIGIT+ ('[' DIGIT+ ']')?;
DIGIT   :  [0-9] ;

I have defined a visitor method to handle the "assign" rule. For the input:

215[2]="FOO"

@Override
public String visitAssign(@NotNull FixRulesParser.AssignContext ctx) {
    String left = ctx.getStart().getText();
    String right = ctx.getStop().getText();
    ...

At this point left = "215[2]" and right = "FOO".

Does the ctx object offer a way to determine whether the left side of the assignment (215[2]) actually contains the optional '[2]' defined by INT? I currently parse 'left' with a Java regex (see below) to make the determination, but I'm wondering whether I can get the answer directly from ANTLR.

Pattern p = Pattern.compile("(-?\\d+)((\\[)(\\d+)(\\]))?");
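
For comparison, the regex approach can be wrapped in a small helper. This is only a sketch: the class name TagRef and its methods are made up for illustration and are not part of the grammar or of ANTLR. It uses a non-capturing group around the brackets so that the index is simply group 2, which is null when the "[n]" suffix is absent.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ANTLR or the original grammar.
final class TagRef {
    // Group 1 = tag number, group 2 = digits inside the optional brackets.
    private static final Pattern TAG =
            Pattern.compile("(-?\\d+)(?:\\[(\\d+)\\])?");

    final int tag;
    final Integer index; // null when the text has no "[n]" suffix

    private TagRef(int tag, Integer index) {
        this.tag = tag;
        this.index = index;
    }

    // Parses text such as "215" or "215[2]"; rejects anything else.
    static TagRef parse(String text) {
        Matcher m = TAG.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a tag reference: " + text);
        }
        String idx = m.group(2); // null if the optional group did not match
        return new TagRef(Integer.parseInt(m.group(1)),
                          idx == null ? null : Integer.valueOf(idx));
    }

    boolean hasIndex() {
        return index != null;
    }
}
```

With this, TagRef.parse(left).hasIndex() would be true for "215[2]" and false for "215".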

Solution

One solution I found after reading "The Antlr Reference - Chapter 12" was to define the index "[2]" as a lexer token of its own and place it on a separate channel.

assign  : INT '=' expr  ;
INT : '-'? DIGIT+ ;
IDX : '[' DIGIT+ ']' -> channel(TAG_INDEX);
DIGIT   : [0-9] ;

Then I can make the determination in visitAssign():

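@Override
public String visitAssign(@NotNull FixRulesParser.AssignContext ctx) {
    BufferedTokenStream tokens = tokenStream; // passed in to the constructor as an argument
    // Look at the token right after the INT rather than a fixed position;
    // get() returns tokens from every channel, including TAG_INDEX.
    Token t = tokens.get(ctx.getStart().getTokenIndex() + 1);
    int type = t.getType();
    if (type == FixRulesParser.IDX) {
        System.out.println("YES");
    } else {
        System.out.println("NO");
    }
    ...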

I think the lesson here is that the index shouldn't be folded into another token if you need to refer to it individually.

One thing that still confuses me is what happens if I remove the TAG_INDEX channel:

IDX : '[' DIGIT+ ']';

I get: line 1:4 no viable alternative at input '215[2]'

It would be good to know why, but at least I have a solution.
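
A likely explanation, sketched here as a toy model rather than real ANTLR code: the parser only sees tokens on the default channel. With IDX on TAG_INDEX the parser sees INT '=' expr, which matches assign. Without the channel, IDX shows up between INT and '=', and the assign rule has no place for it. The class below hand-builds the token list for 215[2]="FOO". The names ChannelDemo, Tok, parserView and matchesAssign, the channel number 2, and the assumption that expr matches a single string literal are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of channel filtering; this is NOT the ANTLR API.
final class ChannelDemo {
    static final int DEFAULT_CHANNEL = 0;
    static final int TAG_INDEX = 2; // hypothetical channel number

    static final class Tok {
        final String type;
        final String text;
        final int channel;

        Tok(String type, String text, int channel) {
            this.type = type;
            this.text = text;
            this.channel = channel;
        }
    }

    // Hand-built tokens for 215[2]="FOO"; idxChannel is where IDX is placed.
    static List<Tok> lex(int idxChannel) {
        return Arrays.asList(
            new Tok("INT", "215", DEFAULT_CHANNEL),
            new Tok("IDX", "[2]", idxChannel),
            new Tok("EQ", "=", DEFAULT_CHANNEL),
            new Tok("STRING", "\"FOO\"", DEFAULT_CHANNEL));
    }

    // The parser's view: only the token types on the default channel.
    static List<String> parserView(List<Tok> tokens) {
        List<String> types = new ArrayList<String>();
        for (Tok t : tokens) {
            if (t.channel == DEFAULT_CHANNEL) {
                types.add(t.type);
            }
        }
        return types;
    }

    // Stand-in for: assign : INT '=' expr ;  (expr assumed to be one STRING)
    static boolean matchesAssign(List<String> types) {
        return types.equals(Arrays.asList("INT", "EQ", "STRING"));
    }

    public static void main(String[] args) {
        System.out.println(matchesAssign(parserView(lex(TAG_INDEX))));       // true
        System.out.println(matchesAssign(parserView(lex(DEFAULT_CHANNEL)))); // false
    }
}
```

If that is indeed the cause, another option would be to keep IDX on the default channel and make it part of the rule, for example assign : INT IDX? '=' expr ; so that ctx.IDX() returns null when no index is present.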

Licensed under: CC-BY-SA with attribution