How to make a rule that contains several occurrences of the same token?

https://stackoverflow.com/questions/23324971

10-07-2023

Question

I'm working with ANTLR4 to parse a simple script language.

This language uses the following syntax for FOR loops:

FOR [I] = 1 TO [N]
   instructions
NEXT [I]

To be valid, a FOR loop must use exactly the same variable name (the same token text) after the FOR keyword and after the NEXT keyword.

For example, this is correct:

FOR I = 1 TO 10
NEXT I

This, however, is incorrect:

FOR I = 1 TO 10
NEXT J

So far I have a rule that looks like this:

forloop
    : FOR VARNAME EQUAL INT TO INT instructions NEXT VARNAME
    ;

with the following related lexer rules (I have omitted the keyword rules such as FOR : 'FOR';):

fragment ALPHA : [a-zA-Z_];
fragment ALPHANUM : [a-zA-Z_0-9];
fragment DIGIT : [0-9];
VARNAME : ALPHA ALPHANUM*;
INT : DIGIT+;

However, this rule also accepts the second example, which should be rejected.

How can I tell ANTLR4 that the second VARNAME in the rule must be the same as the first one?

Solution

You could add a semantic predicate at the end of your forloop rule that requires the text of both VARNAME tokens to be equal:

forloop
    : FOR a=VARNAME EQUAL INT TO INT instructions NEXT b=VARNAME
      {$a.getText().equals($b.getText())}?
    ;

But a better way, in my opinion, is to let the parser accept mismatched VARNAMEs and check them once parsing has completed, for example in a validation listener that walks the parse tree.
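The answer doesn't show such a listener, so here is a minimal, self-contained Java sketch of the post-parse check. It deliberately avoids the ANTLR runtime: `ForLoop` is a hypothetical stand-in for the `ForloopContext` that ANTLR generates for the `forloop` rule, holding the two variable names, the line of the NEXT keyword, and any nested loops. In a real ANTLR 4 listener the same comparison would go in `exitForloop`, reading the names with `ctx.VARNAME(0).getText()` and `ctx.VARNAME(1).getText()` (or through the labels `ctx.a` and `ctx.b` if you keep them).

```java
import java.util.ArrayList;
import java.util.List;

public class ForNextValidator {

    // Hypothetical, simplified stand-in for the generated ForloopContext:
    // the variable after FOR, the variable after NEXT, the line of NEXT,
    // and any loops nested in the loop body.
    record ForLoop(String forVar, String nextVar, int nextLine, List<ForLoop> body) {}

    // Walks the simplified tree depth-first and collects one error message
    // for every loop whose NEXT variable differs from its FOR variable.
    static List<String> validate(List<ForLoop> loops) {
        List<String> errors = new ArrayList<>();
        for (ForLoop loop : loops) {
            check(loop, errors);
        }
        return errors;
    }

    private static void check(ForLoop loop, List<String> errors) {
        // Compare token text with equals(), not ==, since these are Strings.
        if (!loop.forVar().equals(loop.nextVar())) {
            errors.add("line " + loop.nextLine() + ": NEXT " + loop.nextVar()
                    + " does not match FOR " + loop.forVar());
        }
        // Recurse into nested loops, as a tree walker would.
        for (ForLoop inner : loop.body()) {
            check(inner, errors);
        }
    }

    public static void main(String[] args) {
        ForLoop ok = new ForLoop("I", "I", 2, List.of());
        ForLoop bad = new ForLoop("I", "J", 2, List.of());
        System.out.println(validate(List.of(ok)));   // []
        System.out.println(validate(List.of(bad)));  // [line 2: NEXT J does not match FOR I]
    }
}
```

Collecting messages in a list instead of failing on the first mismatch lets a single pass report every bad loop, including nested ones, which is one practical advantage over a predicate that aborts the rule.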

Licensed under: CC-BY-SA with attribution
