ANTLR grammar: unexpected behavior

https://stackoverflow.com/questions/17506071

regex
antlr3

02-06-2022

Question

I've begun experimenting with ANTLR3 today. There seems to be a discrepancy in how the expressions I use behave.

I want my class name to start with a capital letter, followed by any mix of upper- and lower-case letters and digits. For instance, Car is valid, but 8Car is not.

CLASS_NAME : ('A'..'Z')('a'..'z'|'A'..'Z'|'0'..'9')*;

This works fine when I test it on its own, but not when I use it in the following rule:

model
    : '~model' CLASS_NAME model_block
    ;

CLASS_NAME then starts picking up names that begin with digits or other characters as well: ANTLR accepts Car, 8Car, or even #Car as valid tokens. I must be missing something silly. Any pointers would be appreciated. Thanks.

Solution

CLASS_NAME will not match 8Car or #Car. You're probably using ANTLRWorks' interpreter (or the Eclipse plugin, which uses the same interpreter). It prints its errors on a UI tab you may not have noticed, while still displaying the offending characters as part of the tokens. Use ANTLRWorks' debugger instead, or write a small test class yourself:

T.g

grammar T;

parse : CLASS_NAME EOF;

CLASS_NAME : ('A'..'Z')('a'..'z'|'A'..'Z'|'0'..'9')*;

Main.java

import org.antlr.runtime.*;

public class Main {

  public static void main(String[] args) throws Exception {

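    // TLexer and TParser are the classes ANTLR generates from T.g.
    // "8Car" does not start with a capital letter, so the lexer should report an error for '8'.
    TLexer lexer = new TLexer(new ANTLRStringStream("8Car"));
    TParser parser = new TParser(new CommonTokenStream(lexer));
    parser.parse();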
  }
}
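
To see why the offending character can look as if it belongs to the token, here is a minimal plain-Java sketch that does not use ANTLR at all. It imitates a lexer that reports an error for a character no token can start with, skips it, and carries on. The class name LexerRecoverySketch, the tokenize method, and the error wording are made up for illustration, and the skip-one-character strategy is a simplification of what a generated lexer does, not its actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: not generated by ANTLR and not part of its runtime.
class LexerRecoverySketch {

    // Same shape as the CLASS_NAME rule: one capital letter, then letters or digits.
    private static final Pattern CLASS_NAME = Pattern.compile("[A-Z][a-zA-Z0-9]*");

    /**
     * Splits the input into CLASS_NAME tokens. A character that cannot start
     * a token is recorded in {@code errors} and skipped (assumed recovery strategy).
     */
    static List<String> tokenize(String input, List<String> errors) {
        List<String> tokens = new ArrayList<>();
        Matcher m = CLASS_NAME.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (m.lookingAt()) {
                // A token starts exactly at pos: emit it and jump past it.
                tokens.add(m.group());
                pos = m.end();
            } else {
                // No token can start here: report the character and skip it.
                errors.add("no token can start with '" + input.charAt(pos) + "' at index " + pos);
                pos++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String input : new String[] {"Car", "8Car", "#Car"}) {
            List<String> errors = new ArrayList<>();
            List<String> tokens = tokenize(input, errors);
            System.out.println(input + " -> tokens=" + tokens + ", errors=" + errors);
        }
    }
}
```

For 8Car and #Car the sketch still produces the token Car, but only after recording one error for the leading character. A tool that shows the token and hides the error list would give exactly the impression described in the question.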

Licensed under: CC-BY-SA with attribution
