Pregunta

In order to have the lexer of ANTLR4 recognize different kinds of tokens in one rule I use a semantic predicate. This predicate evaluates a static field of a helper class. Have a look at some grammar excerpts:

// very simplified

@header {
  import static ParserAndLexerState.*;
}
@members {
  private boolean fooAllowed() {
    System.out.println(fooAllowed);
  }
...
methodField 
: t = type 
  { fooAllowed = false; } 
    id = Identifier 
  { fooAllowed = true; /* do something with t and id*/  }
...

fragment CHAR_NO_OUT_1 : [a-eg-zA-Z_] ;
fragment CHAR_NO_OUT_2 : [a-nq-zA-Z_0-9] ;
fragment CHAR_NO_OUT_3 : [a-nq-zA-Z_0-9] ;
fragment CHAR_1 : [a-zA-Z_] ;
fragment CHAR_N : CHAR_1 | [0-9] ;

Identifier
// returns every possible identifier
: { fooAllowed() }? (CHAR_1 CHAR_N*)
// returns everything but 'foo'
| { !fooAllowed() }? CHAR_NO_OUT_1 (CHAR_NO_OUT_2 (CHAR_NO_OUT_3 CHAR_N*)?)? ;

Identifier will now always behave as if fooAllowed had the initial value of the definition in ParserAndLexerState. So if this was true Identifier will only use the first alternative of the rule, otherwise always the second. This is some weird behavior, especially considering that fooAllowed prints the right values to the console.

Is there anything in ANTLR4 that could discourages me from using global state from within semantic predicates? How can I avoid this behavior?

¿Fue útil?

Solución

ANTLR 4 uses unbounded lookahead with non-deterministic termination conditions for the prediction process. While the TokenStream implementations do call TokenSource.nextToken lazily, it is not safe to ever assume that the number of tokens consumed so far is bounded.

In other words, the actual semantics of using a parser action to change the behavior of the lexer are undefined. Different versions of ANTLR 4, or even subtle changes in the input you give it, could produce completely different results.

Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top