Antlr hidden channel whitespace problem
18-09-2019
Question
I have the following Antlr grammar:
grammar MyGrammar;
doc : intro planet;
intro : 'hi';
planet : 'world';
MLCOMMENT
    : '/*' ( options {greedy=false;} : . )* '*/' { $channel = HIDDEN; };
WHITESPACE
    : ( (' ' | '\t' | '\f')+
      | // handle newlines
        ( '\r\n' // DOS/Windows
        | '\r'   // Macintosh
        | '\n'   // Unix
        )
      )
      { $channel = HIDDEN; };
In the ANTLRWorks 1.2.3 interpreter, the inputs hi world, hi/**/world and hi /*A*/ world work, as expected.
However, the input hiworld, which shouldn't work, is also accepted.
How do I make hiworld fail? How do I force at least one whitespace (or comment) between "hi" and "world"?
Note that I've used only MLCOMMENT and WHITESPACE in this example to simplify, but other kinds of comments would be supported.
Solution
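You need to add a general ID token rule. Because the lexer builds the longest token it can, it will then read the input "hiworld" as a single ID token, which is longer than "hi" or "world" on their own. The doc rule expects 'hi' followed by 'world', so a lone ID token no longer matches and the input is rejected. Such a rule might look like: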
ID : ('a'..'z' | 'A'..'Z')+;
This is the same mechanism that lets parsers for programming languages distinguish the "do" keyword from "double" (a type keyword that starts with "do") or "done" (a variable name).
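To see why the extra rule changes the outcome, here is a minimal Python sketch of longest-match tokenization. It is only a simplified model of the behaviour described above, not ANTLR's actual lexer: the names TOKEN_SPECS, tokenize and accepts are hypothetical, and the regular expressions are assumed stand-ins for the grammar's lexer rules.

```python
import re

# Hypothetical token table modelling the grammar's lexer rules.
# Keywords come before ID so they win when two matches have equal length.
TOKEN_SPECS = [
    ("HI", re.compile(r"hi")),
    ("WORLD", re.compile(r"world")),
    ("ID", re.compile(r"[a-zA-Z]+")),
    ("MLCOMMENT", re.compile(r"/\*.*?\*/", re.DOTALL)),
    ("WHITESPACE", re.compile(r"[ \t\f]+|\r\n|\r|\n")),
]
HIDDEN = {"MLCOMMENT", "WHITESPACE"}  # tokens the parser never sees


def tokenize(text):
    """Return the names of the visible tokens, choosing the longest match."""
    pos, visible = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in TOKEN_SPECS:
            m = pattern.match(text, pos)
            # A strictly longer match wins; on a tie the earlier rule is kept.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"unexpected character at offset {pos}")
        if best_name not in HIDDEN:
            visible.append(best_name)
        pos += best_len
    return visible


def accepts(text):
    """Model the parser rule  doc : intro planet;"""
    return tokenize(text) == ["HI", "WORLD"]
```

In this model, accepts("hi world") and accepts("hi/**/world") are true, while accepts("hiworld") is false because the whole input becomes one ID token. Removing the ID entry from the table reproduces the original problem: "hiworld" then splits into HI followed by WORLD and is accepted.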
OTHER TIPS
One way to make the string hiworld fail is to use a validating semantic predicate that is guaranteed to fail, as follows:
doc: intro planet;
failure : 'hiworld' { false }?;
intro : 'hi';
planet : 'world';
// rest of grammar omitted