Question

I have the following Antlr grammar:

grammar MyGrammar;

doc :   intro planet;
intro   :   'hi';
planet  :   'world';
MLCOMMENT 
    :   '/*' ( options {greedy=false;} : . )* '*/' { $channel = HIDDEN; };
WHITESPACE : ( 
    (' ' | '\t' | '\f')+
  |
    // handle newlines
    ( '\r\n'  // DOS/Windows
      | '\r'    // Macintosh
      | '\n'    // Unix
    )
    )
 { $channel = HIDDEN; };

In the ANTLRWorks 1.2.3 interpreter, the inputs hi world, hi/**/world and hi /*A*/ world are all accepted, as expected.

However, the input hiworld, which should be rejected, is also accepted. How do I make hiworld fail? In other words, how do I require at least one whitespace character (or comment) between "hi" and "world"?

Note that I've used only MLCOMMENT and WHITESPACE to keep this example simple; the real grammar would support other kinds of comments as well.


Solution

You need to add a general ID token. The lexer builds the longest token it can, so with such a rule it reads the input "hiworld" as a single ID token, because that match is longer than "hi" or "world" alone. The parser then rejects the input, since doc expects a 'hi' token first. Such a rule might look like:

ID : ('a'..'z' | 'A'..'Z')+;

This is exactly how parsers for programming languages distinguish the "do" keyword from "double" (a type keyword that starts with "do") or "done" (a variable name).
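
For reference, here is a sketch of how the grammar from the question might look with the ID rule added. It assumes the same ANTLR 3 syntax used in the question, and the position of ID among the lexer rules is my own choice rather than something the answer specifies. The literals 'hi' and 'world' should still win for the exact inputs "hi" and "world", while a longer run of letters such as "hiworld" should come out as one ID token that doc cannot match.

```antlr
grammar MyGrammar;

// Parser rules, unchanged from the question.
doc     :   intro planet;
intro   :   'hi';
planet  :   'world';

// Catch-all word token. With this rule present, "hiworld" should be
// lexed as a single ID rather than 'hi' followed by 'world'.
ID  :   ('a'..'z' | 'A'..'Z')+;

// Comment and whitespace rules, copied from the question.
MLCOMMENT
    :   '/*' ( options {greedy=false;} : . )* '*/' { $channel = HIDDEN; };
WHITESPACE
    :   ( (' ' | '\t' | '\f')+
        | '\r\n'    // DOS/Windows
        | '\r'      // Macintosh
        | '\n'      // Unix
        )
        { $channel = HIDDEN; };
```

Inputs such as hi world and hi/**/world should still parse, because a space or a '/' ends the run of letters right after "hi".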

OTHER TIPS

One way to make the string hiworld fail is to use a validating semantic predicate that is guaranteed to fail, as follows:

doc     : intro planet;
failure : 'hiworld' { false }?;
intro   : 'hi';
planet  : 'world';
// rest of grammar omitted
Licensed under: CC-BY-SA with attribution