ANTLR4 Key/Value grammar

https://stackoverflow.com/questions/14048277

antlr
antlr4

12-12-2021

Question

I have a very simple key/value grammar (not the actual grammar I'm working on, but the simplest one I can come up with that shows my issue) that appears to have problems with the lexer matching order in ANTLR 4.0b4. The grammar is:

grammar test;

r     : HELLO COLON VALUE;
HELLO : 'hello';
COLON : ':';
VALUE : .+;

Given this grammar and the input 'hello:world', I would expect it to parse correctly. However, the entire input appears to be pulled into a single VALUE token, and the parse fails:

hello:world
[@0,0:11='hello:world\n',<3>,1:0]
[@1,12:11='<EOF>',<-1>,2:12]
line 1:0 mismatched input 'hello:world\n' expecting 'hello'

What am I doing wrong?

Solution

The grammar compiler should be issuing a warning about the use of a greedy .+ in the lexer.

The VALUE rule literally says "consume as many characters as you can, with no consideration of what the characters are". An ANTLR 4 lexer always picks the rule that matches the most input and uses rule order only to break ties. So unless your input is exactly hello or :, your lexer will produce a single VALUE token containing the entire input.
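To make that selection rule concrete, here is a minimal Python sketch of the question's lexer. It is a toy model built on Python's re module, not ANTLR's real algorithm, and it assumes only the behaviour described above: every rule is tried at the current position, the longest match wins, and a tie goes to the rule listed first. The names RULES and tokenize are hypothetical.

```python
import re

# Toy model of ANTLR 4 lexer rule selection (an assumption, not ANTLR's
# real algorithm): every rule is tried at the current position, the longest
# match wins, and a tie goes to the rule listed first.
RULES = [
    ("HELLO", re.compile(r"hello")),
    ("COLON", re.compile(r":")),
    # In an ANTLR lexer '.' matches any character, including '\n',
    # so re.DOTALL is used to mimic  VALUE : .+;
    ("VALUE", re.compile(r".+", re.DOTALL)),
]


def tokenize(text):
    """Split text into (token_type, token_text) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on equal lengths the earlier rule is kept.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"token recognition error at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("hello:world\n"))  # [('VALUE', 'hello:world\n')]
print(tokenize("hello"))          # [('HELLO', 'hello')]
print(tokenize(":"))              # [('COLON', ':')]
```

Only an input that is exactly hello or : produces a tie, which HELLO or COLON wins by appearing before VALUE; for any longer input, VALUE's match is strictly longer and wins outright.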

Perhaps the following lexer is closer to what you're after:

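lexer grammar textLexer;

HELLO : 'hello';
COLON : ':' -> pushMode(ValueMode);   // the value follows the colon
NEWLINE : '\r'? '\n' -> skip;         // ignore line breaks between entries

mode ValueMode;

    VALUE : ~[\r\n]+ -> popMode;      // rest of the line, then back to the default mode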
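The effect of the mode switch can be sketched as a small Python model. This is a hypothetical toy, not the ANTLR runtime: rules are tried only in the current mode, COLON pushes ValueMode, and VALUE pops back to the default mode. It assumes a NEWLINE rule that skips line breaks in the default mode; without some such rule, the trailing newline in the question's input would match no token. The names MODES and tokenize are illustrative only.

```python
import re

# Hypothetical, simplified model of textLexer (not ANTLR's runtime).
# Each rule is (name, regex, action); action is None, "skip", "pop",
# or ("push", mode_name). Longest match wins; ties go to the earlier rule.
MODES = {
    "DEFAULT_MODE": [
        ("HELLO", re.compile(r"hello"), None),
        ("COLON", re.compile(r":"), ("push", "ValueMode")),
        # Assumed rule: skip line breaks so a trailing '\n' is accepted.
        ("NEWLINE", re.compile(r"\r?\n"), "skip"),
    ],
    "ValueMode": [
        # Mirrors  VALUE : ~[\r\n]+ -> popMode;
        ("VALUE", re.compile(r"[^\r\n]+"), "pop"),
    ],
}


def tokenize(text):
    """Return (token_type, token_text) pairs, switching modes as it goes."""
    tokens = []
    mode_stack = ["DEFAULT_MODE"]
    pos = 0
    while pos < len(text):
        best = None  # (name, length, action) of the longest match so far
        for name, pattern, action in MODES[mode_stack[-1]]:
            m = pattern.match(text, pos)
            if m and (best is None or len(m.group()) > best[1]):
                best = (name, len(m.group()), action)
        if best is None:
            raise ValueError(f"token recognition error at position {pos}")
        name, length, action = best
        if action != "skip":
            tokens.append((name, text[pos:pos + length]))
        pos += length
        if action == "pop":
            mode_stack.pop()
        elif isinstance(action, tuple):  # ("push", mode_name)
            mode_stack.append(action[1])
    return tokens


print(tokenize("hello:world\n"))
# [('HELLO', 'hello'), ('COLON', ':'), ('VALUE', 'world')]
```

Because VALUE is only tried after COLON has switched the lexer into ValueMode, it can no longer swallow the key, and the value itself may contain further colons: hello:a:b lexes as HELLO, COLON and a VALUE of 'a:b'.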

Licensed under: CC-BY-SA with attribution
