There are a couple of things wrong in your grammar:
- Never define tokens that can (potentially) match an empty string: your lexer would go into an infinite loop when it tries to match them. In short: remove the `EMPTY` token.
- `' ' | '\t' | '\r' | '\n' | '\u000c' {skip();}` is equivalent to `' ' | '\t' | '\r' | '\n' | ('\u000c' {skip();})`, so the action is attached only to the last alternative. You'd want to do `(' ' | '\t' | '\r' | '\n' | '\u000c') {skip();}` instead.
- Your `SPECIAL` rule matches a single backslash: `'\u005C' ( /* NOTHING HERE */ | '"' | ...`. Remove the first `|`: `'\u005C' ( '"' | ...`
- A negated character set must contain single characters, not a sequence of two as you did: `~('\r'? '\n')*` (you can't negate `\r\n`). It should be `~('\r' | '\n')*`.
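To make the first and last points concrete, here is a minimal Python sketch of a longest-match tokenizer loop. It is not how ANTLR is implemented; `tokenize`, the rule names, and the `max_steps` cap are all made up for illustration, and the cap merely stands in for the infinite loop a real lexer would enter.

```python
import re

def tokenize(text, rules, max_steps=50):
    """Naive longest-match lexer loop over (name, regex) pairs.

    Returns (tokens, finished). finished is False when the step cap is
    reached, i.e. the lexer stopped making progress.
    """
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    pos, tokens = 0, []
    for _ in range(max_steps):
        if pos >= len(text):
            return tokens, True
        best_name, best_len = None, -1
        for name, regex in compiled:
            m = regex.match(text, pos)
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len  # a zero-length match leaves pos where it was
    return tokens, False

good_rules = [("DIGITS", r"[0-9]+"), ("SPACE", r"[ \t\r\n\f]+")]
bad_rules = good_rules + [("EMPTY", r"")]  # can match the empty string

ok_tokens, ok_finished = tokenize("12 34", good_rules)
bad_tokens, bad_finished = tokenize("12 a", bad_rules)

# Regex analogue of '//' ~('\r' | '\n')*: the negated set lists single characters.
LINE_COMMENT = re.compile(r"//[^\r\n]*")
comment_text = LINE_COMMENT.match("// hi\r\nnext").group()
```

With `bad_rules`, the loop reaches the `a`, where only `EMPTY` matches; it keeps emitting empty tokens without advancing, so `bad_finished` comes back `False`.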
Try something like this instead (untested!):

    grammar myjson;

    prog
        : object+ EOF
        ;

    object
        : '{' (key_value (',' key_value)*)? '}'
        ;

    array
        : '[' (value (',' value)*)? ']'
        ;

    key_value
        : STRING ':' value
        ;

    value
        : object
        | array
        | STRING
        | NUMBER
        | BOOL
        | NULL
        ;

    NULL
        : 'null'
        ;

    BOOL
        : 'true'
        | 'false'
        ;

    STRING
        : '"' (UNICODE | SPECIAL)* '"'
        ;

    NUMBER
        : ('+'|'-')? DIGIT+ '.' DIGIT* EXPONENT?
        | ('+'|'-')? '.'? DIGIT+ EXPONENT?
        ;

    COMM
        : '//' ~('\r' | '\n')* {skip();}
        | '/*' .* '*/' {skip();}
        ;

    SPACE
        : (' ' | '\t' | '\r' | '\n' | '\u000c')+ {skip();}
        ;

    fragment
    DIGIT
        : '0'..'9'
        ;

    fragment
    EXPONENT
        : ('e' | 'E') ('+' | '-')? DIGIT+
        ;

    // any character except a double quote or a backslash
    fragment
    UNICODE
        : ~('\u0022' | '\u005C')
        ;

    // a backslash followed by an escape; \uXXXX takes four hexadecimal digits
    fragment
    SPECIAL
        : '\u005C' ( '"' | '\u005C' | '\u002F'
                   | 'b' | 'f' | 'n' | 'r'
                   | 't' | 'u' HEX HEX HEX HEX
                   )
        ;

    fragment
    HEX
        : '0'..'9' | 'a'..'f' | 'A'..'F'
        ;
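Since the grammar is untested, one quick way to sanity-check a single lexer rule before generating anything with ANTLR is to transcribe it by hand into a regular expression. The sketch below does this for `NUMBER` only. The regex is my own transcription of the rule (with `DIGIT` and `EXPONENT` inlined), not something ANTLR produces, and `NUMBER_RE` and `accepts` are hypothetical names.

```python
import re

# EXPONENT: ('e' | 'E') ('+' | '-')? DIGIT+
EXPONENT = r"(?:[eE][+-]?[0-9]+)"

# NUMBER, first alternative:  ('+'|'-')? DIGIT+ '.' DIGIT* EXPONENT?
# NUMBER, second alternative: ('+'|'-')? '.'? DIGIT+ EXPONENT?
NUMBER_RE = re.compile(
    r"(?:[+-]?[0-9]+\.[0-9]*" + EXPONENT + r"?"
    r"|[+-]?\.?[0-9]+" + EXPONENT + r"?)"
)

def accepts(text):
    """True if the whole string is matched by the transcribed NUMBER rule."""
    return NUMBER_RE.fullmatch(text) is not None

accepted = [s for s in ["42", "-3.14", "1.", ".5", "+1e10", "1.5e-3"] if accepts(s)]
rejected = [s for s in [".", "1e", "--1", "e5"] if not accepts(s)]
```

Running this suggests the rule is more permissive than strict JSON, which does not allow a leading `+`, a bare leading `.`, or a trailing `.`. Whether that matters depends on how strict you want your parser to be.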
Also check the JSON grammar in the ANTLR GitHub repository (https://github.com/antlr/grammars-v4/blob/master/json/Json.g4). Although it is an ANTLR 4 grammar, it looks to be ANTLR 3 compatible.