Token recognition error: antlr

Question 1

Fragment lexer rules can only be used by other lexer rules; they never become tokens on their own. Therefore, you cannot use fragment rules in parser rules.

Question 2

The fragment is not the root cause.

First, try to reproduce your errors:

When compiling your Test.g4, the following warnings appear:

warning(156): Test.g4:11:21: invalid escape sequence \"
warning(156): Test.g4:123:59: invalid escape sequence \"
warning(146): Test.g4:11:0: non-fragment lexer rule QUOTE can match the empty string
warning(125): Test.g4:3:8: implicit definition of token NonZeroDigit in parser
warning(125): Test.g4:3:25: implicit definition of token Digit in parser

After removing unused rules:

grammar Test;

start : NonZeroDigit '.' Digit Digit? EOF
      ;

fragment
NonZeroDigit : [1-9]
             ;

fragment
Digit : '0' | NonZeroDigit
      ;

Then compile it again and test it:

warning(125): Test.g4:3:8: implicit definition of token NonZeroDigit in parser
warning(125): Test.g4:3:25: implicit definition of token Digit in parser


line 1:0 token recognition error at: '1'
line 1:2 token recognition error at: '1'
line 1:3 token recognition error at: '1'
line 1:1 missing NonZeroDigit at '.'
line 1:4 missing Digit at '<EOF>'
(start <missing NonZeroDigit> . <missing Digit> <EOF>)


When applying 'fragment'

When 'fragment' is applied to NonZeroDigit and Digit, the grammar becomes equivalent to the following.

First, replace NonZeroDigit with [1-9]:

grammar Test;

start : [1-9] '.' Digit Digit? EOF
      ;

fragment
Digit : '0' | [1-9]
      ;

Then replace Digit with ('0' | [1-9]):

grammar Test;

start : [1-9] '.' ('0' | [1-9]) ('0' | [1-9])? EOF
      ;

But start is a parser rule (its name starts with a lowercase letter), and a parser rule cannot contain character sets such as [1-9]; those are only allowed in lexer rules.

Refer to The Definitive ANTLR 4 Reference, page 73:

lexer rule names with uppercase letters and parser rule names with lowercase letters. For example, ID is a lexical rule name, and expr is a parser rule name.

After removing 'fragment'

After removing 'fragment' from the grammar, there is still an unexpected error:

line 1:3 extraneous input '3' expecting {<EOF>, Digit}
(start 1 . 0 3 <EOF>)
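
To see why '3' ends up as extraneous input, it helps to look at how the lexer chooses a token type: it takes the longest match, and when several rules match the same text it takes the rule defined first. The following is only a rough Python simulation of that behaviour, not ANTLR itself; the RULES table and the tokenize helper are illustrative names, and the input "1.03" is read off the parse tree above.

```python
import re

# Lexer rules in definition order, mirroring the grammar without 'fragment'.
# The "'.'" entry stands in for the implicit token created by the literal
# in the parser rule; its label here is purely illustrative.
RULES = [
    ("'.'", re.compile(r"\.")),
    ("NonZeroDigit", re.compile(r"[1-9]")),
    ("Digit", re.compile(r"0|[1-9]")),
]

def tokenize(text):
    """Longest match wins; on a tie, the rule listed first wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when lengths are equal.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"token recognition error at: {text[pos]!r}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

print(tokenize("1.03"))
```

In this sketch the final '3' is labelled NonZeroDigit rather than Digit, which is consistent with the parser reporting that it expected <EOF> or Digit at that position.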

Error study:
For NonZeroDigit:
If it is renamed to nonZeroDigit (a parser rule), we get:

syntax error: '1-9' came as a complete surprise to me while matching alternative

This happens because [1-9] is a character set, which only a lexer rule may contain, so the rule needs a name that starts with an uppercase letter (= lexer rule).

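For Digit:
Its NonZeroDigit alternative matches the same characters as the NonZeroDigit token, and when two lexer rules match the same input the lexer picks the one defined first. A digit from 1 to 9 is therefore always tokenized as NonZeroDigit, never as Digit. Naming the rule with a lowercase first letter (= parser rule) lets it refer to the NonZeroDigit token instead of competing with it.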

The correct Test.g4 should be:

grammar Test;

start : NonZeroDigit '.' digit digit? EOF
      ;

NonZeroDigit : [1-9]
             ;

digit : '0' | NonZeroDigit
      ;

If you want to keep fragment, create a lexer rule Number: fragments can only be referenced from lexer rules, and this rule consists ONLY of characters. A lexer rule name must start with an uppercase letter, which start does not, so start stays a parser rule that refers to Number:

grammar Test;

start : Number EOF
      ;

Number : NonZeroDigit '.' Digit Digit?
       ;

fragment
NonZeroDigit : [1-9]
             ;

fragment
Digit : '0' | NonZeroDigit
      ;
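
As a rough cross-check of what the Number rule accepts, the two fragments can be inlined by hand into a regular expression. This is a sketch in Python rather than ANTLR output; NUMBER and is_number are illustrative names, and the sample inputs are made up for the check.

```python
import re

# Fragments inlined by hand:
#   NonZeroDigit -> [1-9]
#   Digit        -> (0|[1-9])
NUMBER = re.compile(r"[1-9]\.(0|[1-9])(0|[1-9])?")

def is_number(text):
    """True if the whole input would form exactly one Number token."""
    return NUMBER.fullmatch(text) is not None

for sample in ["1.03", "1.0", "0.5", "1.", "1.234"]:
    print(sample, is_number(sample))
```

Because the digits are now consumed inside a single lexer rule, the overlap between NonZeroDigit and Digit no longer matters: neither of them produces a token of its own.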