Question

I know this question has been asked before, but I haven't found any solution to my specific problem. I am using Antlr4 with the C# target and I have the following lexer rules:

INT     : [0-9]+
        ;

LETTER  : [a-zA-Z_]+
        ;

WS      : [ \t\r\n\u000C]+ -> skip
        ;

LineComment
        : '#' ~[\r\n]* -> skip
        ;

Those are all of my lexer rules. There are also many parser rules, which I won't post here since I don't think they are relevant. The problem is that whitespace does not get skipped: when I inspect the token stream after the lexer has processed my input, the whitespace is still there, and it causes a parse error. The input I use is quite basic:

"fd 100"

It parses correctly until it reaches this parser rule:

noSignFactor
        : ':' ident                 #NoSignFactorArg
        | integer                   #NoSignFactorInt
        | float                     #NoSignFactorFloat
        | BOOLEAN                   #NoSignFactorBool
        | '(' expr ')'              #NoSignFactorExpr
        | 'not' factor              #NoSignFactorNot
        ;
integer : INT                       #IntegerInt
        ;

Solution

Start by splitting your combined grammar into a separate lexer grammar and parser grammar. For example, if your grammar is declared as grammar Foo;, do the following:

  1. Create a file FooLexer.g4, and move all of the lexer rules from Foo.g4 into FooLexer.g4.

  2. Create a file FooParser.g4, and move all of the parser rules from Foo.g4 into FooParser.g4.

  3. Include the following option in FooParser.g4:

    options {
      tokenVocab=FooLexer;
    }
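Applied to the question's lexer rules, the two files might look roughly like the sketch below. Only INT, LETTER, WS and LineComment come from the question. The NOT token and the parser rules program, command and factor are hypothetical stand-ins for the grammar that was not posted.

```antlr
// ---- FooLexer.g4 (sketch; NOT is a hypothetical keyword token) ----
lexer grammar FooLexer;

// Keywords must come before LETTER. For the input "not", both rules
// match three characters, and ANTLR picks the rule defined first.
NOT     : 'not' ;

INT     : [0-9]+ ;
LETTER  : [a-zA-Z_]+ ;

WS      : [ \t\r\n\u000C]+ -> skip ;
LineComment : '#' ~[\r\n]* -> skip ;

// ---- FooParser.g4 (sketch; program/command/factor are hypothetical) ----
parser grammar FooParser;

options {
  tokenVocab=FooLexer;
}

program : command* EOF ;
command : LETTER factor ;   // e.g. "fd 100"
factor  : 'not' factor      // legal only because FooLexer defines NOT : 'not' ;
        | INT
        ;
```

With this split, the input "fd 100" should lex as LETTER followed by INT, and WS should consume the space.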
    

This separation will ensure that your parser isn't silently creating lexer rules for you. In a combined grammar, using a literal such as 'not' in a parser rule will create a lexer rule for you if one does not already exist. When this happens, it's easy to lose track of what kinds of tokens your lexer is able to produce. When you use a separate lexer grammar, you will need to explicitly declare a rule like the following in order to use 'not' in a parser rule.

    NOT : 'not';

This should solve your whitespace problem if you have used the literal ' ' somewhere in a parser rule.
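The hypothetical combined grammar below (not the asker's) is a sketch of how a stray ' ' literal can produce exactly this symptom. In a combined grammar, ANTLR turns the ' ' literal into an implicit token, typically named T__0, and places it ahead of the explicit lexer rules. For a single space, that token and WS both match one character. ANTLR breaks such ties in favor of the rule defined first, so the space reaches the parser instead of being skipped. A run of two or more spaces would still be skipped, because WS then matches more characters.

```antlr
// Hypothetical combined grammar Foo.g4 that reproduces the symptom.
grammar Foo;

program : command+ EOF ;
command : LETTER INT ;          // never mentions a space...
label   : LETTER ' ' LETTER ;   // ...but one ' ' literal anywhere creates an implicit token

INT     : [0-9]+ ;
LETTER  : [a-zA-Z_]+ ;
WS      : [ \t\r\n\u000C]+ -> skip ;   // loses to the implicit ' ' token on a single space
```

With the input "fd 100", the parser should report a syntax error at the space, even though command itself never mentions one. In a separate parser grammar, ANTLR refuses to create an implicit token for a literal the lexer does not define and reports an error instead. That makes this kind of mistake visible when the grammar is generated.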

Licensed under: CC-BY-SA with attribution