Error - Action in lexer rule 'xxxx' must be last element of single outermost alt

https://stackoverflow.com/questions/18456455

26-06-2022

Question

When I upgraded from ANTLR 3 to ANTLR 4, I removed all the syntactic predicates from the grammar. After that change, I get the error quoted in the title.

This is the changed code:

NUMBER 
    :(
      '0'..'9' ('.' '0'..'9'+)?
    | '.' '0'..'9'+
    )
    (
        E
        (
              M     { $type = EMS;          }
            | X     { $type = EXS;          }
        )
    |   P
        (
              X     
            | T
            | C
        )
                    { $type = LENGTH;       }   
    |   C M         { $type = LENGTH;       }
    |   M
        (
              M     { $type = LENGTH;       }

            | S     { $type = TIME;         }
        )
    |   I N         { $type = LENGTH;       }

    |   D E G       { $type = ANGLE;        }
    |   R A D       { $type = ANGLE;        }

    |   S           { $type = TIME;         }

    |   K? H    Z   { $type = FREQ;         }

    | IDENT         { $type = DIMENSION;    }

    | '%'           { $type = PERCENTAGE;   }

    | // Just a number
    )
;

This is the error I am getting: Action in lexer rule 'xxxx' must be last element of single outermost alt.

I saw an existing answer about this error, but I was unable to grasp what it meant. Please give me some guidance.

EDIT:

The same error also appears for this rule in the grammar:

fragment INVALID :;

STRING
    : '\'' ( ~('\n'|'\r'|'\f'|'\'') )*
        (
              '\''
            | { $type = INVALID; }
        )
    | '"' ( ~('\n'|'\r'|'\f'|'"') )*
        (
              '"'
            | { $type = INVALID; }
        )
    ;

I was unable to change this to ANTLR 4. What is new in this code? Please give me a quick fix for this.

Solution

Your NUMBER rule has been heavily manually left-factored. In reality, it is a single lexer rule that produces 9 different types of token. The left-factoring was likely performed due to the way ANTLR 3 lexers use prediction with a recursive descent parser. ANTLR 4 uses a completely different lexer algorithm based on DFAs. The error you see is a result of this change - since ANTLR 4 lexers are no longer recursive-descent parsers, they no longer have the ability to execute action code at arbitrary points.
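To make the restriction concrete, here is a minimal sketch of the two shapes involved. The grammar name and the token names (UNIT_TOKEN, BAD, GOOD) are hypothetical and exist only for illustration. The sketch assumes the Java target and an ANTLR 4 release that reports the error above.

```antlr
lexer grammar ActionPlacementSketch;

// Hypothetical token type, declared without a rule of its own.
tokens { UNIT_TOKEN }

// Shape that triggers the error: the action is buried inside a nested subrule.
// Kept as a comment so the sketch still compiles.
// BAD : [0-9]+ ( 'a' { setType(UNIT_TOKEN); } | 'b' ) ;

// Accepted shape: one outermost alternative, with the retyping as its last element.
// The "-> type(...)" lexer command is used here instead of an embedded action.
GOOD : [0-9]+ 'a' -> type(UNIT_TOKEN) ;
```

The NUMBER rule in the question has the first shape several times over: every `{ $type = ...; }` sits inside a nested subrule rather than at the end of the rule.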

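The most effective way to write the NUMBER rule in ANTLR 4 is to use the "inefficient" syntax from ANTLR 3: one rule per token type, each repeating the NUMBER prefix. In ANTLR 4 it will not be slow, since the lexer no longer relies on prediction. The lexer picks the longest match and, on a tie, the rule defined first, so the specific unit rules must stay above DIMENSION.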

EMS
    : NUMBER E M
    ;
EXS
    : NUMBER E X
    ;
LENGTH
    : NUMBER P X
    | NUMBER P T
    | NUMBER P C
    | NUMBER C M
    | NUMBER M M
    | NUMBER I N
    ;
TIME
    : NUMBER M S
    | NUMBER S
    ;
ANGLE
    : NUMBER D E G
    | NUMBER R A D
    ;
FREQ
    : NUMBER K? H Z
    ;
DIMENSION
    : NUMBER IDENT
    ;
PERCENTAGE
    : NUMBER '%'
    ;
NUMBER
    : [0-9]+ ('.' [0-9]+)?   // one or more integer digits, optional fraction
    | '.' [0-9]+
    ;
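The STRING rule from the edit can probably be handled the same way, by splitting it into one rule per token type rather than retyping the token inside a subrule. The following is a sketch under assumptions, not a tested drop-in replacement. The grammar name is hypothetical, and INVALID is turned into an ordinary token rule, so the original `fragment INVALID :;` declaration would have to be removed.

```antlr
lexer grammar StringSketch;

// Properly terminated string: the closing quote is required.
STRING
    : '\'' ~[\n\r\f']* '\''
    | '"'  ~[\n\r\f"]* '"'
    ;

// Unterminated string: same body, but no closing quote.
// When the closing quote is present, STRING matches one character more
// and wins under the longest-match rule.
INVALID
    : '\'' ~[\n\r\f']*
    | '"'  ~[\n\r\f"]*
    ;
```

Because neither rule contains an action, the placement restriction never comes into play. The lexer's longest-match behaviour does the work that `{ $type = INVALID; }` did in the ANTLR 3 version.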

Licensed under: CC-BY-SA with attribution
