Question

I keep running into this problem. To parse {1}SB0$1:U inside the input S:G$mabit$0$0({1}SB0$1:U),H,0,0, I have these rules:

/*
 *  Type Chain Record
 */

type_chain_record
    :
    '{' number[10] '}' type_dcl_id (',' type_dcl_id)? ':' type_sign
    ;

type_dcl_id
    :
      'DA' EXPRESSION 'd'               // Array of n elements
    | 'DF'                              // Function
    | 'DG'                              // Generic pointer
    | 'DC'                              // Code pointer
    | 'DX'                              // External ram pointer
    | 'DD'                              // Internal ram pointer
    | 'DP'                              // Page pointer
    | 'DI'                              // Upper 128 byte pointer
    | 'SL'                              // long
    | 'SI'                              // int
    | 'SC'                              // char
    | 'SS'                              // short
    | 'SV'                              // void
    | 'SF'                              // float
    | 'ST' EXPRESSION                   // Structure of name <name>
    | 'SX'                              // sbit
    | 'SB' EXPRESSION '$' EXPRESSION    // Bit field of n bits
    ;

type_sign
    :
      'U' // Unsigned
    | 'S' // Signed
    ;


number[int numbase] returns[long val]
    :
    n = EXPRESSION
    {
        $val = Convert.ToInt64($n.text, $numbase);
    }
    ;


// ////////////////////////////////////////////////////////////////////////////
// LEXER RULES

fragment LETTER 
    :
      'a'..'z' 
    | 'A'..'Z'
    ;

fragment DIGIT
    :
    '0'..'9'
    ;

fragment NONZERO_DIGIT
    :
    '1'..'9'
    ;


FILE_SCOPE
  :
  'L' (LETTER)+ '.' (LETTER)+
  ;

EXPRESSION
  :
  (LETTER | DIGIT | '_' )+
  ;

WS 
  :
  '\r' | '\n'
  ;

I don't understand why, but I am getting a NoViableAltException saying line x:y no viable alternative at input 'SB0'.

Could anyone explain to me why this is happening? Every alternative of the parser rule type_dcl_id begins with a unique literal, so I don't see why the parser would have trouble at this point.

I have included all the lexer rules.

Side note:

The reason I want this granularity, rather than simply parsing over that input, is that I later want type_dcl_id to return an object that is propagated up to type_chain_record, where it is used to construct a ChainRecord object that holds a DCLType object.


Solution

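SB0 is tokenized as a single EXPRESSION token, because the lexer always matches the longest possible sequence of characters, and SB0 (three characters, all valid in EXPRESSION) is longer than the literal 'SB' (two characters). The parser therefore never sees an 'SB' token: it receives an EXPRESSION where type_dcl_id expects one of its literals, so no alternative matches.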
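The following Python sketch simulates that lexer behavior on the fragment {1}SB0$1:U. It assumes ANTLR's two disambiguation rules (the longest match wins; on a tie, the rule defined first wins) and that, in a combined grammar, the implicit tokens for literals such as 'SB' and 'U' are defined before named rules like EXPRESSION. The rule table is transcribed from the grammar in the question; tokenize is a hypothetical helper for illustration, not part of the ANTLR runtime, which works differently internally.

```python
import re

# Rules in definition order (assumed): implicit literal tokens from the
# parser rules come first, then the named lexer rules from the grammar.
LITERALS = ["{", "}", ",", ":", "$", "d", "U", "S",
            "DA", "DF", "DG", "DC", "DX", "DD", "DP", "DI",
            "SL", "SI", "SC", "SS", "SV", "SF", "ST", "SX", "SB"]
RULES = [(lit, re.compile(re.escape(lit))) for lit in LITERALS] + [
    ("FILE_SCOPE", re.compile(r"L[a-zA-Z]+\.[a-zA-Z]+")),
    ("EXPRESSION", re.compile(r"[a-zA-Z0-9_]+")),
    ("WS", re.compile(r"[\r\n]")),
]


def tokenize(text):
    """Longest match wins; on a tie the earlier rule in RULES is kept."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in RULES:
            m = regex.match(text, pos)
            # Only a strictly longer match replaces the current best.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"token recognition error at {pos}: {text[pos]!r}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("{1}SB0$1:U"))
# [('{', '{'), ('EXPRESSION', '1'), ('}', '}'), ('EXPRESSION', 'SB0'),
#  ('$', '$'), ('EXPRESSION', '1'), (':', ':'), ('U', 'U')]
```

The third character decides it: 'SB' stops matching at 0 while EXPRESSION keeps going, so EXPRESSION wins. By contrast, a lone U ties with EXPRESSION at one character and goes to the earlier-defined literal 'U', which is why the :U part lexes as intended.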

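An easy workaround is to make LETTER and DIGIT real lexer rules instead of fragments, and to replace the EXPRESSION lexer rule with the following new parser rule (the parser rules that currently reference EXPRESSION, such as number and type_dcl_id, then reference expression instead):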

expression : (LETTER | DIGIT | '_' )+ ;
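A minimal Python sketch of this workaround, assuming ANTLR's lexer disambiguation (the longest match wins, ties go to the earlier rule, and implicit literal tokens are defined first). LETTER and DIGIT become single-character tokens, so at SB0 the two-character literal 'SB' now beats the one-character LETTER, and the expression parser rule glues the following characters back together. The tokenize function and the Parser class are hypothetical hand-written stand-ins for the generated lexer and parser, and type_chain_record here covers only the 'SB' alternative of type_dcl_id.

```python
import re

# Workaround lexer (assumed order): implicit literals first, then named
# rules. LETTER and DIGIT are real single-character tokens; no EXPRESSION.
LITERALS = ["{", "}", ",", ":", "$", "_", "d", "U", "S",
            "DA", "DF", "DG", "DC", "DX", "DD", "DP", "DI",
            "SL", "SI", "SC", "SS", "SV", "SF", "ST", "SX", "SB"]
RULES = [(lit, re.compile(re.escape(lit))) for lit in LITERALS] + [
    ("FILE_SCOPE", re.compile(r"L[a-zA-Z]+\.[a-zA-Z]+")),
    ("LETTER", re.compile(r"[a-zA-Z]")),
    ("DIGIT", re.compile(r"[0-9]")),
]


def tokenize(text):
    """Longest match wins; on a tie the earlier rule in RULES is kept."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in RULES:
            m = regex.match(text, pos)
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"token recognition error at {pos}: {text[pos]!r}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


class Parser:
    """Hypothetical recursive-descent stand-in for the generated parser."""

    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i][0] if self.i < len(self.tokens) else None

    def match(self, ttype):
        if self.peek() != ttype:
            raise SyntaxError(f"expected {ttype!r}, got {self.peek()!r}")
        self.i += 1
        return self.tokens[self.i - 1][1]

    def expression(self):
        # expression : (LETTER | DIGIT | '_')+ ;  rebuilds the text from tokens
        parts = []
        while self.peek() in ("LETTER", "DIGIT", "_"):
            parts.append(self.match(self.peek()))
        if not parts:
            raise SyntaxError(f"expected expression, got {self.peek()!r}")
        return "".join(parts)

    def type_chain_record(self):
        # '{' number[10] '}' 'SB' expression '$' expression ':' type_sign
        self.match("{")
        number = int(self.expression(), 10)  # like Convert.ToInt64(text, 10)
        self.match("}")
        self.match("SB")
        first = self.expression()
        self.match("$")
        second = self.expression()
        self.match(":")
        if self.peek() not in ("U", "S"):
            raise SyntaxError(f"expected 'U' or 'S', got {self.peek()!r}")
        sign = self.match(self.peek())
        return number, (first, second), sign


print(tokenize("SB0"))  # [('SB', 'SB'), ('DIGIT', '0')]
print(Parser(tokenize("{1}SB0$1:U")).type_chain_record())  # (1, ('0', '1'), 'U')
```

One side effect to check before adopting this: any literal that can appear inside a name is now lexed as that literal. For example, SIx produces an 'SI' token followed by LETTER, and a lone S, U or d inside a name becomes the literal token rather than LETTER, so expression may need to accept those tokens as well.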

For more information, you might find this post helpful: https://github.com/antlr/antlr4/issues/485#issuecomment-37284837

OTHER TIPS

| 'SB' EXPRESSION '$' EXPRESSION // Bit field of n bits

does not match SB0.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow