質問

I have a task to write simple parser-generator, so I wrote ANTLR-like grammar and tried to parse simple file like "foo:bar;", but got the following output:

[@0,0:2='foo',<1>,1:0]
[@1,3:3=':',<16>,1:3]
[@2,4:6='bar',<1>,1:4]
[@3,7:7=';',<18>,1:7]
[@4,8:7='<EOF>',<-1>,1:8]
line 1:0 no viable alternative at input 'foo'
(rule foo : bar ;)

My grammar looks like

grammar parsGen;

gram : rule SEMICOLON (NEWLINE+ rule SEMICOLON)* ;

rule : lRule | pRule ;

lRule : LRULEID COLON lRule1 ;
lRule1 : (((LRULEID | STRING | SET) | LBRACE lRule1 PIPE lRule1 RBRACE) modificator? SPACE+)+ ;

pRule : PRULEID COLON pRule1 ;
pRule1 : (((LRULEID | PRULEID) | LBRACE lRule1 PIPE lRule1 RBRACE) modificator? SPACE+)+ ;

modificator : PLUS | ASTERISK | QUESTION ;

ID : LRULEID | PRULEID ;

LRULEID : UPPERLETTER (UPPERLETTER | LOWERLETTER | DIGIT)* ;
PRULEID : LOWERLETTER (UPPERLETTER | LOWERLETTER | DIGIT)* ;

STRING : ('\''.*?'\'') ;
SET : '\''.*?'\'..\''.*?'\'' ;

UPPERLETTER : [A-Z] ;
LOWERLETTER : [a-z] ;
DIGIT : [0-9] ;

NEWLINE : '\r\n'|'\n'|'\r' ;

PLUS : '+' ;
ASTERISK : '*' ;
QUESTION : '?' ;

LBRACE : '(' ;
RBRACE : ')' ;

SPACE : ' ' ;

COLON : ':' ;

PIPE : '|' ;

SEMICOLON : ';' ;

So where could I make a mistake? I tried to search everywhere (google, SO etc.) error "no viable alternative", but it didn't really help me.

役に立ちましたか?

解決

ANTLR lexers fully assign unambiguous token types before the parser is ever used. When multiple token types can match a token, the first one appearing in the grammar is the one that is used. For your grammar, a token cannot have the type ID and the type LRULEID at the same time. Since the input foo matches both of these lexer rules, the first appearing in the grammar is used so your tokens are: ID, COLON, ID, SEMICOLON, <EOF>.

Since the ID token is never actually referenced in the parser, I suggest one of the following changes. Either of these options will resolve the problem you have described, so the choice is entirely your preference for how the final grammar looks.

Foreword

You need to change the space references from SPACE+ to SPACE*, or the rule will require at least one space character between bar and ;.

Option 1

Remove the ID lexer rule altogether.

Option 2

  1. Change ID to a parser rule so it's not trying to assign token type ID to all of your identifiers.

    id : LRULEID | PRULEID;
    
  2. Update pRule1 rule by referencing id.

    pRule1 : ((id | LBRACE lRule1 PIPE lRule1 RBRACE) modificator? SPACE+)+ ;
    

Unrelated Side Note

You grammar might be easier to read if you remove the outermost + closure inside the lRule and pRule1 rules, and instead add them to the rule references themselves, like this. Note that I changed the SPACE references as described in the foreword.

lRule : LRULEID COLON lRule1+ ;
lRule1 : ((LRULEID | STRING | SET) | LBRACE lRule1 PIPE lRule1 RBRACE) modificator? SPACE* ;

pRule : PRULEID COLON pRule1+ ;
pRule1 : ((LRULEID | PRULEID) | LBRACE lRule1 PIPE lRule1 RBRACE) modificator? SPACE* ;

他のヒント

Also from the http://www.antlr.org/api/Java/org/antlr/v4/runtime/NoViableAltException.html:

Indicates that the parser could not decide which of two or more paths to take based upon the remaining input. It tracks the starting token of the offending input and also knows where the parser was in the various paths when the error [occured].

In my case I was calling lexer.nextToken() before parsing for debugging purposes. That in turn without lexer.reset() was causing no viable alternative at input EOF error.

ライセンス: CC-BY-SA帰属
所属していません StackOverflow
scroll top