ANTLR4 Semantic Predicates that is Context Dependent Does Not Work

https://stackoverflow.com/questions/23580735

antlr4

19-07-2023
|

Question

I am parsing a C++ like declaration with this scaled down grammar (many details removed to make it a fully working example). It fails to work mysteriously (at least to me). Is it related to the use of context dependent predicate? If yes, what is the proper way to implement the "counting the number of child nodes logic"?

grammar CPPProcessor;

cppCompilationUnit : decl_specifier_seq? init_declarator* ';'  EOF;

init_declarator:     declarator initializer?;
declarator:  identifier;
initializer: '=0';

decl_specifier_seq
  locals [int cnt=0]
    @init {  $cnt=0;    }
: decl_specifier+ ;
decl_specifier :   @init {    System.out.println($decl_specifier_seq::cnt);  }
    'const'
  | {$decl_specifier_seq::cnt < 1}? type_specifier {$decl_specifier_seq::cnt += 1;}  ;
type_specifier:  identifier ; 
identifier:IDENTIFIER;
CRLF: '\r'? '\n' -> channel(2);
WS: [ \t\f]+    -> channel(1);
IDENTIFIER:[_a-zA-Z] [0-9_a-zA-Z]* ;

I need to implement the standard C++ rule that no more than 1 type_specifier is allowed under an decl_specifier_seq.

Semantic predicate before type_specifier seems to be the solution. And the count is naturally declared as a local variable in decl_specifier_seq since nested decl_specifier_seq are possible.

But it seems that a context dependent semantic predicate like the one I used will produce incorrect parsing i.e. a semantic predicate that references $attributes. First an input file with correct result (to illustrate what a normal parse tree looks like):

int t=0;

and the parse tree:

enter image description here

But, an input without the '=0' to aid the parsing

int t;

enter image description here

0
1
line 1:4 no viable alternative at input 't'
1

the parsing failed with the 'no viable alternative' error (the numbers printed in the console is debug print of the $decl_specifier_cnt::cnt value as a verification of the test condition). i.e. the semantic predicate cannot prevent the t from being parsed as type_specifier and t is no longer considered a init_declarator. What is the problem here? Is it because a context dependent predicate having $decl_specifier_seq::cnt is used?

Does it mean context dependent predicate cannot be used to implement "counting the number of child nodes" logic?

EDIT

I tried new versions whose predicate uses member variable instead of the $decl_specifier_seq::cnt and surprisingly the grammar now works proving that the Context Dependent predicate did cause the previous grammar to fail:

....
@parser::members {
  public int cnt=0;
}
decl_specifier
  @init {System.out.println("cnt:"+cnt);  }
: 
    'const'
  | {cnt<1 }? type_specifier {cnt++;}  ;

A normal parse tree is resulted:

enter image description here

This gives rise to the question of how to support nested rule if we must use member variables to replace the local variables to avoid context sensitive predicates?

And a weird result is that if I add a /*$ctx*/ after the predicate, it fails again:

decl_specifier
  @init {System.out.println("cnt:"+cnt);  }
: 
    'const'
  | {cnt<1 /*$ctx*/ }? type_specifier {cnt++;}  ;

enter image description here

line 1:4 no viable alternative at input 't'

The parsing failed with no viable alternative. Why the /*$ctx*/ causes the parsing to fail like when $decl_specifier_seq::cnt is used although the actual logic uses a member variable only? And, without the /*$ctx*/, another issue related to the predicate called before @init block appears(described here)

Solution

ANTLR 4 evaluates semantic predicates in two cases.

The generated code evaluates a semantic predicate during parsing, and throws an exception of the evaluation returns false. All predicates traversed during parsing are evaluated in this way, including context-dependent predicates and predicates which do not appear at the left side of a decision.
The prediction method evaluates predicates in order to make correct decisions during parsing. In this case, predicates which appear anywhere other than the left edge of the decision being evaluated are assumed to return true (i.e. they are ignored). In addition, context-dependent predicates are only evaluated if the context data is available. The prediction algorithm will not create context structures that were not already provided by the parsing code. If a context-dependent predicate is encountered during prediction and no context is available, the predicate is assumed to return true (i.e. it is ignored for that decision).

The code generator does not evaluate the semantics of the target language, so it has no way to know that $ctx is semantically irrelevant when it appears in /*$ctx*/. Both cases result in the predicate being treated as context-dependent.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow