antlr match different bracketed parts

https://stackoverflow.com/questions/17515260

antlr
antlr3

02-06-2022

Question

I have the following language to define, and apparently ANTLR is not giving away its secrets too easily.

ui { 
  screen X {
    input()
    checkbox()
  }
}
model {
 // any text
 // even {}

}

I would define the grammar like this:

ui: UI OBR (screen)* CBR;

screen: ....

model : MODEL modelBody;

modelBody: BRACKETED_TEXT;

OBR: '{';
CBR: '}';
...
TEXT : ('a'..'z'|'A'..'Z'| '_' | '-' )+ ;
BRACKETED_TEXT : OBR ( ~(OBR|CBR) | BRACKETED_TEXT )* CBR;

The problem is that it throws a MismatchedTokenException when it reaches the ui { part. If I remove the BRACKETED_TEXT token, everything works, so I figure it cannot tell whether to match an OBR or a BRACKETED_TEXT when parsing ui {.
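One plausible reading of the MismatchedTokenException: ANTLR lexers tokenize without knowing which parser rule is active, so once BRACKETED_TEXT exists, the `{` after `ui` can be swallowed, together with everything up to its matching `}`, into a single BRACKETED_TEXT token. The rule `ui: UI OBR (screen)* CBR;` then never receives the OBR it expects. The Python below is a toy simulation under simplifying assumptions, not ANTLR itself: it uses a longest-match loop (ANTLR 3 actually chooses lexer rules with LL(*) prediction), treats the `ui` keyword as plain TEXT, skips whitespace, and the names `tokenize`, `match_bracketed`, `WORD` and `SINGLE` are hypothetical.

```python
import re

# TEXT : ('a'..'z'|'A'..'Z'| '_' | '-' )+ ;
WORD = re.compile(r"[a-zA-Z_-]+")

# Single-character rules from the question (OPAR/CPAR included).
SINGLE = {"{": "OBR", "}": "CBR", "(": "OPAR", ")": "CPAR"}


def match_bracketed(text, pos):
    """Mimic BRACKETED_TEXT : OBR ( ~(OBR|CBR) | BRACKETED_TEXT )* CBR;
    Return the end index of a balanced {...} run starting at pos, or None."""
    if pos >= len(text) or text[pos] != "{":
        return None
    depth = 0
    for i in range(pos, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i + 1
    return None  # unbalanced, so BRACKETED_TEXT cannot match here


def tokenize(text, with_bracketed=True):
    """Toy lexer: at each position, try every rule and keep the longest match.
    Like a real lexer, it never asks which parser rule is active."""
    tokens = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():  # assumption: whitespace is skipped/hidden
            pos += 1
            continue
        candidates = []  # (length, token type) for each rule matching here
        if with_bracketed:
            end = match_bracketed(text, pos)
            if end is not None:
                candidates.append((end - pos, "BRACKETED_TEXT"))
        if ch in SINGLE:
            candidates.append((1, SINGLE[ch]))
        m = WORD.match(text, pos)
        if m:
            candidates.append((m.end() - pos, "TEXT"))
        if not candidates:
            raise ValueError(f"no lexer rule matches {ch!r} at offset {pos}")
        length, kind = max(candidates, key=lambda c: c[0])
        tokens.append((kind, text[pos:pos + length]))
        pos += length
    return tokens
```

With `with_bracketed=True`, `tokenize("ui { screen X { input() checkbox() } }")` yields only a TEXT token for `ui` and one BRACKETED_TEXT token for the whole braced body; with it turned off, the same input comes out as the OBR/TEXT/OPAR/CPAR/CBR stream that structured `ui` and `screen` rules need.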

That's fine, but how can I get a structured AST for ui {...} and free text for model {...}?

Solution

OK, the answer is this:

ui: UI OBR (screen)* CBR;

screen: ....

model : MODEL modelBody;

modelBody: genericBlock; // no more BRACKETED_TEXT

genericBlock
  : OBR
    ( TEXT
    | QUOTED_TEXT
    | OPAR
    | CPAR
    | genericBlock
    )*
    CBR
  ;
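To see why genericBlock fixes the `ui` case while still producing a tree: braces are now ordinary OBR/CBR tokens everywhere, and nesting is handled by the parser rather than the lexer. Below is a rough, hand-written recursive-descent analogue of the rule. It is a sketch, not ANTLR-generated code; the token stream is assumed to be a list of (type, text) pairs, and `generic_block`, `ALLOWED` and `ParseError` are hypothetical names.

```python
# Token types listed as alternatives in genericBlock (besides the nested rule).
ALLOWED = {"TEXT", "QUOTED_TEXT", "OPAR", "CPAR"}


class ParseError(Exception):
    pass


def generic_block(tokens, pos=0):
    """Sketch of genericBlock : OBR (TEXT|QUOTED_TEXT|OPAR|CPAR|genericBlock)* CBR ;
    Returns (children, next_pos); nested blocks become nested lists."""
    if pos >= len(tokens) or tokens[pos][0] != "OBR":
        raise ParseError(f"expected OBR at token {pos}")
    pos += 1
    children = []
    while pos < len(tokens):
        kind, text = tokens[pos]
        if kind == "CBR":
            return children, pos + 1
        if kind == "OBR":
            inner, pos = generic_block(tokens, pos)  # recursive genericBlock
            children.append(inner)
        elif kind in ALLOWED:
            children.append(text)
            pos += 1
        else:
            # A declared token type with no alternative here (e.g. a keyword).
            raise ParseError(f"unexpected {kind} at token {pos}")
    raise ParseError("missing CBR")
```

For the tokens of `{ a { ) } }` it returns `(["a", [")"]], 6)`. A token type outside `ALLOWED`, such as a keyword token like UI, has no matching alternative and raises `ParseError`, which presumably mirrors the failure the generated parser reports when a declared but unlisted token turns up inside the block.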

OBR: '{';
CBR: '}';
OPAR: '('; // part of the original grammar file, but left out of the question
CPAR: ')'; // same here
...
TEXT : ('a'..'z'|'A'..'Z'| '_' | '-' )+ ;

The question still remains, and I hope someone can clear it up: why do I have to list every token that may show up inside the generic block? If the lexer produces a declared token inside genericBlock that the rule does not list, the parse fails. Why is that, when I specifically told it to match anything other than OBR and CBR?
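One likely explanation: in an ANTLR lexer rule, `~(...)` complements a set of characters, so the `~(OBR|CBR)` in BRACKETED_TEXT only ever described characters and says nothing about which tokens a parser rule accepts. genericBlock is a parser rule, and it accepts only the token types it names. As far as I know, ANTLR 3 also allows `~` in parser rules, where it complements a set of token types, so a rule along the lines of `genericBlock : OBR ( ~(OBR|CBR) | genericBlock )* CBR ;` should accept any token except the braces (an untested assumption). Characters that no lexer rule matches, such as the `/` in `// any text` if no comment rule exists, would still be a lexer error. The Python below is a hand-written sketch of that negated-set behavior, not ANTLR output; `generic_block_any` and the (type, text) token pairs are hypothetical.

```python
OBR, CBR = "OBR", "CBR"


def generic_block_any(tokens, pos=0):
    """Sketch of a parser-level negated set:
        genericBlock : OBR ( ~(OBR|CBR) | genericBlock )* CBR ;
    Any token type other than OBR/CBR is accepted, so adding new token
    types to the lexer needs no extra alternatives here."""
    if pos >= len(tokens) or tokens[pos][0] != OBR:
        raise ValueError(f"expected OBR at token {pos}")
    pos += 1
    children = []
    while pos < len(tokens):
        kind, text = tokens[pos]
        if kind == CBR:
            return children, pos + 1
        if kind == OBR:
            inner, pos = generic_block_any(tokens, pos)  # nested block
            children.append(inner)
        else:  # ~(OBR|CBR): any other token type is fine
            children.append(text)
            pos += 1
    raise ValueError("missing CBR")
```

Given the tokens of `{ ui x { } }`, where `ui` arrives as a keyword token of type UI, it returns `(["ui", "x", []], 6)` instead of failing.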

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow