Question

Following a very interesting discussion with Bart Kiers on parsing a noisy data stream with ANTLR, I've run into another problem...

The aim is still the same: extracting only the useful information, with the following grammar:

    VERB            : 'SLEEPING' | 'WALKING';
    SUBJECT         : 'CAT' | 'DOG' | 'BIRD';
    INDIRECT_OBJECT : 'CAR' | 'SOFA';
    ANY             : . {skip();};

    parse
      :  sentenceParts+ EOF
      ;

    sentenceParts
      :  SUBJECT VERB INDIRECT_OBJECT
      ;

A sentence like "it's 10PM and the Lazy CAT is currently SLEEPING heavily on the SOFA in front of the TV." produces the following:

[image: parser output, not preserved]

This is perfect and does exactly what I want: from a long sentence, I extract only the words that are meaningful to me. But then I found the following problem: if the text contains a word that begins exactly like one of my tokens, I end up with a MismatchedTokenException or a NoViableAltException.

    it's 10PM and the Lazy CAT is currently SLEEPING heavily, 
    with a DOGGY bag, on the SOFA in front of the TV.

produces an error:

[image: error output, not preserved]

DOGGY is interpreted as the beginning of DOG, which is one of the alternatives of the SUBJECT token, and the lexer gets lost. How can I avoid this without defining DOGGY as a special token? I would like DOGGY to be recognized as a word in its own right.


Solution

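Well, it seems that adding this rule solves my problem:

    ANY2 : 'A'..'Z'+ {skip();};

A run of uppercase letters such as DOGGY is then matched as a single word and skipped, instead of being read as DOG followed by leftover characters.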
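
For reference, here is a sketch of how the complete grammar might look with the ANY2 rule added. The grammar name NoisyText is a placeholder that does not come from the question. Placing ANY2 after the keyword rules and before ANY is also an assumption: when two ANTLR 3 lexer rules could match the same text, the one listed first is generally preferred, so the keywords should come first.

```antlr
grammar NoisyText; // hypothetical name, not from the original post

parse
  :  sentenceParts+ EOF
  ;

sentenceParts
  :  SUBJECT VERB INDIRECT_OBJECT
  ;

// Keywords of interest, listed before the catch-all rules.
VERB            : 'SLEEPING' | 'WALKING';
SUBJECT         : 'CAT' | 'DOG' | 'BIRD';
INDIRECT_OBJECT : 'CAR' | 'SOFA';

// Any other run of uppercase letters (DOGGY, TV, PM, ...) is
// consumed as one word and discarded.
ANY2            : 'A'..'Z'+ {skip();};

// Everything else (lowercase letters, digits, punctuation,
// whitespace) is discarded one character at a time.
ANY             : . {skip();};
```

Since ANY2 only covers uppercase letters, a mixed-case word such as DOGgy would presumably still be read as DOG followed by skipped characters. Widening the character range in ANY2 might be one way to handle that if it occurs in the input.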

Licensed under: CC-BY-SA with attribution