ANTLR fuzzy parsing

https://stackoverflow.com/questions/18591597

antlr
antlr3

27-06-2022

Question

I'm building a kind of preprocessor in ANTLR v3, which of course only works with fuzzy parsing. At the moment I'm trying to parse include statements and replace them with the contents of the corresponding files. I started from this example:

ANTLR: removing clutter

Based on this example, I wrote the following code:

grammar preprocessor;

options {
    language='Java';
}

@lexer::header {

package antlr_try_1;

}

@parser::header {

package antlr_try_1;

}

parse
 : (t=. {System.out.print($t.text);})* EOF
 ;

INCLUDE_STAT
 : 'include' (' ' | '\r' | '\t' | '\n')+ ('A'..'Z' | 'a'..'z' | '_' | '-' | '.')+
   {
     setText("Include statement found!");
   }
 ;

Any
 : . // fall through rule, matches any character
 ;

This grammar only prints the input text, replacing each include statement with the string "Include statement found!". Here is a sample input:

some random input
some random input
some random input

include some_file.txt

some random input
some random input
some random input

The output looks like this:

C:\Users\andriyn\Documents\SandBox\text_files\asd.txt line 1:14 mismatched character 'p' expecting 'c'
C:\Users\andriyn\Documents\SandBox\text_files\asd.txt line 2:14 mismatched character 'p' expecting 'c'
C:\Users\andriyn\Documents\SandBox\text_files\asd.txt line 3:14 mismatched character 'p' expecting 'c'
C:\Users\andriyn\Documents\SandBox\text_files\asd.txt line 7:14 mismatched character 'p' expecting 'c'
C:\Users\andriyn\Documents\SandBox\text_files\asd.txt line 8:14 mismatched character 'p' expecting 'c'
C:\Users\andriyn\Documents\SandBox\text_files\asd.txt line 9:14 mismatched character 'p' expecting 'c'
some random ut
some random ut
some random ut

Include statement found!

some random ut
some random ut
some random ut

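As far as I can tell, the lexer is tripped up by the "in" at the start of "input": it seems to decide that an INCLUDE_STAT token has started, then fails when it finds 'p' where it expects the 'c' of "include", and the characters "inp" are lost from the output.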
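To make the suspected behaviour concrete, here is a toy Java model. It is not ANTLR's generated code; the class `EagerIncludeLexer` and its methods are hypothetical. It assumes what the error messages suggest: the lexer commits to the include rule after seeing only "in", never backtracks, and on a mismatch reports the error and drops the offending character together with the characters already matched.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not ANTLR-generated code: a lexer that commits to the include
// rule after only two characters of lookahead and never backtracks.
public class EagerIncludeLexer {
    private final String in;
    private int pos = 0;
    private int line = 1;
    private int col = 0; // 0-based position in the current line, as in the error messages
    final List<String> errors = new ArrayList<>();

    EagerIncludeLexer(String in) {
        this.in = in;
    }

    // k-th lookahead character, or '\0' past the end of the input.
    private char la(int k) {
        int i = pos + k - 1;
        return i < in.length() ? in.charAt(i) : '\0';
    }

    private void consume() {
        if (in.charAt(pos) == '\n') {
            line++;
            col = 0;
        } else {
            col++;
        }
        pos++;
    }

    // Match one expected character; on a mismatch, record an error and drop the offending character.
    private boolean matchChar(char expected) {
        char c = la(1);
        if (c == expected) {
            consume();
            return true;
        }
        String shown = (pos < in.length()) ? "'" + c + "'" : "<EOF>";
        errors.add("line " + line + ":" + col + " mismatched character " + shown
                + " expecting '" + expected + "'");
        if (pos < in.length()) {
            consume();
        }
        return false;
    }

    private static boolean isNameChar(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_' || c == '-' || c == '.';
    }

    String run() {
        StringBuilder out = new StringBuilder();
        while (pos < in.length()) {
            if (la(1) == 'i' && la(2) == 'n') {
                // Committed to INCLUDE_STAT: the rest of "include" must follow.
                boolean ok = true;
                for (char c : "include".toCharArray()) {
                    if (!matchChar(c)) {
                        ok = false; // the characters matched so far are lost
                        break;
                    }
                }
                if (ok) {
                    // Simplified: these loops do not insist on matching at least one character.
                    while (" \r\t\n".indexOf(la(1)) >= 0) consume();
                    while (isNameChar(la(1))) consume();
                    out.append("Include statement found!");
                }
            } else {
                out.append(la(1)); // the Any rule: echo a single character
                consume();
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        EagerIncludeLexer lexer =
                new EagerIncludeLexer("some random input\ninclude some_file.txt\nsome random input");
        String out = lexer.run();
        lexer.errors.forEach(System.out::println);
        System.out.println(out);
    }
}
```

On the three-line input in `main`, the model reports "mismatched character 'p' expecting 'c'" at 1:14 and 3:14 and prints "some random ut" on those lines, which mirrors the output above.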

Is there a better way to do this? I can't use the filter option, because I need the rest of the code as well as the include statements. I've tried several other approaches but couldn't find a proper solution.

Solution

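You are running into a limitation of ANTLR 3's lexer: it predicts which token rule to use before it has seen the whole token (here, after only "in"), and once it has committed it does not backtrack if the rest of the rule fails to match. Either of these options corrects the immediate problem: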

Upgrade to ANTLR 4, which does not have this limitation.

Add the following syntactic predicate at the beginning of the INCLUDE_STAT rule, so that the lexer only chooses INCLUDE_STAT when the whole include statement matches:

`('include' (' ' | '\r' | '\t' | '\n')+ ('A'..'Z' | 'a'..'z' | '_' | '-' | '.')+) =>`
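Roughly speaking, the predicate makes the lexer check that the complete bracketed pattern matches before committing to INCLUDE_STAT; when it does not, the Any rule wins and a single character is passed through. The sketch below is a hand-written Java analogue of that idea, not ANTLR code (the class `IncludeScanner` and its methods are hypothetical): at every position it tries a full include match and otherwise echoes one character.

```java
// Hand-written analogue of the syntactic predicate (no ANTLR runtime involved):
// try the whole include pattern first, fall back to one character otherwise.
public class IncludeScanner {

    // Returns the end index (exclusive) of a complete include statement at pos, or -1.
    static int matchInclude(String s, int pos) {
        String kw = "include";
        if (!s.startsWith(kw, pos)) return -1;
        int i = pos + kw.length();
        int wsStart = i;
        while (i < s.length() && " \r\t\n".indexOf(s.charAt(i)) >= 0) i++;
        if (i == wsStart) return -1; // (' ' | '\r' | '\t' | '\n')+ needs at least one char
        int nameStart = i;
        while (i < s.length() && isNameChar(s.charAt(i))) i++;
        if (i == nameStart) return -1; // the file-name loop needs at least one char
        return i;
    }

    // Same character set as the grammar's file-name loop (no digits).
    static boolean isNameChar(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_' || c == '-' || c == '.';
    }

    static String preprocess(String input) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < input.length()) {
            int end = matchInclude(input, pos);
            if (end >= 0) {
                out.append("Include statement found!"); // INCLUDE_STAT matched in full
                pos = end;
            } else {
                out.append(input.charAt(pos)); // fall through, like the Any rule
                pos++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(preprocess("some random input\n\ninclude some_file.txt\n\nsome random input"));
    }
}
```

Because nothing is consumed until a full match is confirmed, "input" is echoed unchanged and only the real include line is replaced.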

Licensed under: CC-BY-SA with attribution
