Question

As a beginner, when I was learning ANTLR4 from the The Definitive ANTLR 4 Reference book, I tried to run my modified version of the exercise from Chapter 7:

/**
 * to parse properties file
 * this example demonstrates using embedded actions in code
 */
grammar PropFile;

@header  {
    import java.util.Properties;
}
@members {
    Properties props = new Properties();
}
file
    : 
    {
        System.out.println("Loading file...");
    }
        prop+
    {
        System.out.println("finished:\n"+props);
    }
    ;

prop
    : ID '=' STRING NEWLINE 
    {
        props.setProperty($ID.getText(),$STRING.getText());//add one property
    }
    ;

ID  : [a-zA-Z]+ ;
STRING  :(~[\r\n])+; //if use  STRING : '"' .*? '"'  everything is fine
NEWLINE :   '\r'?'\n' ;

Since Java properties are just key-value pair I use STRING to match eveything except NEWLINE (I don't want it to just support strings in the double-quotes). When running following sentence, I got:

D:\Antlr\Ex\PropFile\Prop1>grun PropFile prop -tokens
driver=mysql
^Z
[@0,0:11='driver=mysql',<3>,1:0]
[@1,12:13='\r\n',<4>,1:12]
[@2,14:13='<EOF>',<-1>,2:14]
line 1:0 mismatched input 'driver=mysql' expecting ID

When I use STRING : '"' .*? '"' instead, it works.

I would like to know where I was wrong so that I can avoid similar mistakes in the future.

Please give me some suggestion, thank you!

Was it helpful?

Solution

Since both ID and STRING can match the input text starting with "driver", the lexer will choose the longest possible match, even though the ID rule comes first.

So, you have several choices here. The most direct is to remove the ambiguity between ID and STRING (which is how your alternative works) by requiring the string to start with the equals sign.

file : prop+ EOF ;
prop : ID STRING NEWLINE ;

ID      : [a-zA-Z]+ ;
STRING  : '=' (~[\r\n])+;
NEWLINE : '\r'?'\n' ;

You can then use an action to trim the equals sign from the text of the string token.

Alternately, you can use a predicate to disambiguate the rules.

file : prop+ EOF ;
prop : ID '=' STRING NEWLINE ;

ID      : [a-zA-Z]+ ;
STRING  : { isValue() }? (~[\r\n])+; 
NEWLINE : '\r'?'\n' ;

where the isValue method looks backwards on the character stream to verify that it follows an equals sign. Something like:

@members {
public boolean isValue() {
    int offset = _tokenStartCharIndex;
    for (int idx = offset-1; idx >=0; idx--) {
        String s = _input.getText(Interval.of(idx, idx));
        if (Character.isWhitespace(s.charAt(0))) {
            continue;
        } else if (s.charAt(0) == '=') {
            return true;
        } else {
            break;
        }
    }
    return false;
}
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top