Question

I'm trying to implement a grammar for parsing Lucene queries. Everything went smoothly until I tried to add support for range queries. Lucene details aside, my grammar looks like this:

grammar ModifiedParser;

TERM_RANGE : '[' ('*' | TERM_TEXT) 'TO' ('*' | TERM_TEXT) ']'
           | '{' ('*' | TERM_TEXT) 'TO' ('*' | TERM_TEXT) '}'
           ;

query : not (booleanOperator? not)* ;

booleanOperator : andClause
                | orClause
                ;

andClause : 'AND' ;
notClause : 'NOT' ;
orClause  : 'OR' ;

not : notClause? MODIFIER? clause;

clause : unqualified                        
       | qualified                          
       ;

unqualified : TERM_RANGE                   # termRange
            | TERM_PHRASE                  # termPhrase
            | TERM_PHRASE_ANYTHING         # termTruncatedPhrase
            | '(' query ')'                # queryUnqualified
            | TERM_TEXT_TRUNCATED          # termTruncatedText
            | TERM_NORMAL                  # termText
            ;

qualified : TERM_NORMAL ':' unqualified                  
          ;

fragment TERM_CHAR  : (~(' ' | '\t' | '\n' | '\r' | '\u3000'
                    | '\'' | '\"' | '(' | ')' | '[' | ']' | '{' | '}'
                    | '+' | '-' | '!' | ':' | '~' | '^'
                    | '?' | '*' | '\\' ))
                    ;

fragment TERM_START_CHAR : TERM_CHAR
                         | ESCAPE
                         ;

fragment ESCAPE : '\\' ~[];

MODIFIER : '-'
         | '+'
         ;

AND : 'AND';
OR : 'OR';
NOT : 'NOT';

TERM_PHRASE_ANYTHING : '"'  (ESCAPE|~('\"'|'\\'))+  '"' ;
TERM_PHRASE          : '"' (ESCAPE|~('\"'|'\\'|'?'|'*'))+ '"' ;
TERM_TEXT_TRUNCATED : ('*'|'?')(TERM_CHAR+ ('*'|'?'))+ TERM_CHAR*
                    | TERM_START_CHAR (TERM_CHAR* ('?'|'*'))+ TERM_CHAR+
                    | ('?'|'*') TERM_CHAR+
                    ;  

TERM_NORMAL : TERM_TEXT;                                            

fragment TERM_TEXT : TERM_START_CHAR TERM_CHAR* ;

WS : [ \t\r\n] -> skip ;

When I write a visitor and work with the tokens, parsing asd [ 10 TO 100 ] { 1 TO 1000 } 100..1000 throws token recognition errors for [, ], { and }, and the visitor only tries to visit the termRange rule on the third range. Does anyone know what I'm missing here? Thanks in advance.


Solution

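Since you made TERM_RANGE a lexer rule, you must account for everything at the character level. In particular, you forgot to allow whitespace characters in your input: the WS rule with -> skip only discards whitespace between tokens, not inside one, so the spaces in [ 10 TO 100 ] keep TERM_RANGE from matching and the lexer reports the brackets as unrecognized.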
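If the range stays a lexer rule, the whitespace has to be matched explicitly inside the token. A minimal sketch of what that might look like, assuming a helper fragment named RANGE_WS (a hypothetical name, not part of the original grammar):

```antlr
// Sketch only: TERM_RANGE kept as a lexer rule, with whitespace spelled out.
// RANGE_WS is a hypothetical helper fragment introduced for illustration.
TERM_RANGE
    : '[' RANGE_WS* ('*' | TERM_TEXT) RANGE_WS+ 'TO' RANGE_WS+ ('*' | TERM_TEXT) RANGE_WS* ']'
    | '{' RANGE_WS* ('*' | TERM_TEXT) RANGE_WS+ 'TO' RANGE_WS+ ('*' | TERM_TEXT) RANGE_WS* '}'
    ;

// Same characters that the WS rule skips between tokens.
fragment RANGE_WS : [ \t\r\n] ;
```

With this approach the whole range arrives as a single token, so the visitor would still have to split its text to recover the two bounds.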

You would likely be in a much better position if you instead created termRange, a parser rule.
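A rough sketch of the parser-rule version, assuming the token names LBRACK, RBRACK, LBRACE, RBRACE, STAR and TO and the helper rule rangeBound (all hypothetical names, none of them in the original grammar):

```antlr
// Sketch only: the range as a parser rule. WS is skipped between tokens,
// so the spaces in "[ 10 TO 100 ]" no longer matter.
termRange
    : LBRACK rangeBound TO rangeBound RBRACK   // inclusive, e.g. [ 10 TO 100 ]
    | LBRACE rangeBound TO rangeBound RBRACE   // exclusive, e.g. { 1 TO 1000 }
    ;

rangeBound : STAR | TERM_NORMAL ;   // hypothetical helper rule

// In `unqualified`, the first alternative would then read something like
//     termRange   # rangeClause
// The label is renamed because an alternative label may not share a rule's name.

// Assumed token definitions. TO must be declared before TERM_NORMAL,
// since both match the text "TO" and the earlier rule wins the tie.
LBRACK : '[' ;
RBRACK : ']' ;
LBRACE : '{' ;
RBRACE : '}' ;
STAR   : '*' ;
TO     : 'TO' ;
```

The visitor can then read each bound through the rangeBound contexts instead of slicing a token's text. One side effect to be aware of: a bare TO anywhere in a query becomes the TO token rather than TERM_NORMAL, the same way AND, OR and NOT already behave.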

Licensed under: CC-BY-SA with attribution