Question

I've defined a flavor of SQL that we use in my company as the following:

/** Grammars always start with a grammar header. This grammar is called
 *  GigyaSQL and must match the filename: GigyaSQL.g4
 */
grammar GigyaSQL;

parse
 : selectClause
   fromClause
   ( whereClause )?
   ( filterClause )?
   ( groupByClause )?
   ( limitClause )?
 ;

selectClause
 : K_SELECT result_column ( ',' result_column )*
 ;

result_column
 : '*' # selectAll
 | table_name '.' '*' # selectAllFromTable
 | select_expr ( K_AS? column_alias )? # selectExpr
 | with_table # selectWithTable
 ;

fromClause
 : K_FROM table_name
 ;

table_name 
 : any_name # simpleTable
 | any_name K_WITH with_table # tableWithTable
 ;

any_name
 : IDENTIFIER
 | STRING_LITERAL
 | '(' any_name ')'
 ;

with_table
 : COUNTERS
 ;

select_expr
 : literal_value
 | range_function_in_select
 | interval_function_in_select
 | ( table_name '.' )? column_name
 | function_name '(' argument_list ')'
 ;

whereClause
 : K_WHERE condition_expr
 ;

condition_expr
 : literal_value # literal
 | ( table_name '.' )? column_name # column_name_expr
 | unary_operator condition_expr # unary_expr
 | condition_expr binary_operator condition_expr # binary_expr
 | K_IFELEMENT '(' with_table ',' condition_expr ')' # if_element
 | function_name '(' argument_list ')' # function_expr
 | '(' condition_expr ')' # brackets_expr
 | condition_expr K_NOT? K_LIKE condition_expr # like_expr
 | condition_expr K_NOT? K_CONTAINS condition_expr # contains_expr
 | condition_expr K_IS K_NOT? condition_expr # is_expr
 //| condition_expr K_NOT? K_BETWEEN condition_expr K_AND condition_expr
 | condition_expr K_NOT? K_IN '(' ( literal_value ( ',' literal_value )*) ')' # in_expr
 ;

filterClause
 : K_FILTER with_table K_BY condition_expr
 ;

groupByClause
 : K_GROUP K_BY group_expr ( ',' group_expr )*
 ;  

group_expr
 : literal_value
 | ( table_name '.' )? column_name
 | function_name '(' argument_list ')'
 | range_function_in_group
 | interval_function_in_group
 ;

limitClause
 : K_LIMIT NUMERIC_LITERAL
 ;

argument_list
 : ( select_expr ( ',' select_expr )* | '*' )
 ;

unary_operator
 : MINUS
 | PLUS
 | '~'
 | K_NOT
 ;

binary_operator
 : ( '*' | DIVIDE | MODULAR )
 | ( PLUS | MINUS )
 //| ( '<<' | '>>' | '&' | '|' )
 | ( LTH | LEQ | GTH | GEQ )
 | ( EQUAL | NOT_EQUAL | K_IN | K_LIKE )
 //| ( '=' | '==' | '!=' | '<>' | K_IS | K_IS K_NOT | K_IN | K_LIKE | K_GLOB | K_MATCH | K_REGEXP )
 | K_AND
 | K_OR
 ;

range_function_in_select
 : K_RANGE '(' select_expr ')'
 ;

range_function_in_group
 : K_RANGE '(' select_expr ',' range_pair (',' range_pair)* ')'
 ;

range_pair // Tried to use INT instead (for decimal numbers) but that didn't work fine (didn't parse a = 1 correctly)
 : '"' NUMERIC_LITERAL ',' NUMERIC_LITERAL '"'
 | '"' ',' NUMERIC_LITERAL '"'
 | '"' NUMERIC_LITERAL ',' '"'
 ;

interval_function_in_select
 : K_INTERVAL '(' select_expr ')'
 ;

interval_function_in_group
 : K_INTERVAL '(' select_expr ',' NUMERIC_LITERAL ')'
 ;


function_name
 : any_name
 ;

literal_value
 : NUMERIC_LITERAL
 | STRING_LITERAL
// | BLOB_LITERAL
 | K_NULL
// | K_CURRENT_TIME
// | K_CURRENT_DATE
// | K_CURRENT_TIMESTAMP
 ;

column_name 
 : any_name
 ;

column_alias
 : IDENTIFIER
 | STRING_LITERAL
 ;

SPACES
 : [ \u000B\t\r\n] -> skip
 ;

COUNTERS : 'counters' | 'COUNTERS';

//INT : '0' | DIGIT+ ; 

EQUAL  : '=';
NOT_EQUAL  : '<>' | '!=';
LTH : '<' ;
LEQ : '<=';
GTH   : '>';
GEQ   : '>=';
//MULTIPLY: '*';
DIVIDE  : '/';
MODULAR : '%';
PLUS  : '+';
MINUS : '-';

K_AND : A N D;
K_AS : A S;
K_BY : B Y;
K_CONTAINS: C O N T A I N S;
K_DISTINCT : D I S T I N C T;
K_FILTER : F I L T E R;
K_FROM : F R O M;
K_GROUP : G R O U P;
K_IFELEMENT : I F E L E M E N T;
K_IN : I N;
K_INTERVAL : I N T E R V A L;
K_IS : I S;
K_LIKE : L I K E;
K_LIMIT : L I M I T;
K_NOT : N O T;
K_NULL : N U L L;
K_OR : O R;
K_RANGE : R A N G E;
K_REGEXP : R E G E X P;
K_SELECT : S E L E C T;
K_WHERE : W H E R E;
K_WITH : W I T H;

IDENTIFIER
 : '"' (~'"' | '""')* '"'
 | '`' (~'`' | '``')* '`'
 | '[' ~']'* ']'
 | [a-zA-Z_] [.a-zA-Z_0-9]* // TODO - need to check if the period is correctly handled
 | [a-zA-Z_] [a-zA-Z_0-9]* // TODO check: needs more chars in set
 ;

STRING_LITERAL
 : '\'' ( ~'\'' | '\'\'' )* '\''
 ;

NUMERIC_LITERAL
 :// INT
 DIGIT+  ('.' DIGIT*)? ( E [-+]? DIGIT+ )?
 | '.' DIGIT+ ( E [-+]? DIGIT+ )?
 ;

fragment DIGIT : [0-9];

fragment A : [aA];
fragment B : [bB];
fragment C : [cC];
fragment D : [dD];
fragment E : [eE];
fragment F : [fF];
fragment G : [gG];
fragment H : [hH];
fragment I : [iI];
fragment J : [jJ];
fragment K : [kK];
fragment L : [lL];
fragment M : [mM];
fragment N : [nN];
fragment O : [oO];
fragment P : [pP];
fragment Q : [qQ];
fragment R : [rR];
fragment S : [sS];
fragment T : [tT];
fragment U : [uU];
fragment V : [vV];
fragment W : [wW];
fragment X : [xX];
fragment Y : [yY];
fragment Z : [zZ];

When I try to parse the following query: SELECT * from accounts where not data.zzz > 124

I get the following tree: [image: parse tree]

But I wanted a tree similar to the one I get when I use parentheses: SELECT * from accounts where not (data.zzz > 124)

[image: parse tree]

I don't understand why it works that way, since the unary rule is listed before the others.

Any suggestion?

Solution

That is the correct result for the given grammar. In ANTLR 4, the alternatives of a left-recursive rule get their precedence from the order in which they are listed: earlier alternatives bind more tightly. As you've already noticed, unary_expr is listed before binary_expr, so NOT is bound to its operand before any binary operator is. Since it is unary, it takes data.zzz as its operand, and the whole NOT expression then becomes the left operand of the binary operator >.
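To make the ordering rule concrete, here is a minimal sketch. PrecedenceDemo and its rules tight and loose are a hypothetical toy grammar, not part of GigyaSQL; the two rules differ only in where the prefix alternative is listed.

```antlr
grammar PrecedenceDemo;   // hypothetical toy grammar, for illustration only

// Prefix alternative listed first: 'not' binds tighter than '>'.
// Input "not a > 1" groups as (not a) > 1.
tight
 : 'not' tight
 | tight '>' tight
 | ID
 | INT
 ;

// Prefix alternative listed after '>': 'not' binds more loosely.
// Input "not a > 1" groups as not (a > 1).
loose
 : loose '>' loose
 | 'not' loose
 | ID
 | INT
 ;

ID  : [a-zA-Z_]+ ;
INT : [0-9]+ ;
WS  : [ \t\r\n]+ -> skip ;
```

Parsing the same input with tight and then with loose as the start rule should reproduce the two tree shapes described in the question.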

To get what you want, move the unary_expr alternative down to the position that matches its precedence level. As I recall, in SQL the precedence of NOT is lower than that of the comparison and arithmetic operators (though higher than AND and OR), and NOT should not share a precedence level with MINUS, PLUS and ~ the way it does in your grammar. For example:

condition_expr
 : literal_value # literal
 | ( table_name '.' )? column_name # column_name_expr
 | condition_expr binary_operator condition_expr # binary_expr
 | unary_operator condition_expr # unary_expr
 | K_IFELEMENT '(' with_table ',' condition_expr ')' # if_element
 | function_name '(' argument_list ')' # function_expr
 | '(' condition_expr ')' # brackets_expr
 | condition_expr K_NOT? K_LIKE condition_expr # like_expr
 | condition_expr K_NOT? K_CONTAINS condition_expr # contains_expr
 | condition_expr K_IS K_NOT? condition_expr # is_expr
 //| condition_expr K_NOT? K_BETWEEN condition_expr K_AND condition_expr
 | condition_expr K_NOT? K_IN '(' ( literal_value ( ',' literal_value )*) ')' # in_expr
 ;

And this gives what you want: [image: parse tree]
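Moving the whole unary_operator alternative also lowers MINUS, PLUS and ~ to the same level as NOT, so -a + b would group as -(a + b). In addition, because every binary operator goes through the single binary_operator rule, they all share one precedence level. A possible refinement, offered only as a sketch, is to inline the operators into separate alternatives so that each group gets its own level and NOT sits between the comparisons and AND. The label names below (sign_expr, mul_expr, and so on) are made up for this example, and the ordering is an assumption about the precedence you want.

```antlr
condition_expr
 : literal_value                                             # literal
 | ( table_name '.' )? column_name                           # column_name_expr
 | function_name '(' argument_list ')'                       # function_expr
 | '(' condition_expr ')'                                    # brackets_expr
 // Arithmetic prefix operators keep the highest operator precedence.
 | ( MINUS | PLUS | '~' ) condition_expr                     # sign_expr
 | condition_expr ( '*' | DIVIDE | MODULAR ) condition_expr  # mul_expr
 | condition_expr ( PLUS | MINUS ) condition_expr            # add_expr
 | condition_expr ( LTH | LEQ | GTH | GEQ ) condition_expr   # compare_expr
 | condition_expr ( EQUAL | NOT_EQUAL ) condition_expr       # equality_expr
 // NOT is listed below the comparisons, so it applies to a whole comparison.
 | K_NOT condition_expr                                      # not_expr
 | condition_expr K_AND condition_expr                       # and_expr
 | condition_expr K_OR condition_expr                        # or_expr
 // if_element, like_expr, contains_expr, is_expr and in_expr from the
 // original rule are omitted here for brevity; they would be added back
 // at whatever precedence level they should have.
 ;
```

With this ordering, not data.zzz > 124 should group as not (data.zzz > 124), while -a + b still groups as (-a) + b. The unary_operator and binary_operator rules would no longer be needed by condition_expr.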

Licensed under: CC-BY-SA with attribution