Question

For my application, I am trying to build an SQLite parser. As my application is using Objective-C, ParseKit seems like a good option. I read SQLite's syntax diagrams and built a grammar based on them. However, when I try to parse something using this grammar, the parser goes into infinite recursion.

The only statements I need are SELECT, INSERT, UPDATE, and DELETE (I need SELECT mostly because the others refer to it). My @start is designed to handle multiple statements separated by semicolons:

@start = statement (';' statement)*;

statement = select_stmt | insert_stmt | update_stmt | delete_stmt ;

The statements are as follows:

select_stmt = select_core ( compound_operator select_core )* order_expr? limit_expr? ;
select_core = 'select' ( 'distinct' | 'all' )? result_column ( ',' result_column )* from_expr where_expr group_expr ;
result_column = '*' | table_name '.' '*' | expr ( 'as'? column_alias )? ;

insert_stmt = 'insert' or_on_failure? 'into' table_name insert_expr ;
insert_expr = insert_expr_cols? (insert_expr_values | select_stmt) ;
insert_expr_cols = '(' column_name ( ',' column_name )* ')' ;
insert_expr_values = 'values' '(' expr ( ',' expr )* ')' ;
insert_expr_defaults = 'default' 'values' ;

update_stmt = 'update' or_on_failure? qualified_table_name update_expr where_expr? limited_expr? ;
update_expr = 'set' update_expr_col ( ',' update_expr_col )* ;
update_expr_col = column_name '=' expr ;

delete_stmt = 'delete' 'from' qualified_table_name where_expr? limited_expr? ;

And their supporting expressions:

order_expr = 'order' 'by' ordering_term (',' ordering_term)* ;
limit_expr = 'limit' expr ( ( 'offset' | ',' ) expr )? ;
from_expr = 'from' join_source ;
where_expr = 'where' expr ;
group_expr = 'group' 'by' expr ( ',' expr )? ( 'having' expr )? ;

join_source = single_source ( join_operator single_source join_constraint? )* ;
single_source = table_name 'as' table_alias indexed_by? | '(' select_stmt ')' ( 'as'? table_alias )? | '(' join_source ')' ;

or_on_failure = 'or' on_failure ;
on_failure = 'rollback' | 'abort' | 'replace' | 'fail' | 'ignore' ;
limited_expr = order_expr limit_expr ;

Names, aliases:

database_name = name ;
table_name = (database_name '.')? name ;
column_name = (table_name '.')? name ;

table_alias = name ;
column_alias = name ;

index_name = name ;

type_name = name+ ( '(' number (',' number)? ')' )? ;
function_name = name ;
collation_name = name ;

qualified_table_name = table_name indexed_by? ;

Miscellaneous operators, etc:

indexed_by = 'indexed' 'by' index_name | 'not' 'indexed' ;

unary_operator = symbol ;
binary_operator = symbol ;
compound_operator = 'union' 'all'? | 'intersect' | 'except' ;
join_operator = ',' | 'natural'? ( 'left' 'outer'? | 'inner' | 'cross' ) 'join' ;

join_constraint = 'on' expr | 'using' '(' column_name ( ',' column_name )* ')' ;

Basic types:

literal = number
    | string
    | 'null'
    | 'current_time'
    | 'current_data'
    | 'current_timestamp' ;
number = Number ;
string = Word
    | QuotedString ;
name = Word
    | QuotedString ;
symbol = Symbol;

And the EXPR:

expr = literal
    | column_name
    | unary_operator expr
    | expr binary_operator expr
    | function_name '(' ( '*' | 'distinct'? expr ( ',' expr )* )? ')'
    | '(' expr ')'
    | 'cast' '(' expr 'as' type_name ')'
    | expr 'collate' collation_name
    | expr 'not'? ( 'like' | 'glob' | 'regexp' | 'match' ) expr ( 'escape' expr )?
    | expr ( 'isnull' | 'notnull' | 'not' 'null' )
    | expr 'is' 'not'? expr
    | expr 'not'? 'between' expr 'and' expr
    | expr 'not'? 'in' ( table_name | '(' ( select_stmt | expr ( ',' expr )* )? ')' )
    | ( 'not'? 'exists' )? '(' select_stmt ')'
    | 'case' expr? ( 'when' expr 'then' expr )+ ( 'else' expr )? 'end' ;

When I stepped through the code, the basic path was @start -> statement -> select_stmt -> select_core -> result_column -> expr -> expr -> expr...

After about 8-9k calls between PKParser's matchAndAssemble: and PKParser/Subclass's allMatchesFor:, something dies, usually from a EXC_BAD_ACCESS error (and then LLDB can't do anything either).

P.S.: If you're going to post an answer saying, 'Oh, you should really do/use this,' A) I like Objective-C. Don't tell me not to use it. It's my choice. My response will likely be a rant. And B) I tried digging through SQLite's source to use their parser. I never got anywhere. If you think I should use that, kindly post the source for their parser as a single file with no other dependencies.

Was it helpful?

Solution

Developer of ParseKit here.

First, see my previous answers on debugging ParseKit grammars and battling infinite recursion in ParseKit grammars.


I think there might be an issue in the very first line (but I'm not a SQL expert, so I'm not sure). Shouldn't that be:

@start = (statement ';')+;

I would strongly recommend using camel case instead of underscores, as underscores will make your Objective-C callbacks very awkward and ugly. That is why camel case is the convention in ParseKit grammars.


However, I see the main issue. Your grammar includes Left Recursion which is not allowed in ParseKit grammars. Particularly in your expr prodcution (I haven't looked closely to see if it is elsewhere too).

ParseKit is fine with recursion, but not left recusion. The best explanation of this is in Steven Metsker's "Building Parsers with Java". Or search the web.

But basically left recursion is when a production immediately references itself (on the left side of an expression):

e = e '-' Number;

or

e = Number | e '-' Number;

Instead, you must design your grammars to remove left recursion like:

e = Number ('-' Number)*;
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top