Antlr3 next available tokens when parsing incomplete statement

https://stackoverflow.com/questions/19468695

01-07-2022
|

Question

I an trying to implement simple auto complete for a command line client for SQL. I am using antlr for generating a parser in the rest of the application and I wanted to reuse the grammar to use autocompletion. My idea is: - Parse the incomplete statement when the user asks for completion (for example select a from) - Get from the parser the list of tokens which were expected when he raised a NoViableAltException

I wanted to then do from this list of token: if (isreserved_word) { propose for completion} else { notify the user a identifier is expected}

This in principle looked like a sensible idea (to me at least) and I found this: http://www.antlr.org/wiki/pages/viewpage.action?pageId=11567208 which convinced me it was possible

However, after doing some testing I realized not that many tokens were in state.following[state._fsp] for example for an entry of create it was only containing ';' when my grammar for this part looks like:

root : statement? (SEMICOLON!)? EOF!;
statement : create | ...;
create : CREATE | ( TABLE table_create | USER user_create | ....);

So I was confused and looked at the generated code:

    try {
        int alt6=16;
        alt6 = dfa6.predict(input);
        switch (alt6) {
            case 1 :
                {
                root_0 = (CommonTree)adaptor.nil();

                pushFollow(FOLLOW_create_in_statement1088);
                create8=create();

                state._fsp--;

                adaptor.addChild(root_0, create8.getTree());

                }
                break;
            case 2 :
            ...

So then it made sense to me: The parser tries to read the next token and then from this token finds(switch case) the next rule. In my case the predict just fails as there no next token. So from there I understood I would need to hack a bit antlr and looked in the templates and in Java.stg I found these pieces of code:

/** A (...) subrule with multiple alternatives */
block(alts,decls,decision,enclosingBlockLevel,blockLevel,decisionNumber,maxK,maxAlt,description) ::= <<
// <fileName>:<description>
int alt<decisionNumber>=<maxAlt>;
<decls>
<@predecision()>
<decision>
<@postdecision()>
<@prebranch()>
switch (alt<decisionNumber>) {
    <alts:{a | <altSwitchCase(i,a)>}>
}
<@postbranch()>
>>

and

/** A case in a switch that jumps to an alternative given the alternative
 *  number.  A DFA predicts the alternative and then a simple switch
 *  does the jump to the code that actually matches that alternative.
 */
 altSwitchCase(altNum,alt) ::= <<
 case <altNum> :
   <@prealt()>
   <alt>
   break;<\n>
 >>

From there all I thought I had to do was to do my own function that would just put all altNum in a stack before the call to predict. so I did to try: /* Yout }>*/

And I was expecting to get nice little lists of token ids. But not at all I get really different things.

So I'm really lost and would like to know if either there was an easier way to provide this autocomplete feature without having to do it by hand or how am I missing to modify the template to add a custom stack to add the different alternatives in a rule so I can read it after when the exception is raised

Thank you very much

Solution

Sorry to say this but: don't use a parser directly for auto completion. There are several reasons why this won't work as you expect it, without massive manual changes in the generated parser (which requires intimate knowledge):

You often have incomplete input and unless you have only a simple language you will often find yourself in an unexpected rule path because of the backtracking nature of the parser. For instance, if you have several alts in a rule where the first alt would match if only an additional token were available the parser will fail not before it has tried all other alts giving you either completely different tokens or many more tokens than are really necessary.
The follow set is only available in an error case. However there might be no error or there is an error but at a completely different position than where the caret is currently (and where the user would expect an auto completion box).
The follow set suffices only for a small set of info you want to present (namely the keywords). However, usually you want to show, say, possible tables in a database if you are in a FROM clause (assuming an SQL language here). You will not get this type of info from the parser, simply because the parser does not have such context information. What you however get is 'identifier' which can be anything from a table, function name, variable or similar.

My current approach for this type of problem is to tokenize the input and apply domain knowledge in a decision tree. That is, I walk the input tokens and decide based on knowledge I have from the grammar what would be the most important stuff to show.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow