Is there a way to decrease the number of alternatives in this reduction while keeping it LALR(1)?

StackOverflow https://stackoverflow.com/questions/12604616

  •  04-07-2021

Question

I currently have a rule in Bison that is meant to match a comma-separated list made up of several optional parts, each with its own rule:

arg_list
    :
    | expr_list
    | assignment_list
    | expr_list ',' assignment_list
    | varargs
    | expr_list ',' varargs
    | assignment_list ',' varargs
    | expr_list ',' assignment_list ',' varargs
    | varkwdargs
    | expr_list ',' varkwdargs
    | assignment_list ',' varkwdargs
    | expr_list ',' assignment_list ',' varkwdargs
    | varargs ',' varkwdargs
    | expr_list ',' varargs ',' varkwdargs
    | assignment_list ',' varargs ',' varkwdargs
    | expr_list ',' assignment_list ',' varargs ',' varkwdargs
    ;

In case it is unclear, this is meant to implement (in pseudocode):

arg_list
    :
    | expr_list [',' assignment_list] [',' varargs] [',' varkwdargs]
    | assignment_list [',' varargs] [',' varkwdargs]
    | varargs [',' varkwdargs]
    | varkwdargs
    ;

The obvious way to do this would be to introduce a rule for each optional part, for example:

optional_assignment_list
    :
    | ',' assignment_list
    ;

However, the latter formulation is not LALR(1): whenever Bison sees a comma, it has to decide, with only one token of lookahead, whether to shift it and look for an *assignment_list*, or to reduce an empty *optional_assignment_list* and move on to look for an *optional_varargs*.

I am trying to find if there is a better way to represent this. I have been able to decrease the number of alternatives by introducing *optional_varkwdargs*, but that still leaves nine alternatives to the reduction, which I guess is better than 16:

optional_varkwdargs
    :
    | ',' varkwdargs
    ;

arg_list
    :
    | expr_list optional_varkwdargs
    | assignment_list optional_varkwdargs
    | expr_list ',' assignment_list optional_varkwdargs
    | varargs optional_varkwdargs
    | expr_list ',' varargs optional_varkwdargs
    | assignment_list ',' varargs optional_varkwdargs
    | expr_list ',' assignment_list ',' varargs optional_varkwdargs
    | varkwdargs
    ;

Any ideas would be appreciated.


Solution

Well, one possibility is to have a more general arg_list parser that accepts any of these items in any order:

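arg_list: arg_item
          { $$ = CreateSingletonArgList($1); }
        | arg_list ',' arg_item
          { /* keep the new item only if it may follow the list so far */
            if (CheckArgListOrdering($1, $3))
                $$ = AppendArgList($1, $3);
            else
                $$ = $1;  /* out of order: drop the item, keep parsing */ }
        ;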

arg_item: expr
        | assignment
        | vararg
        | varkwarg
        ;

You then have a post-parse check on an arg_list to ensure that things don't come in the wrong order. This has the advantage that you can give better error messages for these cases ("kwarg must be after vararg" or some such) rather than just a generic "syntax error".

edit

With the above actions in the rules, the CheckArgListOrdering function checks that an arg_item can go after everything already in an existing arg_list, and issues an appropriate error message if not. The return value tells the parser whether to accept the new arg_item or throw it away. The Create/Append functions just manage arg_lists as any kind of ordered collection data structure (it might be a linked list, an array, or anything else appropriate).
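
As a rough illustration, the three helpers might look something like the sketch below. It is not the answer's actual implementation: the ArgKind enum, the fixed-size array, MAX_ARGS, and the reduction of each arg_item to just its kind are all assumptions made to keep the example short. A real parser would store the item's AST node as well, and would probably route the message through yyerror.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical item kinds, declared in the order they must appear. */
typedef enum { ARG_EXPR, ARG_ASSIGNMENT, ARG_VARARGS, ARG_VARKWDARGS } ArgKind;

#define MAX_ARGS 64  /* assumed fixed capacity, for simplicity */

typedef struct {
    ArgKind kinds[MAX_ARGS];
    size_t  count;
} ArgList;

static const char *kind_name(ArgKind k)
{
    switch (k) {
    case ARG_EXPR:       return "expression";
    case ARG_ASSIGNMENT: return "assignment";
    case ARG_VARARGS:    return "varargs";
    case ARG_VARKWDARGS: return "varkwdargs";
    }
    return "argument";
}

ArgList *CreateSingletonArgList(ArgKind first)
{
    ArgList *list = malloc(sizeof *list);
    assert(list != NULL);
    list->count = 0;
    list->kinds[list->count++] = first;
    return list;
}

ArgList *AppendArgList(ArgList *list, ArgKind item)
{
    assert(list->count < MAX_ARGS);
    list->kinds[list->count++] = item;
    return list;
}

/* Returns 1 if `next` may follow everything already in `list`;
 * otherwise prints a specific message and returns 0.
 * Rejected items are never appended, so the list stays ordered and
 * comparing against the last item is enough. */
int CheckArgListOrdering(const ArgList *list, ArgKind next)
{
    if (list->count == 0)
        return 1;

    ArgKind last = list->kinds[list->count - 1];

    if (next < last) {
        fprintf(stderr, "%s cannot come after %s\n",
                kind_name(next), kind_name(last));
        return 0;
    }
    /* Expressions and assignments may repeat; the two variadic
     * forms may each appear at most once. */
    if (next == last && next >= ARG_VARARGS) {
        fprintf(stderr, "only one %s is allowed\n", kind_name(next));
        return 0;
    }
    return 1;
}
```

Because the enum values follow the required order, the whole ordering rule reduces to one comparison plus a duplicate check for the two variadic forms, and each failure can carry its own message instead of a generic "syntax error".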

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow