Question

I am writing a bison+flex parser for a certain query language and I need to add a C-style cast operator to it. Here is the relevant part of the code:

%token <characterToken>  Identifier
%token <commandToken>    LRPAR RRPAR

%type <characterToken>   typename
%type <operationValue>   generalExp castExp variable

%%
generalExp: variable
      | LRPAR generalExp RRPAR
       { /* some code here */ }
      | castExp
      ;

castExp: LRPAR typename RRPAR generalExp
   { /* some code here */ }
   ;

variable: Identifier 
    { /* some code here*/ };
typename: Identifier;
%%

The problem here is that (typename) in castExp clashes with (variable) in generalExp and gives a reduce/reduce conflict:

test.yy: conflicts: 1 reduce/reduce
test.yy:23.11-20: warning: rule useless in parser due to conflicts: typename: Identifier

Possible solution could be to list all valid typenames (like long, int, char) in the corresponding lex file, however I need to support used-defined types as well.

UPD: another solution is to use bison GLR-parser, which I don't want due to performance drop.

bison -v output is here.

Was it helpful?

Solution

Yes, the C cast syntax was very badly chosen. It is impossible to resolve correctly without knowing whether what is inside the cast is a type or a value. Here are some cases in C:

(f)*a    // cast the value pointed to by a to type f, or multiply f by a?
(f)(a+b) // cast a+b to type f or call the function f with argument a+b?

The easiest way I've found to handle this is to let the lexer consult the symbol table, so that it can return different token types for Typenames and Identifiers. (Which is pretty well the way the C grammar works.) Sharing the symbol table between the lexer and the parser is a bit ugly, particularly if you allow scoped type definitions, but it can be done. For the purposes of correct lexing, the only thing that the lexer should need to know is whether the symbol is the name of a type or not; you can probably get away with assuming that a symbol is not a type unless it has been explicitly declared a type, so that it is not strictly necessary to insist that all symbols be declared before use. (In C++, for example, a non-type class member does not have to be declared before use, and that can still work.)

Having said all that, I'd strongly encourage you to think seriously about using a different syntax for casts, if at all possible. Aside from being a pain for parsers, they can also be annoyingly ambiguous for human readers (see the examples above), and casting is not (or should not be) so common that it requires an abbreviated syntax.

C++-style casts -- cast<TYPE>(value) -- suffer from the ambiguity introduced by recycling a comparison operator as a bracket, but are tolerable if the operations using angle brackets are a fixed set of keywords. (Otherwise, you create another context where the lexer needs to distinguish between name kinds.)

Personally, particularly for a query language, I'd go with something like

value as type

as in

(3 + 4) as double

which is (IMO) more readable and much easier to parse without being harder to type.

OTHER TIPS

Best guess -- some other part of your grammar that you don't show allows for two consecutive generalExp with no token between them. So when you have an input like ( Identifier ) expression, it might be either a cast or two consecutive generalExps.

The way to proceed is the same as with all conflicts -- run bison with the -v option to get a .output file showing all the states and exactly which states have what conflicts on which symbols. Once you know that, you have a chance of figuring out what can be done.

I had similar problem - colision of following rules:

expr: expr "+" expr
expr: "(" type ")" expr

I solved it by setting precedence of operators:

%precedence "+"
%precedence ")"

This set higher precedence to ")", so the second rule will be prefered.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top