Shift/Reduce Conflict in Yacc/Flex

https://stackoverflow.com/questions/16703338

30-05-2022

Question

I have this grammar in yacc:

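%{
    #include <stdio.h>

    int yylex(void);       /* provided by the flex scanner */
    int yyerror(char *s);  /* defined after the grammar */
%}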

%token texto SEP ERRO word

%start Ini

%%

Ini: Directivas SEP SEP Conceitos '$'
            { printf("Terminou bem...\n"); return 0; };

Directivas: Directiva
          | Directivas SEP Directiva
          ;

Conceitos: Conceito
         | Conceitos SEP SEP Conceito
         ;

Conceito: word SEP Atributos;

Atributos: Atributo
         | Atributos SEP Atributo
         ;

Directiva: texto;
Atributo: '-' texto;

%%

int main(void){
    return yyparse();
}

int yyerror(char *s){
    fprintf(stderr, "%s\n", s);
    return 0;
}

And in flex:

%{
    #include "y.tab.h"
%}

%%

[a-zA-Z]+           return word;

[a-zA-Z ]+          return texto;

\-                  return '-';

\n                  return SEP;

[ \t]               ;

.                   return ERRO;

<<EOF>>             return '$';

I want to make a parser that validates something like:

text line
text line
text line

word
-text line
-text line
-text line

word
-text line

where the first lines are the 'Directivas', followed by one blank line, and then come the 'Conceitos'. Each Conceito is one word followed by a few text lines that begin with '-', and the 'Conceitos' are separated from each other by one blank line.

But yacc reports a shift/reduce conflict. I am new to this and I can't figure out why.

Sorry for my English.

Thank you

Solution

Use yacc's (or bison's) -v option to get a full listing of the generated parser and the grammar conflicts in the y.output file. When you do this with your grammar, you get something like (from bison):

State 16 conflicts: 1 shift/reduce
        :
state 16

    6 Conceito: word SEP Atributos .
    8 Atributos: Atributos . SEP Atributo

    SEP  shift, and go to state 20

    SEP       [reduce using rule 6 (Conceito)]
    $default  reduce using rule 6 (Conceito)

This tells you exactly where the conflict is: after reducing an Atributos and looking at a SEP lookahead, the parser doesn't know whether it should shift the SEP to parse another Atributo after it, or reduce the Conceito, which would only be valid if there's another SEP after that SEP. Deciding would take two tokens of lookahead, and a yacc/bison LALR(1) parser only has one.

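One way to avoid this is to have your lexer return a blank line (two consecutive newlines) as a single token. Declare it with %token SEP_SEP and use SEP_SEP in place of SEP SEP in the Ini and Conceitos rules: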

\n      return SEP;
\n\n    return SEP_SEP;

You might also want to allow whitespace on the blank line, or more than one blank line, by using this rule instead of the \n\n one:

\n([ \t]*\n)+  return SEP_SEP;

Licensed under: CC-BY-SA with attribution
