byacc shift/reduce

https://stackoverflow.com/questions/10179679

01-06-2021
|

문제

I'm having trouble figuring this one out as well as the shift reduce problem.

Adding ';' to the end doesn't solve the problem since I can't change the language, it needs to go just as the following example. Does any prec operand work?

The example is the following:

A variable can be declared as: as a pointer or int as integer, so, both of this are valid:

<int> a = 0
int a = 1

the code goes:

%left '<'

declaration: variable
           | declaration variable

variable : type tNAME '=' expr
         | type tNAME

type : '<' type '>'
     | tINT

expr : tINTEGER
     | expr '<' expr

It obviously gives a shift/reduce problem afer expr. since it can shift for expr of "less" operator or reduce for another variable declaration.

I want precedence to be given on variable declaration, and have tried to create a %nonassoc prec_aux and put after '<' type '>' %prec prec_aux and after type tNAME but it doesn't solve my problem :S

How can I solve this?

Output was:

Well cant figure hwo to post linebreaks and code on reply... so here it goes the output:

35: shift/reduce conflict (shift 47, reduce 7) on '<'
state 35
    variable : type tNAME '=' expr .  (7)
    expr : expr . '+' expr  (26)
    expr : expr . '-' expr  (27)
    expr : expr . '*' expr  (28)
    expr : expr . '/' expr  (29)
    expr : expr . '%' expr  (30)
    expr : expr . '<' expr  (31)
    expr : expr . '>' expr  (32)

    '>'  shift 46
    '<'  shift 47
    '+'  shift 48
    '-'  shift 49
    '*'  shift 50
    '/'  shift 51
    '%'  shift 52
    $end  reduce 7
    tINT  reduce 7

Thats the output and the error seems the one I mentioned.

Does anyone know a different solution, other than adding a new terminal to the language that isn't really an option?

I think the resolution is to rewrite the grammar so it can lookahead somehow and see if its a type or expr after the '<' but I'm not seeing how to.

Precedence is unlikely to work since its the same character. Is there a way to give precendence for types that we define? such as declaration?

Thanks in advance

해결책

Here is the fix, but not entirely satisfactory:

%{
%}

%token tNAME tINT tINTEGER

%left '<'
%left '+'
%nonassoc '=' /* <-- LOOK */

%%

declaration: variable
           | declaration variable

variable : type tNAME '=' expr
         | type tNAME

type : '<' type '>'
     | tINT

expr : tINTEGER
     | expr '<' expr
     | expr '+' expr
     ;

This issue is a conflict between these two LR items: the dot-final:

variable : type tNAME '=' expr_no_less .

and this one:

expr : expr . '<' expr

Notice that these two have different operators. It is not, as you seem to think, a conflict between different productions involving the '<' operator.

By adding = to the precedence ranking, we fix the issue in the sense that the conflict diagnostic goes away.

Note that I gave = a high precedence. This will resolve the conflict by favoring the reduce. This means that you cannot use a '<' expression as an initializer:

int name = 4 < 3    // syntax error

When the < is seen, the int name = 4 wants to be reduced, and the idea is that < must be the start of the next declaration, as part of a type production.

To allow < relational expressions to be used as initializers, add the support for parentheses into the expression grammar. Then users can parenthesize:

int foo = (4 < 3) <int> bar = (2 < 1)

There is no way to fix that without a more powerful parsing method or hacks.

What if you move the %nonassoc before %left '<', giving it low precedence? Then the shift will be favored. Unfortunately, that has the consequence that you cannot write another <int> declaration after a declaration.

int foo = 3 <int> bar = 4
             ^ // error: the machine shifted and is now doing:   expr '<' . expr.

So that is the wrong way to resolve the conflict; you want to be able to write multiple such declarations.

Another Note:

My TXR language, which implements something equivalent to Parse Expression Grammars handles this grammar fine. This is essentially LL(infinite), which trumps LALR(1).

We don't even have to have a separate lexical analyzer and parser! That's just something made necessary by the limitations of one-symbol-lookahead, and the need for utmost efficiency on 1970's hardware.

Example output from shell command line, demonstrating the parse by translation to a Lisp-like abstract syntax tree, which is bound to the variable dl (declaration list). So this is complete with semantic actions, yielding an output that can be further processed in TXR Lisp. Identifiers are translated to Lisp symbols via calls to intern and numbers are translated to number objects also.

$ txr -l type.txr  -
int x = 3 < 4 int y
(dl (decl x int (< 3 4)) (decl y int nil))

$ txr -l type.txr  -
< int > x = 3 < 4 < int > y
(dl (decl x (pointer int) (< 3 4)) (decl y (pointer int) nil))

$ txr -l type.txr  -
int x = 3 + 4 < 9 < int > y < int > z = 4 + 3 int w
(dl (decl x int (+ 3 (< 4 9))) (decl y (pointer int) nil) 
 (decl z (pointer int) (+ 4 3)) (decl w int nil))

$ txr -l type.txr  -
<<<int>>>x=42  
(dl (decl x (pointer (pointer (pointer int))) 42))

The source code of (type.txr):

@(define ws)@/[ \t]*/@(end)
@(define int)@(ws)int@(ws)@(end)
@(define num (n))@(ws)@{n /[0-9]+/}@(ws)@(filter :tonumber n)@(end)
@(define id (id))@\
   @(ws)@{id /[A-Za-z_][A-Za-z_0-9]*/}@(ws)@\
   @(set id @(intern id))@\
@(end)
@(define type (ty))@\
  @(local l)@\
  @(cases)@\
    @(int)@\
    @(bind ty @(progn 'int))@\
  @(or)@\
    <@(type l)>@\
    @(bind ty @(progn '(pointer ,l)))@\
  @(end)@\
@(end)
@(define expr (e))@\
  @(local e1 op e2)@\
  @(cases)@\
    @(additive e1)@{op /[<>]/}@(expr e2)@\
    @(bind e @(progn '(,(intern op) ,e1 ,e2)))@\
  @(or)@\
    @(additive e)@\
  @(end)@\
@(end)
@(define additive (e))@\
  @(local e1 op e2)@\
  @(cases)@\
    @(num e1)@{op /[+\-]/}@(expr e2)@\
    @(bind e @(progn '(,(intern op) ,e1 ,e2)))@\
  @(or)@\
    @(num e)@\
  @(end)@\
@(end)
@(define decl (d))@\
  @(local type id expr)@\
  @(type type)@(id id)@\
  @(maybe)=@(expr expr)@(or)@(bind expr nil)@(end)@\
  @(bind d @(progn '(decl ,id ,type ,expr)))@\
@(end)
@(define decls (dl))@\
  @(coll :gap 0)@(decl dl)@(end)@\
@(end)
@(freeform)
@(decls dl)

다른 팁

Your grammar gets confused in text like this:

int a = b
<int> c

That '<' on the second line could be part of an expression in the first declaration. It would have to look ahead further to find out.

This is the reason most languages have a statement terminator. This produces no conflicts:

%%

%token tNAME;
%token tINT;
%token tINTEGER;
%token tTERM;

%left '<';

declaration: variable
           | declaration variable

variable : type tNAME '=' expr tTERM
         | type tNAME tTERM

type : '<' type '>'
     | tINT

expr : tINTEGER
     | expr '<' expr

It helps when creating a parser to know how to design a grammar to eliminate possible conflicts. For that you would need an understanding of how parsers work, which is outside the scope of this answer :)

The basic problem here is that you need more lookahead than the 1 token you get with yacc/bison. When the parser sees a < it has no way of telling whether its done with the preivous declaration and its looking at the beginning of a bracketed type, or if this is a less-than operator. There's two basic things you can do here:

Use a parsing method such as bison's %glr-parser option or btyacc, which can deal with non-LR(1) grammars
Use the lexer to do extra lookahead and return disambiguating tokens

For the latter, you would have the lexer do extra lookahead after a '<' and return a different token if its followed by something that looks like a type. The easiest is to use flex's / lookahead operator. For example:

"<"/[ \t\n\r]*"<"    return OPEN_ANGLE;
"<"/[ \t\n\r]*"int"  return OPEN_ANGLE;
"<"                  return '<';

Then you change your bison rules to expect OPEN_ANGLE in types instead of <:

type : OPEN_ANGLE type '>'
     | tINT

expr : tINTEGER
     | expr '<' expr

For more complex problems, you can use flex start states, or even insert an entire token filter/transform pass between the lexer and the parser.

라이센스 : CC-BY-SA ~와 함께 속성

제휴하지 않습니다 StackOverflow