Question

I recently started studying YACC, and I want to know how to write an attributed grammar in it. Please give an example. Can we use a union?


Solution

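Yes, you can associate attributes with the nodes in the parse tree. Each node has a single attribute, accessed through the "dollar specifiers": $$ is the attribute of the left-hand side of a rule, and $1, $2, ... are the attributes of the symbols on its right-hand side. Here is an example, where the attribute is used for the value of an expression: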

expression : expression '+' expression { $$ = $1 + $3; }
           | expression '-' expression { $$ = $1 - $3; }
           | NUMBER { $$ = $1; }
           ;

By default, this single attribute is an integer, but you can change the type using the %union directive. Different types of nodes can have different types of attributes. (This is why the directive is called %union: it declares a C union of all the possible attribute types, and %token <member> and %type <member> declarations select which member a given symbol uses.) If you need more than one attribute, you can use a C struct or struct pointer as the type.
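To make the "struct as attribute type" idea concrete, here is a minimal C++ sketch written outside of yacc. The names ExprAttr, SemanticValue, number_attr and add_attr are hypothetical, and the depth attribute is an assumption added only to show a second attribute next to the value; the two functions do what the rule actions would do with $1 and $3.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical record holding two attributes for one parse-tree node.
struct ExprAttr {
    int value;   // synthesized value of the subexpression
    int depth;   // synthesized height of the subtree (illustrative extra attribute)
};

// Roughly the type that  %union { int ival; ExprAttr *expr; }  would declare.
union SemanticValue {
    int ival;          // e.g. for NUMBER tokens
    ExprAttr* expr;    // e.g. for expression nonterminals
};

// Mirrors  expression : NUMBER  (n plays the role of $1).
ExprAttr number_attr(int n) { return ExprAttr{n, 1}; }

// Mirrors  expression : expression '+' expression  (l is $1, r is $3).
ExprAttr add_attr(const ExprAttr& l, const ExprAttr& r) {
    return ExprAttr{l.value + r.value, 1 + std::max(l.depth, r.depth)};
}
```

For example, add_attr(number_attr(2), number_attr(3)) yields a record with value 5 and depth 2. In a real grammar the same code would sit inside the rule actions, with the struct reached through the union member chosen by %type.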

Other tips

Yes, but you need some C++ features. To store the AST, use a set of "symbolic" classes derived from a common base class, Sym. Because the classes are polymorphic (they have virtual methods), you can keep Sym* pointers in STL containers and recover the concrete type with dynamic_cast<Num*>(o) when you manipulate objects. Use the attr{} map for grammar attributes.

For the full source tree of the "lexical program skeleton", see https://github.com/ponyatov/uc/tree/master/ast

struct Sym {        // universal algebraic symbolic type; struct makes members public by default
    string tag;         // class/type marker, required for token elements
    string val;         // value, string is universal can represent _any_ data

    Sym(string T,string V);     // <T:V> pair constructor
    Sym(string V);              // token constructor

    vector<Sym*> nest;          // \ nest[]ed elements for AST tree
    void push(Sym*o);           // / push new nested element

    map<string,Sym*> attr;      // \ named dynamic attributes,
    double num ;                // / or extra attributes you need

    virtual string head();      // return "<T:V>" pair repr
    string pad(int);            // pad output of tree elements
    string dump(int depth=0);   // return tree dump
};

struct Num: Sym { Num(string);      // number tokens
    double val; string head(); };
struct Str: Sym { Str(string);      // 'string' tokens
    string head(); };

struct Vector: Sym { Vector(); };   // [vector]
struct Op: Sym { Op(string); };     // operator

typedef Sym*(*FN)(Sym*);                        // \ primitive function
struct Fn: Sym { Fn(string V, FN F); FN fn; };  // /

                                // == lexer interface ==
extern int yylex();             // get next token
extern int yylineno;            // line number
extern char* yytext;            // parsed lexeme (ASCIIZ string)
#define TOC(C,X) { yylval.o = new C(yytext); return X; } // gen.token macro
                                // == syntax parser interface ==
extern int yyparse();           // grammar parser
extern void yyerror(string);    // error callback function
#include "ypp.tab.hpp"          // shared lex/yacc token definitions
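The header above only declares push(), head(), pad() and dump(). As a rough illustration of what their bodies might look like, here is a self-contained sketch on a reduced copy of Sym. The bodies are assumptions, not the code from the linked repository; the four-space indent and the "name =" line for attributes are guesses modeled on the tree dump in this answer, and object addresses are left out.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Reduced copy of the Sym interface; all method bodies are assumptions.
struct Sym {
    std::string tag, val;                 // <T:V> pair
    std::vector<Sym*> nest;               // nested AST elements
    std::map<std::string, Sym*> attr;     // named dynamic attributes

    Sym(std::string T, std::string V) : tag(T), val(V) {}
    virtual ~Sym() {}

    // Append a nested element.
    void push(Sym* o) { nest.push_back(o); }

    // Short "<T:V>" representation of this node only.
    virtual std::string head() { return "<" + tag + ":" + val + ">"; }

    // Newline followed by four spaces per nesting level.
    std::string pad(int depth) { return "\n" + std::string(depth * 4, ' '); }

    // Recursive dump: this node, then its attributes, then nested elements.
    std::string dump(int depth = 0) {
        std::string s = pad(depth) + head();
        for (auto& kv : attr)
            s += pad(depth + 1) + kv.first + " =" + kv.second->dump(depth + 2);
        for (Sym* o : nest)
            s += o->dump(depth + 1);
        return s;
    }
};
```

With this sketch, a "+" node holding two number nodes dumps as three lines, the operator first and the operands indented one level below it.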

Note the lexer interface macro TOC, which is used in lpp.lpp to construct tokens like this:

[0-9]+(\.[0-9]*)?([eE][\+\-]?[0-9]+)?   TOC(Num,NUM)    /* number */
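The exponent part of this pattern is easy to get wrong: a character class such as [+-] matches one optional sign, while the group (\+\-) matches the two-character sequence "+-". The sketch below checks both spellings with std::regex, used here only as a stand-in for the lex pattern syntax (the two agree on these constructs); the helper names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true when the whole string is a number lexeme,
// with the exponent sign written as a character class.
bool is_number(const std::string& s) {
    static const std::regex pattern(R"([0-9]+(\.[0-9]*)?([eE][+-]?[0-9]+)?)");
    return std::regex_match(s, pattern);
}

// Same check with the exponent sign written as the group (\+\-), for comparison.
bool is_number_grouped_sign(const std::string& s) {
    static const std::regex pattern(R"([0-9]+(\.[0-9]*)?([eE](\+\-)?[0-9]+)?)");
    return std::regex_match(s, pattern);
}
```

With the character class, "6", "2.3" and "4e-005" are all accepted as whole lexemes; with the group, "4e-005" is rejected and only the odd spelling "4e+-005" gets a signed exponent.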

To answer the question, the grammar above should be written as:

%defines %union { Sym*o; }
%token <o> NUMBER ADD SUB
%type <o> expression

expression : expression ADD expression {
    // build AST node
    $$=$2; $$->push($1); $$->push($3);
    // synth .num attribute from nested nodes
    $$->num = $1->num + $3->num ;
    }

expression : expression SUB expression {
    // build AST node
    $$=$2; $$->push($1); $$->push($3);
    // synth .num from nested nodes
    $$->num  = $1->num - $3->num ;
    }

expression : NUMBER { $$=$1; } /* terminal should be used directly */

Or, if you want a truly symbolic approach, this yacc grammar keeps dynamically synthesized attributes in attr{} (rather cryptic in C++, but it looks clear in Python + PLY syntax):

%%
REPL : | REPL ex { cout << $2->dump() << endl; } ;

ex : SYM                { $$=$1; /* terminal as is */ } ;
ex : NUM                { $$=$1; /* terminal as is */
    // synth
    $$->attr["num"] = new Num(dynamic_cast<Num*>($1)->val);
    } ;

ex : SYM LP ex RP       { $$=new Op("@");               // apply operator
                          $$->push(new Fn($1->val));    // new function
                          $$->push($3);                 // parameters
    // synth
    if ($1->val=="sin")
        $$->attr["num"] = new Num(std::sin(\
            dynamic_cast<Num*>($3->attr["num"])->val));
    } ;

ex : LP ex RP           { $$=$2; /* as is */ } ; // must be after SYM(ex)

ex : ex DIV ex          { $$=$2; $$->push($1); $$->push($3);
   $$->attr["num"] = new Num(\
    dynamic_cast<Num*>($1->attr["num"])->val \
    / \
    dynamic_cast<Num*>($3->attr["num"])->val \
    );
   } ;

This gives the tree:

<op:=> #0x5b1180
    <sym:A> #0x5b1118
    <op:+> #0x5b1348
        <op:-> #0x5b11e8
            1 #0x5b1250
                num =
                    1 #0x5b12a8
        <op:*> #0x4a07d8
            <op:+> #0x5b13b0
                2.3 #0x5b1418
                    num =
                        2.3 #0x5b1470
            <op:^> #0x4a1090
                4e-005 #0x4a1010
                    num =
                        4e-005 #0x4a1050
                <op:/> #0x5bb730
                    num =
                        -0.0399165 #0x5bb850
                    <op:@> #0x5bb648
                        num =
                            -0.279415 #0x5bb6d0
                        <fn:sin> #0x5bb680
                        6 #0x5bb570
                            num =
                                6 #0x5bb5b0
                    7 #0x5bb768
                        num =
                            7 #0x5bb7a8
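One caveat about the symbolic actions: they dereference the result of dynamic_cast directly, and map::operator[] silently inserts a null entry when a node has no "num" attribute (for example a SYM), so a missing attribute becomes a null-pointer dereference. A minimal sketch of a checked lookup follows; num_attr is a hypothetical helper, and the reduced Sym and Num types with a Num(double) constructor are assumptions made to keep the example self-contained.

```cpp
#include <cassert>
#include <map>
#include <string>

// Reduced stand-ins for the classes above (assumed shapes).
struct Sym {
    std::map<std::string, Sym*> attr;   // named dynamic attributes
    virtual ~Sym() {}                   // polymorphic, so dynamic_cast works
};
struct Num : Sym {
    double val;
    explicit Num(double v) : val(v) {}
};

// Hypothetical helper: fetch the numeric "num" attribute of a node.
// Returns false (and leaves out untouched) when the node is null, has no
// "num" entry, or the entry is not a Num. Uses find(), so it never inserts.
bool num_attr(Sym* node, double& out) {
    if (!node) return false;
    auto it = node->attr.find("num");
    if (it == node->attr.end()) return false;
    Num* n = dynamic_cast<Num*>(it->second);
    if (!n) return false;
    out = n->val;
    return true;
}
```

In an action such as ex : ex DIV ex, one could call num_attr on $1 and $3 and synthesize the "num" attribute of $$ only when both calls succeed.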

(*) To the other answerers: please note the "attributed grammar" keyword in the question.

There are lots of examples of yacc grammars around the web. A simple Google search for "yacc example" brings up lots of links. There's also a bunch of useful links here

If you prefer the simplest way, yes, use this grammar: https://github.com/ponyatov/uc/blob/master/ast/union.yacc

This variant still builds an annotated AST, but the attributes are hardcoded into the classes. Use inheritance if you need extra attributes, and track the validity of the attribute tree by hand (in the production rules).

%defines %union {
    struct AST {
        string name;
        double value;
        virtual string dump(int depth=0);
        vector<AST*> nest; void push(AST*);
    } *node;
}

/* token name/value must be filled in by the lexer */

%token <node> SYM NUM EQ ADD SUB MUL DIV POW LP RP
%type <node> ex  
                    // precedence: lines further down bind tighter
%right EQ
%left ADD SUB
%left MUL DIV
%right PFX

%%
REPL : | REPL ex    { cout << $2->dump() << endl; } ;

ex : SYM            { $$=$1; } ; // token as is
ex : NUM            { $$=$1; } ; // token as is

ex : ADD ex %prec PFX {
    $$=$1; $$->push($2);                // unary operator AST subtree
    $$->value = + $2->value;            // + A
    };
ex : SUB ex %prec PFX {
    $$=$1; $$->push($2);                // unary operator AST subtree
    $$->value = - $2->value;            // - A
    };

ex : ex ADD ex      {
    $$=$2; $$->push($1); $$->push($3);  // build AST subtree
    $$->value = $1->value + $3->value;  // synth attr without cryptic code
    } ;

ex : ex MUL ex      {
    $$=$2; $$->push($1); $$->push($3);  // build AST subtree
    $$->value = $1->value * $3->value;  // synth attr without cryptic code
    } ;
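In a binary rule such as ex : ex ADD ex, the operands are $1 and $3, and $2 is the operator token that becomes the root of the new subtree. The same node-building step can be sketched in plain C++ outside yacc; binary() and unary() are hypothetical helper names, and the AST type here is a reduced copy of the struct in the %union, with a constructor added as an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Reduced copy of the AST node from the %union; the constructor is an assumption.
struct AST {
    std::string name;           // token text, e.g. "+" or "2"
    double value;               // hardcoded numeric attribute
    std::vector<AST*> nest;     // nested subtrees
    AST(std::string n, double v = 0) : name(n), value(v) {}
    void push(AST* o) { nest.push_back(o); }
};

// Mirrors  ex : ex OP ex  with op = $2, lhs = $1, rhs = $3.
AST* binary(AST* op, AST* lhs, AST* rhs) {
    op->push(lhs);
    op->push(rhs);                                   // build AST subtree
    if (op->name == "+")      op->value = lhs->value + rhs->value;
    else if (op->name == "-") op->value = lhs->value - rhs->value;
    else if (op->name == "*") op->value = lhs->value * rhs->value;
    else if (op->name == "/") op->value = lhs->value / rhs->value;
    return op;                                       // becomes $$
}

// Mirrors  ex : OP ex %prec PFX  with op = $1, arg = $2.
AST* unary(AST* op, AST* arg) {
    op->push(arg);
    op->value = (op->name == "-") ? -arg->value : +arg->value;
    return op;
}
```

For instance, applying binary() to a "*" node with operands 2 and 3 leaves value 6 on the operator node, and wrapping that result with unary() on a "-" node gives -6.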

Really, a next-generation yacc should support a %struct option to build AST trees directly like this, but yacc can't do that, so you have to nest a struct* in %union.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow