Question

I have been tasked with a project that involves me taking a Grammar (in BNF form) and creating a lexical scanner (using lex) and a parser (using bison). I've never worked with any of these programs and I think a good reference would be to see how these items are created from a grammar. I am looking for a grammar and it's associated .l and .ypp files, preferably in C++. I've been able to find sample files or sample grammars, but not both of them. I've spent some time searching and I could not find anything. I figure I'd post here in hopes that someone has something for me, but I will continue searching in the meantime.

I am currently reading Tom Niemann's http://epaperpress.com/lexandyacc/download/LexAndYaccTutorial.pdf which seems to be pretty well written and understandable.

Thanks

Edit: I am still searching, I am starting to think that what I am looking for does not exist. Google usually never fails me!

Edit 2: Maybe if I provide some of the grammar, you folks could show me what the appropriate .l and .ypp files would look like. This is just a snippet of the grammar, I just need a little 'taste' of how this works and I think I can take it from there.

Grammar:

Program ::= Compound
Statements ::= Compound | Assignment | ...
Assignment ::= Var ASSIGN Expression
Expression ::= Var | Operator Expression Expression | Number
Compound := START Statements END
Number ::= NUMBER

Descriptions:

Assignment is the equal sign ":="

Var is an identifier that begins with a lower case letter and is followed by lower case letters or digits

START is the "start" keyword

END is the "end keyword

Operator is "+", "-", "*", "/"

Number is decimal digits which could potentially be negative (minus sign in front)
Was it helpful?

Solution

Most of this is fairly straightforward. One part, however, is decidedly problematic. You've defined a number to (potentially) include a leading -, and that's a problem.

The problem is pretty simple. Given an input like 321-123, it's essentially impossible for the lexer (which won't normally keep track of current state) to guess at whether that's supposed to be two tokens (321 and -123 or three 321, -, 123). In this case, the - is almost certainly intended to be separate from the 123, but if the input were 321 + -123 you'd apparently want -123 as a single token instead.

To deal with that, you probably want to change your grammar so the leading - isn't part of the number. Instead, you always want to treat the - as an operator, and the number itself is composed solely of the digits. Then it's up to the parser to sort out expressions where the - is unary vs. binary.

Taking that into account, the lexer file would look something like this:

%{
#include "y.tab.h"
%}

%option noyywrap case-insensitive  
%%

:=        { return ASSIGN;   }
start     { return START;    }
end       { return END;      }
[+/*]     { return OPERATOR; }
-         { return MINUS;    }
[0-9]+    { return NUMBER;   }
[a-z][a-z0-9]* { return VAR; }
[ \r\n]   { ; }

%%

void yyerror(char const *s) { fputs(s, stderr); }

The matching yacc file would look something like this:

%token ASSIGN START END OPERATOR MINUS NUMBER VAR
%left '-' '+' '*' '/'
%%

program : compound

statement : compound
            | assignment
            ;

assignment : VAR ASSIGN expression
            ;

statements :
            | statements statement
            ;

expression : VAR
            | expression OPERATOR expression 
            | expression MINUS expression
            | value
            ;

value: NUMBER
     | MINUS NUMBER
     ;          

compound : START statements END

%%

int main() {
    yyparse();
    return 0;
}

Note: I've tested these only extremely minimally--enough to verify input I believe is grammatical, such as: start a:=1 b:=2 end and start a:=1+3*3 b:=a+4 c:=b*3 end is accepted (no error message printed out) and input I believe is un-grammatical, such as: 9:=13 and a=13 do both print out syntax error messages. Since this doesn't attempt to do any more with the expressions than recognize those which are or are not grammatical, that's about the best we can do though.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top