Most of this is fairly straightforward. One part, however, is decidedly problematic: you've defined a number to (potentially) include a leading `-`, and that's a problem.
The problem is pretty simple. Given an input like `321-123`, it's essentially impossible for the lexer (which won't normally keep track of the current state) to guess whether that's supposed to be two tokens (`321` and `-123`) or three (`321`, `-`, `123`). In this case, the `-` is almost certainly intended to be separate from the `123`, but if the input were `321 + -123` you'd apparently want `-123` as a single token instead.
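To make the ambiguity concrete, here is a small hand-written sketch in C. It is not what lex generates, just a stand-in that follows the same longest-match rule. The function name `count_tokens` and its `minus_in_number` flag are made up for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the tokens a longest-match scanner would
 * produce for s. If minus_in_number is nonzero, a '-' directly followed
 * by a digit is absorbed into the number (the problematic definition);
 * otherwise '-' is always a token of its own. */
size_t count_tokens(const char *s, int minus_in_number)
{
    size_t n = 0;

    assert(s != NULL);
    while (*s != '\0') {
        if (isspace((unsigned char)*s)) {
            s++;                      /* whitespace separates tokens */
            continue;
        }
        if (minus_in_number && *s == '-' && isdigit((unsigned char)s[1]))
            s++;                      /* treat the sign as part of the number */
        if (isdigit((unsigned char)*s)) {
            while (isdigit((unsigned char)*s))
                s++;                  /* consume the whole run of digits */
        } else {
            s++;                      /* anything else is a one-character token */
        }
        n++;
    }
    return n;
}
```

With the sign folded into the number, `count_tokens("321-123", 1)` sees only two tokens (`321` and `-123`), so a parser would get two numbers in a row with no operator between them. With `count_tokens("321-123", 0)` it sees three, which is what a subtraction needs.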
To deal with that, you probably want to change your grammar so the leading `-` isn't part of the number. Instead, you always want to treat the `-` as an operator, and the number itself is composed solely of digits. Then it's up to the parser to sort out expressions where the `-` is unary vs. binary.
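As a rough illustration of how a parser can tell the two apart purely from position, here is a hand-written sketch in C. It is not generated by yacc. It only handles numbers (no variables), and the name `classify_minus` is made up for this example. A `-` found where an operand is expected is unary; a `-` found where an operator is expected is binary.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char *skip_spaces(const char *s)
{
    while (isspace((unsigned char)*s))
        s++;
    return s;
}

static const char *skip_digits(const char *s)
{
    while (isdigit((unsigned char)*s))
        s++;
    return s;
}

/* Hypothetical helper: walks an expression of the form
 *     operand { operator operand }
 * where operand is NUMBER or '-' NUMBER, and counts how many '-' signs
 * are unary and how many are binary. Returns 0 on success and -1 on a
 * syntax error; the counts are only meaningful on success. */
int classify_minus(const char *s, int *unary, int *binary)
{
    assert(s != NULL && unary != NULL && binary != NULL);
    *unary = 0;
    *binary = 0;
    for (;;) {
        /* Operand position: a '-' here can only be a sign. */
        s = skip_spaces(s);
        if (*s == '-') {
            (*unary)++;
            s = skip_spaces(s + 1);
        }
        if (!isdigit((unsigned char)*s))
            return -1;
        s = skip_digits(s);

        /* Operator position: a '-' here can only be subtraction. */
        s = skip_spaces(s);
        if (*s == '\0')
            return 0;
        if (*s == '-')
            (*binary)++;
        else if (*s != '+' && *s != '*' && *s != '/')
            return -1;
        s++;
    }
}
```

For `321-123` this reports one binary minus and no unary ones; for `321 + -123` it reports one unary minus and no binary ones. The lexer never had to make that decision.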
Taking that into account, the lexer file would look something like this:
%{
#include <stdio.h>   /* fputs and stderr, used by yyerror below */
#include "y.tab.h"
%}
%option noyywrap case-insensitive
%%
:= { return ASSIGN; }
start { return START; }
end { return END; }
[+/*] { return OPERATOR; }
- { return MINUS; }
[0-9]+ { return NUMBER; }
[a-z][a-z0-9]* { return VAR; }
[ \t\r\n] { /* skip whitespace, including tabs */ }
%%
void yyerror(char const *s) { fputs(s, stderr); }
The matching yacc file would look something like this:
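%{
/* Declared here so the generated parser can call them without implicit declarations. */
int yylex(void);
void yyerror(char const *s);
%}
%token ASSIGN START END OPERATOR MINUS NUMBER VAR
/* Precedence has to name the tokens the lexer actually returns. */
%left OPERATOR MINUS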
%%
program : compound
;
statement : compound
| assignment
;
assignment : VAR ASSIGN expression
;
statements :
| statements statement
;
expression : VAR
| expression OPERATOR expression
| expression MINUS expression
| value
;
value: NUMBER
| MINUS NUMBER
;
compound : START statements END
;
%%
int main() {
yyparse();
return 0;
}
Note: I've tested these only minimally, enough to verify that input I believe is grammatical, such as `start a:=1 b:=2 end` and `start a:=1+3*3 b:=a+4 c:=b*3 end`, is accepted (no error message printed), and that input I believe is ungrammatical, such as `9:=13` and `a=13`, prints a `syntax error` message. Since this doesn't attempt to do anything more with the expressions than recognize which are or are not grammatical, that's about the best we can do.