Question

I am currently writing a parser with yecc in Erlang.

Nonterminals expression.

Terminals '{' '}'  '+' '*' 'atom' 'app' 'integer' 'if0' 'fun' 'rec'.

Rootsymbol expression.

expression -> '{' '+' expression  expression '}' : {'AddExpression', '$3','$4'}.
expression -> '{' 'if0' expression expression expression '}' : {'if0', '$3', '$4', '$5'}.
expression -> '{' '*' expression expression '}' : {'MultExpression', '$3','$4'}.
expression -> '{' 'app' expression expression '}' : {'AppExpression', '$3','$4'}.
expression -> '{' 'fun' '{' expression '}' expression '}': {'FunExpression', '$4', '$6'}.
expression -> '{' 'rec' '{' expression expression '}' expression '}' : {'RecExpression', '$4', '$5', '$7'}.
expression -> atom : '$1'.
expression -> integer : '$1'.

I also have an Erlang module that tokenizes the input before parsing:

tok(X) ->
    element(2, erl_scan:string(X)).

get_Value(X) ->
    element(2, parse(tok(X))).

These cases are accepted:

interp:get_Value("{+ {+ 4 6} 6}").
interp:get_Value("{+ 4 2}"). 

These return {'AddExpression',{'AddExpression',{integer,1,4},{integer,1,6}},{integer,1,6}} and {'AddExpression',{integer,1,4},{integer,1,2}}, respectively.

But this test case:

interp:get_Value("{if0 3 4 5}").

Returns:

{1,string_parser,["syntax error before: ","if0"]}

Solution

The terminals in the grammar rules name the category of a token, not its value. So you can match against an atom, but not against a specific atom. If you are using the Erlang tokenizer, the token generated for "if0" will be {atom,Line,if0}, while your grammar expects an {if0,Line} token. This is what the "Pre-processing" section of the yecc documentation is trying to explain.
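To see the mismatch concretely, here is a small sketch that scans the failing input and asserts the shape of the resulting tokens. The module name token_shape and the function demo/0 are made up for illustration; only erl_scan:string/1 comes from the standard library.

```erlang
-module(token_shape).
-export([demo/0]).

%% Scan the failing input and check what the Erlang tokenizer produces.
demo() ->
    {ok, Tokens, _EndLine} = erl_scan:string("{if0 3 4 5}"),
    %% if0 arrives in the generic atom category, with its name as the value,
    %% so a grammar terminal called 'if0' never sees a matching token.
    [{'{', 1}, {atom, 1, if0},
     {integer, 1, 3}, {integer, 1, 4}, {integer, 1, 5},
     {'}', 1}] = Tokens,
    Tokens.
```

The '+' and '*' rules work because the Erlang tokenizer already emits those symbols as their own categories, {'+',Line} and {'*',Line}.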

You will need a special tokenizer for this. If you want to keep using the Erlang tokenizer, a simple way of handling it is to add a pre-processing pass that scans the token list and converts {atom,Line,if0} tokens to {if0,Line} tokens.
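Such a pre-processing pass could look roughly like the sketch below. The module name interp_tokens, the KEYWORDS macro, and the helper names are assumptions for illustration, not part of the original code. The keyword list is taken from the Terminals line of the grammar. 'fun' is left out on the assumption that erl_scan already returns Erlang reserved words as {'fun',Line}.

```erlang
-module(interp_tokens).
-export([tok/1, preprocess/1]).

%% Atoms the grammar uses as keyword terminals (hypothetical list,
%% copied from the Terminals declaration in the question).
-define(KEYWORDS, [if0, app, rec]).

%% Tokenize a string with the Erlang scanner, then rewrite keyword atoms.
tok(String) ->
    {ok, Tokens, _EndLine} = erl_scan:string(String),
    preprocess(Tokens).

%% Convert {atom, Line, Keyword} tokens into {Keyword, Line} tokens.
preprocess(Tokens) ->
    [convert(Token) || Token <- Tokens].

convert({atom, Line, Name} = Token) ->
    case lists:member(Name, ?KEYWORDS) of
        true  -> {Name, Line};   % keyword: the name becomes the category
        false -> Token           % ordinary atom: leave unchanged
    end;
convert(Token) ->
    Token.
```

With this in place, get_Value/1 would call parse(interp_tokens:tok(X)), and ordinary atoms such as variable names still reach the parser as {atom,Line,Name}.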

Licensed under: CC-BY-SA with attribution