Grammar involving braces

https://stackoverflow.com/questions/7543100

29-01-2021
|

Question

I'm trying to solve DCG grammar in prolog and succeeded upto a point, i'm stuck in evaluating the expressions involving braces like these. expr( T, [’(’, 5, +, 4, ’)’, *, 7], []),

expr(Z) --> num(Z).
expr(Z) --> num(X), [+], expr(Y), {Z is X+Y}.
expr(Z) --> num(X), [-], expr(Y), {Z is X-Y}.
expr(Z) --> num(X), [*], expr(Y), {Z is X*Y}.
num(D) --> [D], {number(D)}.

eval(L, V, []) :- expr(V, L, []).

Solution

The parsers implemented by Prolog's DCG grammars are recursive-descent LL(something) (predictive) grammars. It walks the input from left to right and produces a leftmost derivation as it goes. They're easy to craft but the grammar's must conform to a few restrictions:

They cannot be left-recursive. Infinite recursion can/will result. This means that at least one symbol (token) has to be removed from the input stream prior to following a recursive path. Refactoring grammars to remove left-recursion is a fairly mechanical exercise, albeit tedious. See any decent compiler book on how to do that.

Operator precedence is typically built into the structure of the grammar itself. Here's BNF notation showing one way of defining a recursive descent grammar for the parsing/evaluation of simple arithmetic expressions:

ArithmeticExpression     : AdditiveExpression
                         ;

AdditiveExpression       : MultiplicativeExpression
                         | MultiplicativeExpression '+' AdditiveExpression
                         | MultiplicativeExpression '-' AdditiveExpression
                         ;

MultiplicativeExpression : ExponentialExpression
                         | ExponentialExpression '*' MultiplicativeExpression
                         | ExponentialExpression '/' MultiplicativeExpression
                         | ExponentialExpression '%' MultiplicativeExpression
                         ;

ExponentialExpression    : UnaryExpression
                         | UnaryExpression '^' ExponentialExpression
                         ;

UnaryExpression          : '-' UnaryExpression
                         | AtomicExpression
                         ;

AtomicExpression         : '(' ArithmeticExpression ')'
                         | ['0'..'9']+
                         ;

The term at each level of operator precedence is built from expressions of the next higher order of precedence. So an arbitrary arithmetic expression is just a single additive expression.

Each additive expression is 1 or more multiplicative expressions, joined by addition and subtraction operators.

Each multiplicative expression is 1 or more exponential expressions, joined by multiplication, division and remainder operators.

Each exponential expression is a unary expression with an option exponent operator followed by another unary expression.

Each unary expression is either an atomic expression, or a unary minus followed by another unary expression.

Each atomic expression is either an arbitrary arithmetic expression, enclosed in parentheses, or an unsigned integer token.

Translation of the above into Prolog's DCG syntax should be trivial. How to evaluate the term represented by each clause in the grammar should be self-evident.

OTHER TIPS

This is one of the strangest things I observed in the history of Prolog. Namely that a wrong expression syntax is shown around already for ages. The wrong syntax is already found in the DEC10 Prolog documentation and the misfit is seen when we look at a rule:

 expr(Z) --> num(X), "/", expr(Y), {Z is X/Y}.
 etc..

This makes the division operator xfy, but it should be yfx. Thus with the above rule an expression 10/2/5 is read as 10/(2/5) and leads to 25 as the result. But in fact the example should be read as (10/2)/5 yielding 1 as the result.

The problem is that the correct syntax would be left recursive. And DCG do have problems with left recursive rules. The Prolog interpreter would just run into an infinite loop for a left recursive rules by repeatedly call expr/3:

 expr(Z) --> expr(X), "/", num(Y), {Z is X/Y}
 etc..

So the solution is to eliminate the left recursion by introducing an accumulator and additional rules. I don't know whether this method works in general, but it works for sure in the present case. So the correct and depth first executable rule would read:

 expr(Y) --> num(X), expr_rest(X,Y).

 expr_rest(X,T) --> "/", !, num(Y), {Z is X/Y}, expr_rest(Z,T).
 etc..
 expr_rest(X,X).

The above grammar is a little bit a more challenging DCG. It is not anymore pure Prolog, since it uses the cut (!). But we could eliminate the cut, for example by a push-back, something along the following lines. The push-back is again a complicated matter to explain in a DCG introduction, and we would need to introduce an stop character at the end of an expression to make it work:

 etc..
 expr_rest(X,X), [C] --> [C], {not_operator(C)}.

Or we could neither go into the lengths of the cut or the push-back and live with it that on backtracking the parser will do additional, in the present case unnecessary, work. So bottom line is probably, although the example is not correct, it is simple enough to explain DCG without needing to much advanced stuff of DCG.

Interestinglyg the missing parenthesis syntax is hardly affected by the elimination of left recursion. Simply add:

 num(X) --> "(", !, expr(X), ")".

Oops, a cut again!

Best Regards

Full code can be seen here: http://www.jekejeke.ch/idatab/doclet/prod/en/docs/05_run/06_bench/09_programs/10_calculator/01_calculator.p.html

P.S.: Instead of eliminating the left recursion, we could also have worked with some form of tabling.

It just works. However it is not easier than yacc/bison.

%?-eval('11*(7+5-2)^2*(11+8)').
eval(A) :- lex(A,L), evallist(L).

%?-evallist([11,*,'(',7,+,5,-,2,')',^,2,*,'(',11,+,8,')']).
evallist(L) :- e(R,L,[]),write(R),!.

e(N) --> t(N1), erest(N1,N).

erest(N1,N) --> [+], !, t(N2), {N3 is N1+N2}, erest(N3,N);
                [-], !, t(N2), {N3 is N1-N2}, erest(N3,N).
erest(N,N) --> [].

t(N) --> f(N1), trest(N1,N).

trest(N1,N) --> [*], !, f(N2), {N3 is N1*N2}, trest(N3,N);
                [/], !, f(N2), {N3 is N1/N2}, trest(N3,N).
trest(N,N) --> [].

f(N) --> n(N);
     n(N1), [^], f(N2), {N is N1**N2}.

n(N) --> ['('], !, e(N), [')'];
     [-], !, e(N1), {N is -N1};
     num(N). 

num(N) --> [N], {number(N)}.

lex(A,NL) :- 
  atom_chars(A,L), lex0(_,L,NL).

lex0(S,L,NL) :-
  L=[], (number(S), NL=[S], !; NL=[]), !;
  L=[E|R], (d(E,N), (number(S), !; S=0), S1 is S*10+N, lex0(S1, R, NL), !;
     lex0(_,R,NR), (number(S), NL=[S|[E|NR]], !;
        NL=[E|NR])).

d(I,N) :- 
  char_code(I,C), C > 47, C < 58, N is C - 48.

Adding this clause appears to work: num(D) --> ['('], expr(D), [')'].

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow