Removing whitespace from strings in Prolog

https://stackoverflow.com/questions/14366036

16-01-2022

Question

I wrote a parser in Prolog. It isn't finished yet; this is part of the code. The next step is to remove all whitespace from the input string.

parse(Source, Tree) :-  kill_whitespace(Source, CleanInput), % remove whitespaces
                        actual_parse(CleanInput, Tree).

actual_parse(CleanInput, Tree):- phrase(expr(Tree),CleanInput).

expr(Ast) --> term(Ast1), expr_(Ast1,Ast).
expr_(Acc,Ast) --> " + ", !, term(Ast2), expr_(plus(Acc,Ast2), Ast).
expr_(Acc,Ast) --> " - ", !, term(Ast2), expr_(minus(Acc,Ast2), Ast).
expr_(Acc,Acc) --> [].

term(Ast) --> factor(Ast1), term_(Ast1,Ast).
term_(Acc,Ast) --> " * ", !, factor(Ast2), term_(mul(Acc,Ast2),Ast).
term_(Acc,Ast) --> " ** ", !, factor(Ast2), term_(pol(Acc,Ast2),Ast).
term_(Acc,Acc) --> [].

factor(Ast) --> "(", !, expr(Ast), ")".
factor(D)--> [X], { X >= 48 , X=<57 , D is X-48 }.
factor(id(N,E)) --> "x", factor(N), ":=", expr(E), ";".

For example:

?- parse("x2:=4",T).
    T = id(2, 4)

That works. But when I write:

?- parse("x2 := 4",T).
false.

This should succeed as well. The fix should be a filter: kill_whitespace(Source, CleanInput).

The solutions I have tried are inefficient. How can I do this?

Solution

I usually place a 'skip' nonterminal wherever whitespace can occur. Such a skip usually discards comments as well as any other 'uninteresting' text.

To keep it as simple as possible:

% discard any number of spaces
s --> "" ; " ", s.

I prefer a short name, to keep the grammar clean. To discard newlines etc. as well:

s --> "" ; (" ";"\t";"\n";"\r"), s.
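As a rough sketch, this is how the skip nonterminal could be threaded through the grammar from the question: the terminals lose their embedded spaces, s follows each token, and parse/2 starts with one leading s. Several things here are assumptions of mine rather than part of the original post: the double_quotes flag is set to codes, "**" is tried before "*" so that the shorter operator cannot shadow the longer one once the spaces are gone, and the input ends with ";" because the id rule requires it.

```prolog
:- set_prolog_flag(double_quotes, codes).  % assumption: "..." denotes a code list

% s//0: skip any run of whitespace (same definition as above)
s --> "" ; (" " ; "\t" ; "\n" ; "\r"), s.

% allow leading whitespace, then parse an expression
parse(Source, Tree) :- phrase((s, expr(Tree)), Source).

expr(Ast) --> term(Ast1), expr_(Ast1, Ast).
expr_(Acc, Ast) --> "+", !, s, term(Ast2), expr_(plus(Acc, Ast2), Ast).
expr_(Acc, Ast) --> "-", !, s, term(Ast2), expr_(minus(Acc, Ast2), Ast).
expr_(Acc, Acc) --> [].

term(Ast) --> factor(Ast1), term_(Ast1, Ast).
% "**" must come before "*": without the surrounding spaces, "*" is a prefix of "**"
term_(Acc, Ast) --> "**", !, s, factor(Ast2), term_(pol(Acc, Ast2), Ast).
term_(Acc, Ast) --> "*", !, s, factor(Ast2), term_(mul(Acc, Ast2), Ast).
term_(Acc, Acc) --> [].

% every token is followed by s, so whitespace is consumed after it
factor(Ast) --> "(", !, s, expr(Ast), ")", s.
factor(D) --> [X], { X >= 48, X =< 57, D is X - 48 }, s.   % 48..57 are the codes of 0..9
factor(id(N, E)) --> "x", factor(N), ":=", s, expr(E), ";", s.
```

With this loaded, a query such as atom_codes('x2 := 4;', Cs), parse(Cs, T) should bind T to id(2, 4), with or without the spaces. Note that s as written tries the empty match first and only consumes whitespace on backtracking, so the parse is correct but not deterministic.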

A 'style' note: instead of

parse(Source, Tree) :-
   expr(Tree, Source, []).

you could consider

parse(Source, Tree) :-
   phrase(expr(Tree), Source).

Other tips

Well, the easy way is to run the string through a filter predicate that removes whitespace (keeps only the non-whitespace characters). But this requires an extra pass over the input before the actual parse.
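A minimal sketch of such a filter, using the kill_whitespace/2 name from the question. It walks the code list once, so the extra pass is linear in the input length. The helper is_space/1 is a name I made up for illustration, and the input is assumed to be a list of character codes.

```prolog
% kill_whitespace(+Codes, -Clean): Clean is Codes with blanks, tabs,
% newlines and carriage returns removed, in a single left-to-right pass.
kill_whitespace([], []).
kill_whitespace([C|Cs], Clean) :-
    (   is_space(C)
    ->  Clean = Rest            % drop the whitespace code
    ;   Clean = [C|Rest]        % keep everything else
    ),
    kill_whitespace(Cs, Rest).

% is_space(+Code): hypothetical helper listing the codes treated as whitespace
is_space(32).   % blank
is_space(9).    % tab
is_space(10).   % newline
is_space(13).   % carriage return
```

Two caveats if this is used with the grammar from the question: its terminals such as " + " contain spaces and would have to become "+" (and "**" would then need to be tried before "*"), and stripping whitespace up front also glues together things that were separate in the source, for example "1 2" becomes "12".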

Another way to fix it is to use your own predicate to "get" characters,
i.e. foo --> "a". becomes foo --> get("a"). where get//1 is something like:

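% X is a code list such as "a" (assuming double_quotes is codes); whitespace in front of it is skipped
get(X) --> X.
get(X) --> whitespace, get(X).
whitespace --> " " ; "\t" ; "\n" ; "\r".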
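To illustrate how grammar rules might look with this approach, here is a small sketch. The names tok//1, ws//0, assign//1 and digit//1 are hypothetical and chosen only for this example, and the double_quotes flag is assumed to be codes. Unlike the definition above, ws//0 is greedy and commits with a cut, which keeps the rules deterministic.

```prolog
:- set_prolog_flag(double_quotes, codes).  % assumption: "..." denotes a code list

% tok(+Literal)//0: skip leading whitespace, then match Literal (a code list)
tok(Literal) --> ws, Literal.

% ws//0: consume as much whitespace as possible
ws --> [C], { memberchk(C, " \t\n\r") }, !, ws.
ws --> [].

% hypothetical rule for input like "x := 4 ;", written with tok//1
assign(D) --> tok("x"), tok(":="), digit(D), tok(";").

% a single decimal digit, again preceded by optional whitespace
digit(D) --> ws, [C], { C >= 48, C =< 57, D is C - 48 }.
```

For instance, atom_codes('x := 4 ;', Cs), phrase(assign(D), Cs) should give D = 4.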

The usual way of writing a parser is to write it in two stages:

The first stage conducts lexical analysis and produces a stream of tokens. Whitespace and other "tokens" not significant to the parse (e.g., comments) are discarded at this point.

The second stage conducts the parse itself, examining the list of tokens produced by the lexical analyzer.
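The two stages could be sketched roughly as follows for the grammar in the question. The token names (plus, star, pow, assign, and so on) and tokens//1 itself are my own choices for illustration, and the double_quotes flag is assumed to be codes. The second stage is the question's grammar rewritten to match tokens instead of character codes.

```prolog
:- set_prolog_flag(double_quotes, codes).  % assumption: "..." denotes a code list

% Stage 1: lexical analysis. tokens//1 turns a code list into a token
% list and discards whitespace along the way.
tokens(Ts)     --> [C], { memberchk(C, " \t\n\r") }, !, tokens(Ts).
tokens([T|Ts]) --> token(T), !, tokens(Ts).
tokens([])     --> [].

token(pow)    --> "**".   % must precede star, since "*" is a prefix of "**"
token(assign) --> ":=".
token(plus)   --> "+".
token(minus)  --> "-".
token(star)   --> "*".
token(lparen) --> "(".
token(rparen) --> ")".
token(semi)   --> ";".
token(x)      --> "x".
token(num(D)) --> [C], { C >= 48, C =< 57, D is C - 48 }.

% Stage 2: parse the token list. No whitespace handling is needed here.
expr(Ast) --> term(A), expr_(A, Ast).
expr_(Acc, Ast) --> [plus], !, term(B), expr_(plus(Acc, B), Ast).
expr_(Acc, Ast) --> [minus], !, term(B), expr_(minus(Acc, B), Ast).
expr_(Acc, Acc) --> [].

term(Ast) --> factor(A), term_(A, Ast).
term_(Acc, Ast) --> [star], !, factor(B), term_(mul(Acc, B), Ast).
term_(Acc, Ast) --> [pow], !, factor(B), term_(pol(Acc, B), Ast).
term_(Acc, Acc) --> [].

factor(Ast) --> [lparen], !, expr(Ast), [rparen].
factor(D) --> [num(D)].
factor(id(N, E)) --> [x, num(N), assign], expr(E), [semi].

% run both stages in sequence
parse(Source, Tree) :-
    phrase(tokens(Tokens), Source),
    phrase(expr(Tree), Tokens).
```

With this sketch, atom_codes('x2 := 4;', Cs), parse(Cs, T) should bind T to id(2, 4). An input containing a character the lexer does not know makes phrase(tokens(Tokens), Source) fail, so parse/2 fails before the second stage runs.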

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow