Question

I understand that Prolog programmers typically use DCGs instead of regular expressions for matching patterns in a string. In Perl, one might write

if ( '... accd' =~ /a+b*c{2,4}d$/ ) {
    say "matched";
}

How would one match the same pattern in Prolog?

Was it helpful?

Solution

I've extended this answer

:- op(100, xf, *).
:- op(100, xf, +).

rexp(C) --> [C].

rexp([T|Ts])   --> rexp(T), rexp(Ts).
rexp([])       --> [].

rexp(eps)      --> [].

rexp(_*)       --> [].
rexp(R*)       --> rexp(R), rexp(R*).

rexp(R+)       --> rexp(R), rexp(R*).

rexp((R1|R2))  --> ( rexp(R1) ; rexp(R2) ).

rexp(range(R,N,M)) -->
    {between(N,M,L),
     length(D,L),
     maplist(copy_term(R),D)
    }, rexp(D).

then your regular expression match could be

?-  phrase(rexp([a+, b*, range(c,2,4), d]), [a,c,c,d]),
    writeln(matched).

Note that in this way we match atoms instead of single characters.

edit after false' comment, I think that the first clause should read

rexp(C) --> {atomic(C)}, [C].

to avoid for instance

?- phrase(rexp([a+]), [a+]).
true ;

Indeed, after the correction we have the expected outcome:

?- phrase(rexp([a+]), [a+]).
false.

done edit

Instead of interpreting regular expressions the pattern could be 'hardcoded' (much easier)

% I prefer the equivalent clause below
% p1 --> "a", p1 ; "a", p2.
p1 --> "a", (p1 ; p2).
p2 --> "b", p2 ; p3.
p3 --> ("cc" ; "ccc" ; "cccc"), "d".

then

?- phrase(p1, "accd").
true

here we match single characters (a string in Prolog it's a list of character codes)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top