Question

So, I'm trying to minimize boolean expressions using RegEx in C.

I need a regular expression that matches XXX AND XXX, where "XXX" is the same thing on both sides. For example, it should match:

((p OR q) AND (p OR q))

but not match

(p AND (p OR q))

because the right side of the AND is not exactly the same as the left side. In the matching case, I can replace the whole expression with just

(p OR q)

I think I have the replacement part working, but I need a regular expression that matches expressions like the one above.

Thanks :)

Solution

You can use something like:

(.+?)(?: AND \1)+

Then you only need to replace each match with the content of the first capturing group.

Note that you can add (?s) at the beginning of the pattern to allow the dot to match newlines.

Here is a more robust pattern that tolerates any whitespace around AND:

(?s)(.+?)(?:\s+AND\s+\1)+

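To also remove optional parentheses around the expression, you can use:

(?s)(\(\s*)?(.+?)(?:\s+AND\s+\2)+(?(1)\s*\))

In this case, replace each match with the content of the second capturing group. The conditional (?(1)...) needs an engine such as PCRE; POSIX regex.h does not support it.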

Note that you must apply this replacement repeatedly (until there are no more matches) to obtain the result you want, since matches can be nested inside one another. Example:

(((p OR q) AND (p OR q)) AND ((p OR q) AND (p OR q)))
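
The patterns above use syntax (lazy quantifiers, non-capturing groups, (?s)) that POSIX regex.h does not provide, so here is a minimal sketch of the same idea translated to a POSIX basic regular expression, which does support backreferences. The names DUP_AND_PATTERN, collapse_once and simplify_and are made up for illustration, and the sketch assumes the duplicated operands are wrapped in parentheses with single spaces around AND. It repeats the replacement until nothing matches, as described above.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* POSIX BRE: a literal "(", a captured operand, " AND ", the same operand
 * again through the backreference \1, and a literal ")". In a BRE, \( \)
 * delimit a group, while unescaped parentheses are ordinary characters. */
static const char *DUP_AND_PATTERN = "(\\(..*\\) AND \\1)";

/* Rewrites the first "(X AND X)" found in expr to "X", in place.
 * Returns 1 if something was rewritten, 0 if there was no match. */
static int collapse_once(const regex_t *re, char *expr)
{
    regmatch_t m[2];

    if (regexec(re, expr, 2, m, 0) != 0)
        return 0;

    size_t operand_len = (size_t)(m[1].rm_eo - m[1].rm_so);
    char *dst = expr + m[0].rm_so;

    /* The result is always shorter than the match, so the operand can be
     * shifted left and the tail (with its terminator) moved up behind it. */
    memmove(dst, expr + m[1].rm_so, operand_len);
    memmove(dst + operand_len, expr + m[0].rm_eo,
            strlen(expr + m[0].rm_eo) + 1);
    return 1;
}

/* Applies collapse_once until no match is left.
 * Returns 0 on success, -1 if the pattern fails to compile. */
static int simplify_and(char *expr)
{
    regex_t re;

    assert(expr != NULL);
    if (regcomp(&re, DUP_AND_PATTERN, 0) != 0)
        return -1;
    while (collapse_once(&re, expr))
        ;
    regfree(&re);
    return 0;
}
```

Under these assumptions, calling simplify_and on a writable buffer holding the example above needs two passes: the first collapses the outer AND, the second collapses the remaining inner one, leaving (p OR q). An input such as (p AND (p OR q)) is left unchanged.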

OTHER TIPS

Regular expressions describe strings that belong to the class of languages known as regular languages.

In particular, the language of strings of the form WxW, that is, two identical substrings separated by x, is not regular. You can prove this using the pumping lemma for regular languages. In fact, the pattern you want to describe is one of the standard examples used to show that a language is non-regular.

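Boolean expressions like yours can, however, be generated by a context-free grammar (equivalently, recognized by a pushdown automaton), so a parser is the right tool: parse the expression, then compare the two operands of each AND. (Strictly speaking, the language WxW on its own is not context-free either, so the equality check has to be done on the parsed operands.)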

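Since WxW doesn't belong to the class of regular languages, no regular expression in the formal sense can describe it. Backreferences such as \1 are an extension that goes beyond regular languages, which is why engines that support them can still match it.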
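
To make the parser suggestion concrete, here is a rough sketch of a recursive-descent simplifier. It is only an illustration under strong assumptions that are not in the question: variables are runs of lowercase letters, every binary operation is fully parenthesized as "(left OP right)" with OP being AND or OR, and no expression is longer than an arbitrary MAX_EXPR. The names parse_expr and simplify_expr are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_EXPR 256 /* arbitrary cap on the length of any (sub)expression */

static const char *skip_spaces(const char *s)
{
    while (isspace((unsigned char)*s))
        s++;
    return s;
}

/* Parses one expression starting at *pp and writes its simplified,
 * whitespace-normalized text to out (MAX_EXPR bytes). Advances *pp past
 * the expression. Returns 0 on success, -1 on a syntax error or overflow. */
static int parse_expr(const char **pp, char *out)
{
    const char *p = skip_spaces(*pp);

    if (*p == '(') {
        char lhs[MAX_EXPR], rhs[MAX_EXPR];
        int is_and;

        p++;
        if (parse_expr(&p, lhs) != 0)
            return -1;
        p = skip_spaces(p);
        if (strncmp(p, "AND", 3) == 0) {
            is_and = 1;
            p += 3;
        } else if (strncmp(p, "OR", 2) == 0) {
            is_and = 0;
            p += 2;
        } else {
            return -1;
        }
        if (parse_expr(&p, rhs) != 0)
            return -1;
        p = skip_spaces(p);
        if (*p != ')')
            return -1;
        *pp = p + 1;

        /* X AND X collapses to X once both operands are simplified. */
        if (is_and && strcmp(lhs, rhs) == 0) {
            strcpy(out, lhs);
            return 0;
        }
        int n = snprintf(out, MAX_EXPR, "(%s %s %s)",
                         lhs, is_and ? "AND" : "OR", rhs);
        return (n >= 0 && n < MAX_EXPR) ? 0 : -1;
    }

    /* Otherwise expect a variable: one or more lowercase letters. */
    size_t len = 0;
    while (islower((unsigned char)p[len]))
        len++;
    if (len == 0 || len >= MAX_EXPR)
        return -1;
    memcpy(out, p, len);
    out[len] = '\0';
    *pp = p + len;
    return 0;
}

/* Simplifies a whole input string; trailing garbage is a syntax error. */
static int simplify_expr(const char *in, char *out)
{
    const char *p = in;

    assert(in != NULL && out != NULL);
    if (parse_expr(&p, out) != 0)
        return -1;
    return *skip_spaces(p) == '\0' ? 0 : -1;
}
```

Because operands are simplified before they are compared, nested duplicates collapse in a single call, and differences in spacing between the two operands no longer matter. An input that is not fully parenthesized, such as r AND (q AND q), is rejected by this particular grammar.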

As Casimir mentioned, the RegEx I was looking for was the following (the backslash is doubled because the pattern is written as a C string literal):

(.+) AND \\1

I got these results using that pattern:

[INPUT] -> [OUTPUT]
r AND (q AND q) -> r AND q
r AND ((q) AND (q)) -> r AND (q)
r AND ((q OR w) AND (q or w)) -> r AND (q OR w)
r AND ((q OR w) AND (q or t)) -> r AND ((q OR w) AND (q or t))

Thank you :)

Licensed under: CC-BY-SA with attribution