Question

Are there such things?

Like for example, S -> aSb | ^ (possible words: ^, ab, aabb, aaabbb, aaaabbbb, ...)

From what I've learned, the only regex that closely match the said grammar is: a*b*

But the regex can produce words such as aab, abb, ... where a's and b's aren't equal.

Is there a solution to this? Something like: a*b* if #a = #b

EDIT: I think there is no solution to this.

What is the correct explanation for this? This is actually a snippet of my homework, and I really don't know what to answer since there are no solutions in translating the grammar to regex.

Was it helpful?

Solution

If you are talking about formal language theory then of course all non-regular grammars (as in your example) can't be expressed with a regular expression (per definition).

But if you are wondering of what different regex flavors (in programming languages/regex libs) can do, then you can match all kinds of non-regular grammars/languages.

For example in Perl/PCRE you can match your example language with any of these:

  • Using recursion/sub pattern calls:

    ^(a(?1)b)$

  • Using a backreference (with a conditional):

    ^(?:a(?=a*(b(?(1)\1))))+\1$|^$

You may be interested in this questions and answers: Match a^n b^n c^n (e.g. "aaabbbccc") using regular expressions (PCRE)

OTHER TIPS

In formal language theory something called a "pumping lemma" can be used to prove that certain sets of sentences (languages) can not be described by a regular expression. See wikipedia http://en.wikipedia.org/wiki/Pumping_lemma_for_regular_languages. You start from the language you want to describe and use the pumping lemma to find a contradiction. The proof for your example is actually on that wikipedia page.

A similar theory exists for context-free languages. Some languages just can not be described by context-free grammars.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top