A sufficient and necessary condition about regularity of a language

https://cs.stackexchange.com/questions/156

16-10-2019
|

Question

Which of the following statements is correct?

sufficient and necessary conditions about regularity of a language exist but not discovered yet.

There's no sufficient and necessary condition about regularity of a language.

Pumping lemma is a necessary condition for non-regularity of a language.

Pumping lemma is a sufficient condition for non-regularity of a language.

I know #(4) is correct and #(3) is false because "the converse of this statement is not true: a language that satisfies these conditions may still be non-regular", but what can be said about (1) and (2)?

Solution

Here are some necessary and sufficient conditions for a language to be regular.

Theorem. Let $L\subseteq\Sigma^*$. The following conditions are equivalent:

$L$ is generated by a regular expression (i.e., the definition of regular language).
$L$ is recognized by a nondeterministic finite automaton (Kleene).
$L$ is recognized by a nondeterministic finite automaton without $\varepsilon$-transitions.
$L$ is recognized by a deterministic finite automaton (Scott and Rabin).
$L$ is generated by a grammar $(N,\Sigma,P,S)$, where $N$ is a finite subset of $\Sigma^*$ (Frazier and Page).
$L$ is generated by a left (resp. right) regular context-free grammar.
The index of the Nerode relation $\equiv_L$ is finite (Anil Nerode, Linear automaton transformations, 1958). This is widely (and incorrectly) known as the Myhill-Nerode theorem. $\equiv_L$ is the relation used to build the minimal DFA of a regular language.
The index of the Myhill relation $\sim_L$ is finite (John Myhill, Finite Automata and the Representation of Events, 1957). $\sim_L$ is the relation used to build the syntactic monoid of an arbitrary language.
The syntactic monoid of $L$ is finite (consequence of Myhill's result). We note here that the syntactic monoid, apart from being defined by using relation $\sim_L$, can be defined as a minimal monoid (in size) that recognizes $L$ as a preimage of a homomorphism.
$L$ can be recognized by a read-only Turing Machine (trivial).
$L$ can be defined by a formula in Monadic second-order logic over strings (Büchi).

If a language does not satisfy the conditions of the pumping lemma for regular languages, then it is not regular. This means pumping lemma is a sufficient condition for the non-regularity of a language.

In summary, statements 1, 2 and 3 are false, while statement 4 is true, as you mentioned.

OTHER TIPS

It is sufficient (and necessary) to show the existence of a DFA, NFA or regular expression to prove that a language is regular, which refutes (1) and (2). To show that a language is not regular it is needed to show that a DFA, NFA or regular expression does not exists.

The pumping lemma is a useful tool to show (possibly by contradiction) that a language is not regular, by showing that no DFA exists.

The condition 'there exists a regular expression that generates exactly $L$' is a necessary and sufficient condition for the regularity of a language $L$, by grace of being its definition.

This condition doesn't exactly make it easy to prove non-regularity of a language though. I'm not aware of any conditions that are easy to check that always prove non-regularity of a non-regular language.

There are two more 'tests' that can prove the non-regularity of a language (though they may not work): you can try to give some regular language such that their union/intersection/difference/concatenation/quotient is non-regular (there are more operations like this), and you can try to count how many words it generates and check if it contradicts the expression for the number of words in a regular language (as can be found on the Wikipedia page you linked).

There is this wonderful connection between formal language theory and formal power series proofed by Chomsky and Schützenberger [CS63]. In the form found in [SS78] Chap. II, Theorem 5.1

Let $L$ be a regular language and $\mathbb{K}$ a semiring. Then $\mathrm{char}(L)$ is $\mathbb{K}$-rational.

So if you look at the characteristic series $\mathrm{char}(L)$ of a language and it turns out not to be rational, it can’t be a regular language. So you don’t have to use the pumping lemma all the time. With the help of computer algebra systems such a check is not too hard to do.

[SS78] Arto Salomaa and Matti Soittola. Automata-Theoretic Aspects of Formal Power Series. Springer-Verlag, New York, 1978.

[CS63] Noam Chomsky and Marcel P. Schützenberger. The algebraic theory of context-free languages. In P. Braffort and D. Hirschberg, editors, Computer Programming and Formal Languages, pages 118–161. North Holland, 1963.

By the Myhill-Nerode theorem, a language is regular iff there are finitely many equivalence classes of the indistinguishability relation $I_{L}$. Two strings $x$ and $y$ are said to be indistinguishable, written $x I_{L} y$, iff:

For all strings $z_{x}$ such that $xz_{x} \in L$, also $yz_{x} \in L$.
For all strings $z_{y}$ such that $yz_{y} \in L$, also $xz_{y} \in L$.

Intuitively, this corresponds to the idea that there are finitely many states in the state machine. In fact, the number of distinct equivalence classes under the $I_{L}$ relation is precisely the number of states in a minimal DFA for $L$.

Myhill-Nerode is especially useful because you can construct positive proofs (the language is regular) and negative proofs (the language isn't regular) using the same framework: the relation $I_{L}$.

Licensed under: CC-BY-SA with attribution

Not affiliated with cs.stackexchange