Question

There are many methods to prove that a language is not regular, but what do I need to do to prove that some language is regular?

For instance, if I am given that $L$ is regular, how can I prove that the following $L'$ is regular, too?

$\qquad \displaystyle L' := \{w \in L: uv = w \text{ for } u \in \Sigma^* \setminus L \text{ and } v \in \Sigma^+ \}$

Can I draw a nondeterministic finite automaton to prove this?

Solution

Yes, if you can come up with any of the following:

  • a deterministic finite automaton (DFA),
  • a nondeterministic finite automaton (NFA),
  • a regular expression, or
  • a regular grammar

for some language $L$, then $L$ is regular. There are more equivalent models, but the above are the most common.

There are also useful properties outside of the "computational" world. $L$ is also regular if

  • it is finite,
  • you can construct it from regular languages using operations under which regular languages are closed, such as

    • intersection,
    • complement,
    • homomorphism,
    • reversal,
    • left- or right-quotient,
    • regular transduction

    and more, or

  • the number of equivalence classes of its Nerode relation is finite (Myhill–Nerode theorem).

In the given example, we have some (regular) language $L$ as a basis and want to say something about a language $L'$ derived from it. Following the first approach, constructing a suitable model for $L'$, we can assume whichever equivalent model for $L$ we desire; it will remain abstract, of course, since $L$ is unknown. In the second approach, we can use $L$ directly and apply closure properties to it in order to arrive at a description of $L'$.

Other tips

Elementary methods

  1. Finite automata (possibly nondeterministic, with empty transitions).
  2. Regular expressions.
  3. Right (or Left, but not both) linear equations, like $X = KX + L$ where $K$ and $L$ are regular.
  4. Regular (Type 3) grammar.
  5. Operations preserving regular languages (Boolean operations, product, star, shuffle, morphisms, inverses of morphisms, reversal, etc.)
  6. Recognized by a finite monoid.

Logical methods (often used in formal verification)

  1. Monadic second order logic (Büchi's theorem).
  2. Linear temporal logic (Kamp's theorem).
  3. Rabin's tree theorem (Monadic second order logic with two successors). Very powerful.

Advanced methods

  1. Sophisticated pumping lemmas. See for instance
    [1] J. Jaffe, A necessary and sufficient pumping lemma for regular languages, SIGACT News 10 (1978) 48–49.
    [2] A. Ehrenfeucht, R. Parikh, and G. Rozenberg, Pumping lemmas for regular sets, SIAM J. Comput. 10 (1981), 536-541.
    [3] S. Varricchio, A pumping condition for regular sets, SIAM J. Comput. 26 (1997) 764-771.

  2. Well quasi orders. See
    [4] W. Bucher, A. Ehrenfeucht, D. Haussler, On total regulators generated by derivation relations, Theor. Comput. Sci. 40 (1985) 131–148.
    [5] M. Kunc, Regular solutions of language inequalities and well quasi-orders.

  3. Support of $\mathbb{N}$-rational series.

  4. Algebraic methods based on Transductions (see also Operations preserving regular languages).

The answer by Ran G. gives a fairly extensive listing of the equivalent models that can be used to specify regular languages (and the list goes on: two-way automata, MSO logic, but those are covered by the link under 'more equivalent models'). And as Raphael stresses, we need an argument to convince the audience that the chosen representation is indeed correct.

Reconsidering the question, it adds 'For instance $\dots$'. That means we have to give a valid construction that, given any of the above models assumed to specify the language $L$, turns that model into one for $L'$. This will generally be the same type of model, but it need not be: we can, for example, start with a deterministic FSA for $L$ and end with a nondeterministic one for $L'$.

This includes the possibility of using closure properties: for the operation given explicitly in the example we have $L' = L \cap \bigl( (\Sigma^* \setminus L) \cdot \Sigma^+ \bigr)$.
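To make this concrete, here is a minimal Python sketch of one possible construction; the dict-based DFA/NFA encoding, the helper names and the example automaton are my own illustrative assumptions, not part of the question. It turns a DFA for $L$ into an NFA for $L' = L \cap \bigl( (\Sigma^* \setminus L) \cdot \Sigma^+ \bigr)$ by running the DFA on the whole word and nondeterministically setting a one-bit flag at a split point where the prefix read so far is not in $L$.

```python
# A minimal sketch, assuming L is given by a total DFA encoded as
# (states, alphabet, delta, q0, accepting) with delta a dict; the example
# automaton and helper names below are illustrative, not from the question.

def make_Lprime_nfa(dfa):
    """Build an NFA for L' = L ∩ ((Σ* ∖ L) · Σ+).
    NFA states are pairs (q, flag): flag == 1 means we have already committed
    to a split w = uv with the prefix u not in L and v nonempty."""
    states, alphabet, delta, q0, accepting = dfa

    def nfa_delta(state, symbol):
        q, flag = state
        nxt = delta[(q, symbol)]
        moves = {(nxt, flag)}
        # If the prefix read so far is not in L, we may guess the split here;
        # the symbol currently being read then guarantees that v is nonempty.
        if flag == 0 and q not in accepting:
            moves.add((nxt, 1))
        return moves

    start = (q0, 0)
    is_accepting = lambda s: s[0] in accepting and s[1] == 1
    return nfa_delta, start, is_accepting

def nfa_accepts(nfa, word):
    nfa_delta, start, is_accepting = nfa
    current = {start}
    for symbol in word:
        current = {t for s in current for t in nfa_delta(s, symbol)}
    return any(is_accepting(s) for s in current)

# Example: L = words over {a, b} with an even number of a's.
delta = {('even', 'a'): 'odd', ('even', 'b'): 'even',
         ('odd', 'a'): 'even', ('odd', 'b'): 'odd'}
dfa = ({'even', 'odd'}, {'a', 'b'}, delta, 'even', {'even'})
nfa = make_Lprime_nfa(dfa)
print(nfa_accepts(nfa, "aab"))  # True: "aab" is in L and its prefix "a" is not
print(nfa_accepts(nfa, "bb"))   # False: every proper prefix of "bb" is in L
```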

So, my point is that the answer is great, but we should add the construction that takes us from $L$ to $L'$ when we are not building a specific language from scratch.

Occasionally you will encounter a language specified as "all strings $s$ where every $k$-element substring of $s$ satisfies <some property>", where $k$ is some fixed constant. In that case, the language is regular. The idea is to define a finite automaton whose states "remember" the most recently seen $k$ input symbols, as in the answer to this question.
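As a toy illustration of that idea (the property, the alphabet and the function name are made up for the example, not taken from the answer), here is a constant-state scanner for "every 3-symbol substring contains at least one a" over $\{a,b\}$; the only thing it remembers is the last $k-1$ symbols read.

```python
# A toy sketch: the DFA state is just the last k-1 symbols seen (assumes k >= 2).
def every_window_has_a(s, k=3):
    window = ""                          # the at most (k-1) most recent symbols
    for c in s:
        if len(window) == k - 1 and 'a' not in window + c:
            return False                 # some k-symbol window is all b's
        window = (window + c)[-(k - 1):]
    return True

print(every_window_has_a("abbabba"))     # True:  every length-3 window has an 'a'
print(every_window_has_a("abbba"))       # False: the window "bbb" has no 'a'
```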

Another method, not covered by the answers above, is finite automaton transformation. As a simple example, let us show that the regular languages are closed under the (perfect) shuffle operation, defined as follows: $$ L_1 \mathop{S} L_2 = \{ x_1y_1 \ldots x_n y_n \in \Sigma^* : x_1 \ldots x_n \in L_1, y_1 \ldots y_n \in L_2 \}, $$ where $x_i, y_i \in \Sigma$. You can show closure under shuffle using closure properties, but you can also show it directly using DFAs. Suppose that $A_i = \langle \Sigma, Q_i, F_i, \delta_i, q_{0i} \rangle$ is a DFA that accepts $L_i$ (for $i=1,2$). We construct a new DFA $\langle \Sigma, Q, F, \delta, q_0 \rangle$ as follows:

  • The set of states is $Q_1 \times Q_2 \times \{1,2\}$, where the third component remembers whether the next symbol is an $x_i$ (when 1) or a $y_i$ (when 2).
  • The initial state is $q_0 = \langle q_{01}, q_{02}, 1 \rangle$.
  • The accepting states are $F = F_1 \times F_2 \times \{1\}$.
  • The transition function is defined by $\delta(\langle q_1, q_2, 1 \rangle, \sigma) = \langle \delta_1(q_1,\sigma), q_2, 2 \rangle$ and $\delta(\langle q_1, q_2, 2 \rangle, \sigma) = \langle q_1, \delta_2(q_2,\sigma), 1 \rangle$.
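Here is a small sketch of this product construction in Python; the dict-based DFA encoding, the helper names and the example automata ($(ab)^*$ and $c^*$) are assumptions of mine, not part of the answer. Missing transitions are treated as going to an implicit dead state.

```python
def perfect_shuffle_dfa(dfa1, dfa2):
    """dfa_i = (delta_i, q0_i, F_i) with delta_i[(state, symbol)] -> state.
    Returns a DFA for the perfect shuffle with states (q1, q2, turn)."""
    delta1, q01, F1 = dfa1
    delta2, q02, F2 = dfa2

    def delta(state, symbol):
        q1, q2, turn = state
        if turn == 1:    # this symbol is an x_i, i.e. belongs to the L1 word
            return (delta1.get((q1, symbol)), q2, 2)
        else:            # this symbol is a y_i, i.e. belongs to the L2 word
            return (q1, delta2.get((q2, symbol)), 1)

    start = (q01, q02, 1)
    accepting = lambda s: s[0] in F1 and s[1] in F2 and s[2] == 1
    return delta, start, accepting

def dfa_run(dfa, word):
    delta, state, accepting = dfa
    for symbol in word:
        state = delta(state, symbol)
    return accepting(state)

# L1 = (ab)*, L2 = c*; missing entries act as a dead state (None).
delta1 = {(0, 'a'): 1, (1, 'b'): 0}
delta2 = {(0, 'c'): 0}
shuffled = perfect_shuffle_dfa((delta1, 0, {0}), (delta2, 0, {0}))
print(dfa_run(shuffled, "acbc"))   # True:  "ab" in (ab)*, "cc" in c*
print(dfa_run(shuffled, "abcc"))   # False: symbols must alternate L1, L2, L1, L2
```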

A more sophisticated version of this method involves guessing. As an example, let us show that regular languages are closed under reversal, where $$ L^R = \{ w^R : w \in L \}. $$ (Here $(w_1\ldots w_n)^R = w_n \ldots w_1$.) This is one of the standard closure operations, and closure under reversal easily follows from manipulation of regular expressions (which may be regarded as the counterpart, for regular expressions, of finite automaton transformation): just reverse the regular expression. But you can also prove closure using NFAs. Suppose that $L$ is accepted by a DFA $\langle \Sigma, Q, F, \delta, q_0 \rangle$. We construct an NFA $\langle \Sigma, Q', F', \delta', q'_0 \rangle$, where

  • The set of states is $Q' = Q \cup \{q'_0\}$.
  • The initial state is $q'_0$.
  • The unique accepting state is $q_0$.
  • The transition function is defined as follows: $\delta'(q'_0,\epsilon) = F$, and for every state $q' \in Q$ and every $\sigma \in \Sigma$, $\delta'(q', \sigma) = \{ q \in Q : \delta(q,\sigma) = q' \}$.

(We can get rid of $q'_0$ if we allow multiple initial states.) What is guessed here is the accepting state in which the original DFA ends up after reading the reversal of the input word.
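A sketch of this reversal construction, in the multiple-initial-states variant and with the same dict-based DFA encoding as above (the encoding, helper names and example DFA are my own assumptions):

```python
def reverse_nfa(dfa):
    """dfa = (delta, q0, F). Returns (initial_states, nfa_delta, accepting)."""
    delta, q0, F = dfa
    # delta'(q', sigma) = { q : delta(q, sigma) = q' }: reverse every edge.
    reversed_edges = {}
    for (q, sigma), q_next in delta.items():
        reversed_edges.setdefault((q_next, sigma), set()).add(q)

    initial_states = set(F)            # guess where the original run would end
    accepting = lambda q: q == q0      # accept where the original run started
    nfa_delta = lambda q, sigma: reversed_edges.get((q, sigma), set())
    return initial_states, nfa_delta, accepting

def nfa_run(nfa, word):
    initial_states, nfa_delta, accepting = nfa
    current = set(initial_states)
    for sigma in word:
        current = {q for state in current for q in nfa_delta(state, sigma)}
    return any(accepting(q) for q in current)

# L = words over {a, b} ending in "ab", so L^R = words starting with "ba".
delta = {(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 1, (1, 'b'): 2,
         (2, 'a'): 1, (2, 'b'): 0}
rev = reverse_nfa((delta, 0, {2}))
print(nfa_run(rev, "baa"))   # True:  its reversal "aab" ends in "ab"
print(nfa_run(rev, "ab"))    # False: its reversal "ba" does not end in "ab"
```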


Guessing is often combined with verification. One simple example is closure under rotation: $$ R(L) = \{ yx \in \Sigma^* : xy \in L \}. $$ Suppose that $L$ is accepted by the DFA $\langle \Sigma, Q, F, \delta, q_0 \rangle$. We construct an NFA $\langle \Sigma, Q', F', \delta', q'_0 \rangle$, which operates as follows. The NFA first guesses $q=\delta(q_0,x)$. It then reads the $y$ part and verifies that $\delta(q,y) \in F$, moves non-deterministically to the $x$ part, and finally verifies that $\delta(q_0,x) = q$. This can be formalized as follows:

  • The states are $Q' = \{q'_0\} \cup Q \times Q \times \{1,2\}$. Apart from the initial state $q'_0$, the states are $\langle q,q_{curr}, s \rangle$, where $q$ is the state that we guessed, $q_{curr}$ is the current state, and $s$ specifies whether we are at the $y$ part of the input (when 1) or at the $x$ part of the input (when 2).
  • The final states are $F' = \{\langle q,q,2 \rangle : q \in Q\}$: we accept when $\delta(q_0,x)=q$.
  • The transitions $\delta'(q'_0,\epsilon) = \{\langle q,q,1 \rangle : q \in Q\}$ implement guessing $q$.
  • The transitions $\delta'(\langle q,q_{curr},s \rangle, \sigma) = \langle q,\delta(q_{curr},\sigma),s \rangle$ (for every $q,q_{curr} \in Q$ and $s \in \{1,2\}$) simulate the original DFA.
  • The transitions $\delta'(\langle q,q_f,1 \rangle, \epsilon) = \langle q,q_0,2 \rangle$, for every $q \in Q$ and $q_f \in F$, implement moving from the $y$ part to the $x$ part. This is only allowed if we've reached a final state on the $y$ part.
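Rather than materializing the state set $\{q'_0\} \cup Q \times Q \times \{1,2\}$, the small sketch below simulates the same guess-and-verify idea directly: it tries every possible guess $q$ and every split of the input into $y$ followed by $x$, which is exactly what the NFA's nondeterminism buys. The DFA encoding, helper names and example automaton are again my own assumptions.

```python
def rotation_acceptor(dfa):
    """dfa = (delta, q0, F) with delta total. Returns a decision procedure for
    R(L) that tries every guess q and every split of the input into y, x."""
    delta, q0, F = dfa

    def run_dfa(start, word):
        state = start
        for sigma in word:
            state = delta[(state, sigma)]
        return state

    def accepts(word):
        for q in {q for (q, _) in delta} | {q0}:     # guess q = delta(q0, x)
            for i in range(len(word) + 1):           # guess the y/x split point
                y, x = word[:i], word[i:]
                # Verify: reading y from q ends in F, and reading x from q0 ends in q.
                if run_dfa(q, y) in F and run_dfa(q0, x) == q:
                    return True
        return False

    return accepts

delta = {(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 1, (1, 'b'): 2,
         (2, 'a'): 1, (2, 'b'): 0}                   # L = words ending in "ab"
accepts = rotation_acceptor((delta, 0, {2}))
print(accepts("ba"))   # True:  "ba" is a rotation of "ab", which is in L
print(accepts("aa"))   # False: no rotation of "aa" ends in "ab"
```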

Another variant of the technique incorporates bounded counters. As an example, let us consider closure under bounded edit distance: $$ E_k(L) = \{ x \in \Sigma^* : \text{ there exists $y \in L$ whose edit distance from $x$ is at most $k$} \}. $$ Given a DFA $\langle \Sigma, Q, F, \delta, q_0 \rangle$ for $L$, we construct an NFA $\langle \Sigma, Q', F', \delta', q'_0 \rangle$ for $E_k(L)$ as follows:

  • The set of states is $Q' = Q \times \{0,\ldots,k\}$, where the second item counts the number of changes done so far.
  • The initial state is $q'_0 = \langle q_0,0 \rangle$.
  • The accepting states are $F' = F \times \{0,\ldots,k\}$.
  • For every $q,\sigma,i$ we have transitions $\langle \delta(q,\sigma), i \rangle \in \delta'(\langle q,i \rangle, \sigma)$.
  • Insertions are handled by transitions $\langle q,i+1 \rangle \in \delta'(\langle q,i \rangle, \sigma)$ for all $q,\sigma,i$ such that $i < k$.
  • Deletions are handled by transitions $\langle \delta(q,\sigma), i+1 \rangle \in \delta'(\langle q,i \rangle, \epsilon)$ for all $q,\sigma,i$ such that $i < k$.
  • Substitutions are similarly handled by transitions $\langle \delta(q,\sigma), i+1 \rangle \in \delta'(\langle q,i \rangle, \tau)$ for all $q,\sigma,\tau,i$ such that $i < k$.
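The following sketch simulates this counter construction on the dict-based DFA encoding used above (the encoding, helper names, example DFA and the choice $k=1$ are illustrative assumptions). The deletion moves are $\epsilon$-transitions, so the simulation keeps the set of reachable $(q,i)$ pairs closed under them.

```python
def edit_distance_acceptor(dfa, k, alphabet):
    """dfa = (delta, q0, F) with delta total. Decides membership in E_k(L)."""
    delta, q0, F = dfa

    def epsilon_closure(states):
        # Deletions: advance the DFA on a guessed symbol of y without reading input.
        states = set(states)
        changed = True
        while changed:
            changed = False
            for (q, i) in list(states):
                if i < k:
                    for sigma in alphabet:
                        t = (delta[(q, sigma)], i + 1)
                        if t not in states:
                            states.add(t)
                            changed = True
        return states

    def accepts(x):
        current = epsilon_closure({(q0, 0)})
        for tau in x:
            nxt = set()
            for (q, i) in current:
                nxt.add((delta[(q, tau)], i))          # tau matches a symbol of y
                if i < k:
                    nxt.add((q, i + 1))                # insertion: tau is not in y
                    for sigma in alphabet:             # substitution of tau by sigma
                        nxt.add((delta[(q, sigma)], i + 1))
            current = epsilon_closure(nxt)
        return any(q in F for (q, i) in current)

    return accepts

delta = {(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 1, (1, 'b'): 2,
         (2, 'a'): 1, (2, 'b'): 0}                     # L = words ending in "ab"
within_one_edit = edit_distance_acceptor((delta, 0, {2}), k=1, alphabet="ab")
print(within_one_edit("aab"))   # True:  already in L
print(within_one_edit("aa"))    # True:  one insertion away from "aab"
print(within_one_edit(""))      # False: the nearest word in L, "ab", is 2 edits away
```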

A language is regular iff you can write a scanner that decides, for arbitrary strings, whether or not they belong to the language using no more than a fixed amount of memory, i.e. recognition can be done in $O(1)$ space.
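For instance (a toy example of my own, not taken from the answer), the regular language "the number of a's is divisible by 3" can be scanned with a single counter modulo 3, whose size does not depend on the input length:

```python
def a_count_divisible_by_3(s):
    remainder = 0                     # O(1) memory: one value in {0, 1, 2}
    for c in s:
        if c == 'a':
            remainder = (remainder + 1) % 3
    return remainder == 0

print(a_count_divisible_by_3("ababab"))   # True:  three a's
print(a_count_divisible_by_3("abab"))     # False: two a's
```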

Regular expression transformation is one way to prove closure under certain operations. The two simplest examples are closure under reversal and closure under homomorphism.

Given a regular expression $r$ representing a language $L$, we can construct a regular expression for $L^R$, the language of the reverse of all words in $L$, recursively:

  • $\epsilon^R = \epsilon$, $\sigma^R = \sigma$, $\emptyset^R = \emptyset$.
  • $(r_1+r_2)^R = (r_1^R + r_2^R)$, $(r^*)^R = (r^R)^*$, $(r_1r_2)^R = r_2^R r_1^R$.

All the action happens at the final rule $(r_1r_2)^R = r_2^R r_1^R$. As an example, you can check that $(0^*1^*)^R = 1^*0^*$. Structural induction establishes that the language of $r^R$ is indeed $L^R$.
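A minimal sketch of these recursive rules on a tiny ad-hoc regular-expression AST; the class and function names are my own for illustration, not a standard library.

```python
from dataclasses import dataclass

class Regex: pass

@dataclass
class Empty(Regex): pass            # the empty language

@dataclass
class Sym(Regex):                   # a single symbol (epsilon can be Sym(""))
    char: str

@dataclass
class Union(Regex):                 # r1 + r2
    left: Regex
    right: Regex

@dataclass
class Concat(Regex):                # r1 r2
    left: Regex
    right: Regex

@dataclass
class Star(Regex):                  # r*
    inner: Regex

def rev(r: Regex) -> Regex:
    """Apply the reversal rules recursively; only concatenation swaps."""
    if isinstance(r, (Empty, Sym)):
        return r
    if isinstance(r, Union):
        return Union(rev(r.left), rev(r.right))
    if isinstance(r, Star):
        return Star(rev(r.inner))
    if isinstance(r, Concat):
        return Concat(rev(r.right), rev(r.left))   # (r1 r2)^R = r2^R r1^R
    raise TypeError(r)

# (0*1*)^R == 1*0*
r = Concat(Star(Sym('0')), Star(Sym('1')))
print(rev(r))   # Concat(left=Star(inner=Sym(char='1')), right=Star(inner=Sym(char='0')))
```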

Another simple example is closure under homomorphism. Given a homomorphism $h\colon \Sigma \to \Delta^*$ and a regular expression $r$ for a language $L$, we can obtain a regular expression for $h(L)$ by replacing every instance of $\sigma$ in $r$ by $h(\sigma)$.
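A small sketch of this substitution for a regular expression given as a plain string; the concrete syntax (symbols, '+', '*', parentheses) and the helper name are assumptions made for the illustration.

```python
def apply_homomorphism(regex: str, h: dict) -> str:
    """Replace each alphabet symbol c by (h(c)); operators pass through untouched."""
    out = []
    for c in regex:
        if c in h:
            out.append("(" + h[c] + ")")   # parenthesize to preserve precedence
        else:
            out.append(c)                  # '+', '*', '(' and ')' are kept as-is
    return "".join(out)

# h(a) = cd, h(b) = c+d turns (ab+b)* into ((cd)(c+d)+(c+d))*
print(apply_homomorphism("(ab+b)*", {'a': 'cd', 'b': 'c+d'}))
```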

Another way is to build up the language using operations under which regular languages are known to be closed. This extends the approach of exhibiting a regular expression, since you have many more operations available (reversing the string, complement, intersection, chopping out a piece, keeping just a part, homomorphisms and inverse homomorphisms, ...).

Licensed under: CC-BY-SA with attribution