Regular language as finite union of periodic sets

https://cs.stackexchange.com/questions/125670

29-09-2020
|

Question

Is it true that every regular language can be expressed as a finite union of periodic sets? In other words, if $L$ is regular, then do there exist finite sets $A_1,\dots,A_n,B_1,\dots,B_n$ such that

$$L = A_1 \cdot B_1^* \cup \cdots \cup A_n \cdot B_n^*.$$

I know this is true for regular languages over a unary alphabet, but I'm not sure about general alphabets.

Solution

Every language of this form can be represented as a regular expression without nested Kleene star. That is, its star height is $1$. The star height hierarchy is strict, and in particular, it is known that the language $(a^*b^*c)^*$ cannot be represented in that form. An example over a binary alphabet would be $(aa(ab)^*bb(ab)^*)^*$.

OTHER TIPS

The answer by @YuvalFilmus is perfectly fine, and points you to the import notion of star height. But let me add a little bit more. We will show that languages of your form give a proper subset of the languages of star height one. But first, some musings what might come close to your form.

General Regular Languages

First, probably the closest form to yours for regular $L \subseteq \Sigma^*$, accepted by a complete deterministic automaton $A = (\Sigma, Q, \delta, q_0, F)$. For $q \in Q$ and $E \subseteq Q$, write $L_{q, E}(A)$ for the language accepted by $(\Sigma, Q, \delta, q, E)$, i.e., changing the start state to $q$ and the set of final states to $E$. Then, we can write $$ L = \bigcup_{ q \in F } L_{q_0, q}(A) L_{q, \{q\}}(A)^* $$ This result seems to be folklore and follows easily. It could be a little bit strengthened, by replacing $L_{q_0, \{q\}}(A)$ by the language $$ \{ u \in L_{q_0, \{q\}} \mid \delta(q_0, v) \ne q \mbox{ for each proper prefix $v$ of $u$} \} $$ in the above equation. It has the property that no word in it is a proper prefix of another word. A variant of this decomposition, and more refinements, appear in the book Automata, Languages and Machines, Volume A by S. Eilenberg, under the name iterated up-decomposition.

Commutative and Bounded Regular Languages

Others forms, more close to yours, and being proper generalizations of the unary language case you mentioned, could be given for the bounded and commutative regular languages. A language is commutative, if it is closed under permutation of letters. For example, $\{ab,ba\}$ is commutative, whereas $\{ab\}$ is not. A language $L \subseteq \Sigma^*$ is bounded, if $L \subseteq w_1^* \cdots w_n^*$ for words $w_i \in \Sigma^*$. In what follows, let us denote by $\diamond$ the shuffle of two languages, and, for $L \subseteq \Sigma^*$, by $L^{\diamond,*} = \bigcup_{n \ge 0} \underbrace{L \diamond \ldots \diamond L}_{\mbox{$n$ times}}$ the iterated shuffle. Also, by $\operatorname{perm} : 2^{\Sigma^*} \to 2^{\Sigma*}$ denote the permutational closure, or commutative closure, i.e. adding all words that are permutations. For example, $\operatorname{perm}(\{ab\}) = \{ab,ba\}$.

Note that $L \subseteq w^*$ is regular iff $L = w^n(w^p)^*$ for some $n, p \ge 0$. This is implied by noting that $\{ n : w^n \in L \}$ is ultimately periodic (it could be viewed as a regular unary language).

By a result of Ginsburg/Spanier we have

A language $L \subseteq w_1^* \cdots w_r^*$ is regular iff it is a finite union of languages of the form $L_1 \cdots L_r$, where each $L_i \subseteq w_i^*$ is regular.

Also, if $\Sigma = \{a_1, \ldots, a_k\}$ and $L \subseteq \Sigma^*$ is commutative and regular, it is possible to show that $$ L = \bigcup_{i=1}^n \operatorname{perm}(u_i) \diamond \operatorname{perm}(N_i)^{\diamond,*} $$ for finite sets $N_i \subseteq a_1^* \cup \ldots \cup a_k^*$ with $|N_i \cap a_j^*| \le 1$.

Also, as stated here and here, a regular language $L$ over $\Sigma = \{a_1, \ldots, a_k\}$ is commutative iff $$ L = \bigcup_{i=1}^n U_{i,1} \diamond \ldots \diamond U_{i,k} $$ for unary regular languages $U_{i,j} \subseteq a_j^*$ with $i \in \{1,\ldots, n\}$ and $j \in \{1,\ldots,k\}$.

All these results are closely related. Note that for unary languages, they all reduce to the form you have stated.

A Necessary Condition for Languages of your Form

The condition I will state involves topology and infinite words. It yields, for example, that $b^*a$ and $(b^*a)^*$ could not be written in your form. The latter has star height two, note that star height could be characterized by cycle rank, see Eggan's Theorem. However, I do want to give a self-contained, and different, argument here, not based on star height. Let $\Sigma^{\omega}$ be the set of infinite words over $\Sigma$. Define an operator $W : 2^{\Sigma^*} \to 2^{\Sigma^{\omega}}$ by, for $L \subseteq \Sigma^*$, $$ W(L) = \{ \xi \in \Sigma^{\omega} \mid \mbox{ $\xi$ has infinitely many prefixes in $L$ } \}. $$ Then \begin{align*} W(b^*a) & = \emptyset \\ W((b^*a)^*) & = \{ \xi \in \Sigma^{\omega} \mid \mbox{$\xi$ has infinitely many $a$'s. } \}. \end{align*} Observe that $$ W(U \cup V) = W(U) \cup V(V), $$ as, if $\xi \in W(U\cup V)$, then, by the pigenhole principle, at least one of $U$ or $V$, or both, must contain infinitely many prefixes of $\xi$. Also, in your form we could assume all the sets $A_i$ are singletons, as concatenation distributes over union.

Let us inspect languages of the form $L = u(u_1 + \ldots + u_n)^*$, which are the parts of your form. Set $\Gamma = \{ b_1, \ldots, b_n \}$, an auxiliary alphabet, and consider the homomorphism $h : \Gamma \to \Sigma^*$ given by $h(b_i) = u_i$, i.e., substituting each letter by the corresponding word. This homomorphism could also be applied to infinite words in $\Gamma^{\omega}$. We have $$ W(u(u_1 + \ldots + u_n)^*) = \{ uh(\xi) \mid \xi \in \Gamma^{\omega} \}. $$ In particular, if all the set $B_i$ are singletons, $W(L)$ is finite, and $W(L) \ne \emptyset$ for every infinite language of your form.

By the last observation, $b^*a$ could not be written in your form. So, this is a very simple example, even of star height one. But let us show that $L = (b^*a)^*$ could also not be written that way. Assume it could be, then the set of infinite words, which all have an infinite number of $a$'s in it, could be written in terms of a homomorphism $h : \Gamma^* \to \Sigma^*$ as outlined above, i.e., $$ \{ \xi \in \Sigma^{\omega} \mid \mbox{$\xi$ has infinitely many $a$'s. } \} = u h(\Gamma^{\omega}) $$ for some $u \in \Sigma^*$. Let $m = \{ |h(x)| : x \in \Gamma \} + |u|$. Then $ub^maaaaa\cdots$ is in this language, and this implies $h(x) \in b^+$ for some $x \in \Gamma$. But then $ubbbbbbb\cdots \in uh(\Gamma^{\omega})$, which is a contradiction, as it only contains a finite number of $a$'s. Hence, $(b^*a)^*$ could not be written that way. $\square$

Licensed under: CC-BY-SA with attribution

Not affiliated with cs.stackexchange