Question

We know that DFAs and NFAs are equivalent in expressive power; there is also a well-known algorithm for converting NFAs to DFAs (unfortunately I do not know the inventor of that algorithm), which in the worst case gives us $2^S$ states if our NFA had $S$ states.

My question is: what determines the worst-case scenario?


Here is a transcription of the algorithm, in case of ambiguity:

Let $A = (Q,\Sigma,\delta,q_0,F)$ be an NFA. We construct a DFA $A' = (Q',\Sigma,\delta',q'_0,F')$ where

  • $Q' = \mathcal{P}(Q)$,
  • $F' = \{S \in Q' | F \cap S \neq \emptyset \}$,
  • $\delta'(S,a) =\bigcup_{s \in S} (\delta(s,a) \cup \hat \delta(s,\varepsilon))$, and
  • $q'_0 = \{q_0\} \cup \hat \delta(q_0, \varepsilon)$,

where $\hat\delta$ is the extended transition function of $A$.
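
For reference, here is a minimal Python sketch of this construction for NFAs without $\varepsilon$-transitions (so the $\hat\delta(\cdot,\varepsilon)$ terms above vanish); all names are illustrative. It generates only the subsets reachable from the start set, which is how implementations avoid materializing all of $\mathcal{P}(Q)$ unless they have to:

```python
def subset_construction(alphabet, delta, q0, accepting):
    """Determinize an NFA given as delta: (state, symbol) -> set of states.

    Epsilon-transitions are assumed absent, so the epsilon-closure terms in
    the definition above disappear.  Only subsets reachable from {q0} are
    generated; in the worst case that is still all 2^|Q| of them.
    """
    start = frozenset({q0})
    dfa_delta = {}
    seen, todo = {start}, [start]
    while todo:
        S = todo.pop()
        for a in alphabet:
            # delta'(S, a) = union of delta(s, a) over s in S
            T = frozenset(t for s in S for t in delta.get((s, a), ()))
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    dfa_accepting = {S for S in seen if S & accepting}
    return seen, dfa_delta, start, dfa_accepting
```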


Solution

The algorithm you refer to is called the powerset construction (also known as the subset construction); it was first published by Michael Rabin and Dana Scott in 1959.

To answer your question as stated in the title: there is no maximal DFA for a regular language, because you can always take a DFA and add as many new states as you like, with transitions among themselves but none between the original states and the new ones. The new states are then unreachable from the initial state $q_0$, so the language accepted by the automaton does not change ($\hat\delta(q_0,w)$ stays the same for every $w\in\Sigma^*$).

That said, it is clear that there can be no conditions on an NFA for its equivalent DFA to be maximal, since there is no unique equivalent DFA. In contrast, the minimal DFA is unique up to isomorphism.


A canonical example of a language accepted by an NFA with $n+1$ states whose equivalent DFA has $2^n$ states is $$L=\{w\in\{0,1\}^*:|w|\geq n\text{ and the \(n\)-th symbol from the end is }1\}.$$ An NFA for $L$ is $A=\langle \{q_0,\ldots,q_n\},\{0,1\},\delta,q_0,\{q_n\}\rangle$, with $\delta(q_0,0)=\{q_0\}$, $\delta(q_0,1)=\{q_0,q_1\}$, and $\delta(q_i,0)=\delta(q_i,1)=\{q_{i+1}\}$ for $i\in\{1,\ldots,n-1\}$. The DFA obtained by applying the powerset construction to this NFA has $2^n$ states, because it must distinguish all $2^n$ possible patterns of $1$s among the last $n$ symbols read.
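
To see the blow-up concretely, here is a small Python sketch (purely illustrative; the NFA states $q_0,\ldots,q_n$ are encoded as the integers $0,\ldots,n$) that determinizes this NFA by exploring only the reachable subsets and counts them:

```python
def reachable_dfa_states(n):
    """Determinize the NFA for "the n-th symbol from the end is 1"
    (states 0..n, state n accepting) and count the reachable subsets."""
    def step(S, a):
        T = set()
        for q in S:
            if q == 0:            # q_0 loops on both symbols...
                T.add(0)
                if a == '1':      # ...and on 1 also guesses "this is the n-th from the end"
                    T.add(1)
            elif q < n:           # q_1 .. q_{n-1} advance on any symbol
                T.add(q + 1)
        return frozenset(T)       # q_n has no outgoing transitions

    start = frozenset({0})
    seen, todo = {start}, [start]
    while todo:
        S = todo.pop()
        for a in '01':
            T = step(S, a)
            if T not in seen:
                seen.add(T)
                todo.append(T)
    return len(seen)

for n in range(1, 8):
    print(n, reachable_dfa_states(n))   # prints 2, 4, 8, ..., 128, i.e. 2^n
```

These subsets are also pairwise distinguishable (appending $0^{\,n-i}$ leads to acceptance exactly when $q_i$ is in the current subset), so even the minimal DFA for $L$ has $2^n$ states.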

OTHER TIPS

The worst case of $2^S$ comes from the number of subsets of states of the NFA. For the subset construction to produce an equivalent DFA with the worst-case number of states, every possible subset of NFA states must be reachable. As an example with two states $q_1$ (initial) and $q_2$ (accepting) over the alphabet $\{a, b\}$: there is a transition from $q_1$ to $q_2$ on $a$, a transition from $q_2$ back to $q_1$ on $b$, and a transition from $q_2$ back to itself on both $a$ and $b$. The strings $\lambda$, $a$, $b$, and $ab$ lead to the subsets $\{q_{1}\}$, $\{q_{2}\}$, $\{\}$, and $\{q_{1}, q_{2}\}$, respectively, and each of these needs its own state in the resulting DFA.
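
A few lines of Python (with the hypothetical state names q1 and q2 from the example) confirm that those four strings really do reach four different subsets:

```python
# Two-state NFA from the example: q1 initial, q2 accepting.
delta = {
    ('q1', 'a'): {'q2'},        # q1 --a--> q2
    ('q2', 'b'): {'q1', 'q2'},  # q2 --b--> q1 and q2 --b--> q2
    ('q2', 'a'): {'q2'},        # q2 --a--> q2
}

def subset_after(word):
    """Set of NFA states reachable from q1 after reading `word`."""
    S = {'q1'}
    for a in word:
        S = {t for s in S for t in delta.get((s, a), ())}
    return S

for w in ['', 'a', 'b', 'ab']:
    print(repr(w), sorted(subset_after(w)))
# '' -> ['q1'],  'a' -> ['q2'],  'b' -> [],  'ab' -> ['q1', 'q2']
```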

I believe this is a question at the frontier of knowledge, i.e. essentially a research question; from a quick Google search it appears to be mostly open. I have also long believed it to be important and linked to lower bounds in complexity theory. You do not mention a statistical analysis directly, but that is what your question implies. The two statistical studies of DFAs/NFAs cited below ([1] and [2]) show the general approach to questions of this type; basic empirical research into questions like this still appears to be mostly unexplored. Admittedly, the second does not relate directly to your question, but it is the closest current research I could find.

To study your question, one could envision a statistical attack such as the following. Construct random NFAs, determinize each one, and compute its minimal DFA. Plot a histogram of how many minimal DFAs of each size $x$ result, and separate out the "large" DFAs based on some threshold. Then formulate a metric or measurement on the NFA that estimates the size of the resulting DFA.
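
A sketch of such an experiment, under an assumed random-NFA model (the transition density, number of trials, and all identifiers are illustrative choices rather than an established benchmark):

```python
import random
from collections import Counter

def random_nfa(num_states, alphabet='01', density=0.15, seed=None):
    """Hypothetical random-NFA model: each possible transition
    (state, symbol) -> state is present independently with prob `density`."""
    rng = random.Random(seed)
    delta = {(q, a): {r for r in range(num_states) if rng.random() < density}
             for q in range(num_states) for a in alphabet}
    accepting = {q for q in range(num_states) if rng.random() < 0.5}
    return delta, 0, accepting

def determinize(delta, q0, accepting, alphabet='01'):
    """Subset construction over the reachable subsets only."""
    start = frozenset({q0})
    trans, seen, todo = {}, {start}, [start]
    while todo:
        S = todo.pop()
        for a in alphabet:
            T = frozenset(t for s in S for t in delta[(s, a)])
            trans[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    final = {S for S in seen if S & accepting}
    return seen, trans, final

def minimize_size(states, trans, final, alphabet='01'):
    """Moore-style partition refinement; returns the number of classes
    (size of the minimal complete DFA, possibly including a dead state)."""
    block = {S: (S in final) for S in states}
    while True:
        sig = {S: (block[S], tuple(block[trans[(S, a)]] for a in alphabet))
               for S in states}
        ids = {v: i for i, v in enumerate(sorted(set(sig.values()), key=repr))}
        new_block = {S: ids[sig[S]] for S in states}
        if len(set(new_block.values())) == len(set(block.values())):
            return len(set(new_block.values()))
        block = new_block

sizes = Counter()
for trial in range(200):
    delta, q0, acc = random_nfa(6, seed=trial)
    dfa_states, trans, final = determinize(delta, q0, acc)
    sizes[minimize_size(dfa_states, trans, final)] += 1

for size in sorted(sizes):
    print(f'minimal DFA size {size}: {sizes[size]} of 200 random 6-state NFAs')
```

Plotting the counts in `sizes` as a histogram and sweeping the `density` parameter would be the natural way to start looking for a "blow-up" threshold.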

This metric would be related to graph-theoretic measures such as edge density. There is probably some very important graph measure, or mix of measures, that predicts the "blow-up", but it is not immediately obvious to me what it is; graph-coloring or clique metrics might be candidates. One would then test the metric against the two sets, "blown-up" vs. "not blown-up".

The other answers to your question so far only give an example of a "blow-up" (useful as a case study) but do not address the key issue of a general metric.

Another place to look for a successfully developed program of empirical research is the work on SAT phase transitions, which has developed very deep links to concepts from physics and thermodynamics. It seems likely to me that similar concepts are applicable here; for example, one would probably find analogous transition-point metrics, edge density being a likely candidate. Note also the parallels to Kolmogorov complexity and compressibility.

I also conjecture that NFAs that "blow up" versus those that do not are somehow quite analogous to "hard" versus "easy" instances of NP-complete problems.

Yet another way to study this problem would be to formulate an NFA minimization problem: given a DFA, find a minimal equivalent NFA. Last I heard (many years ago), this was still an open problem.


[1] Marco Almeida, Nelma Moreira, Rogério Reis. "On the performance of automata minimization algorithms."

[2] Cristian S. Calude, Cezar Câmpeanu, Monica Dumitrescu. "Automata Recognizing No Words: A Statistical Approach."
