Data structure to store the nodes of a DFA

https://stackoverflow.com/questions/17398928

02-06-2022
|

Frage

I need a data structure to store the nodes of a finite deterministic automaton so that finding nodes which satisfy a particular condition is fast (logarithmic). The condition in question is the following:

I have a node p, and I have to find a node q, such that: (p ∈ F ≡ q ∈ F) & (∀ a : a ∈ Σ : δ(p,a) = δ(q,a)). That is, p and q are either both final or both are not, and they have transitions to the same nodes.

I don't want to go through all the nodes because that would be slow. Obviously, if the set of alphabet characters, for which q has transitions, is different from the set, for which p has transitions, q isn't the node I'm looking for.

Furthermore, if p and q have a different number of transitions, q is again not the node I want. So I was thinking of a data structure that sorts the nodes according to their finality and number of transitions, so I don't have to look through all the nodes, just those which have the same finality and the same number of transitions. But that is still not logarithmic. Any ideas.

I'm using c++.

Lösung 2

I constructed a string for each node and I put the strings in an AVL tree. It executed faster than the solution with the hash function and the unordered map and used far less memory.

The string representing the node looks as follows: it starts with a 0 or 1, according to whether the node is final or not, and then the pairs (a,n) are coded as follows: a is an int corresponding to the position of the symbol a in the alphabet and n is another int, the index of the node it has transition to with the symbol a.

Andere Tipps

There are two types of information at a node q:

The boolean information (q ∈ F)
The list of pairs (a,n) such that δ(q,a)==n (i.e. the list of character/dest-node pairs reachable from q)

These two pieces of information (a boolean and a list of pairs) can be represented as a single sequence, and you can compute a hash key for this sequence.

This will enable to hash the nodes by the property you are interested in. Searching for candidate nodes q for a given node p will then be near O(1).

(For the minimization algorithm you mentioned in the comments this means you'll need to rebuild this hash after each iteration, because the destination node pointers in the pairs will change as a result of the actions performed during the iteration.)

Lizenziert unter: CC-BY-SA mit Zuschreibung

Nicht verbunden mit StackOverflow