Question

would you have a hint at algorithm to convert any regular expression to a finite state machine. For instance, an algorithm parsing a regexp and adding states to the fsm appropriately? Any reference or deeper idea?

I am writting this with Python

Thanks and regards

Was it helpful?

Solution

Use Michael Sipser's Introduction to the Theory of Computation. Chapter 1 gives detailed algorithms for converting a regular expression to a deterministic or non-deterministic finite-state automaton (DFA or NFA), in the context of proving their equivalence (a DFA, an NFA and a regular expression can match exactly the same classes of strings).

The general idea is: You convert a regular expression to an NFA, which can be done quite straightforwardly (* is a loop, | and character ranges are branch points). You then convert the NFA into a (much larger) DFA, which involves creating one DFA state for each set of alternative NFA states. The DFA has at most as many states as the powerset of the set of NFA states (e.g., an NFA with 3 states can get transformed to a DFA with at most 2^3 = 8 states), and can recognize any target string without backtracking. Read the book for the details.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top