How to figure out if a regex implementation uses DFA or NFA?

https://stackoverflow.com/questions/8268042

08-03-2021
|

Question

I'm facing the question, whether a certain regex implementation is based on a DFA or NFA.

What are the starting points for me to figure this out. One could also ask: What am I looking for? What are the basic patterns and / or characteristics? A good and explanatory link or a little comparisons (even if not directly dedicated to regex) is perfectly fine.

Solution

I think you mean "regex implementation" rather than algorithm (in the usual sense).

You could test with know expressions that are known to cause problems with one approach or the other. Also looking for features that are easier to implement in one or the other (this is not a reliable approach – the developers of regex engines find new ways to implement previously hard things).

Normally the answer is to read the documentation, or look in a known reference ("Mastering Regular Expressions" documents many popular cases). Finally why not ask the authors?

OTHER TIPS

If it's a black box, then give it some input and measure its time characteristics with a pathological case, with reference to the graphs in this discussion of NFS vs backtracking regex implementations. (note the NFS graph is microseconds not seconds).

Also, if it's a pure NFA, then it won't have some non-regular features which are found is some 'regular expression' parsers, which require backtracking.

Alternatively, look at the documentation of the RxParser class; documentation appears to be unavailable on the web and requires a squeak runtime to browse.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow