Is there a library that provides static analysis of regular expressions?
-
19-09-2019 - |
Question
Specifically, is there a library that, when given 2 (or more) regular expressions, can tell if exists an input that both would match? Bonus points if it's easily accessible via Java or .NET, but command-line would be fine as well.
Asker's log, supplemental:
The regular expressions that would be fed to this algorithm are fairly simple. While I believe there are a couple with lookaheads, they are all fairly simple combinations of literals or character classes with fixed minimum and maximum lengths.
Solution
I found a python library that lets me do what I need to do.
>>> import reCompiler
>>> fsa1 = reCompiler.compileRE('\d\d\d?\d?a')
>>> fsa2 = reCompiler.compileRE('123a')
>>> fsa3 = reCompiler.compileRE('a23a')
>>> print len(FSA.intersection(fsa1, fsa2).finalStates)
1
>>> print len(FSA.intersection(fsa1, fsa3).finalStates)
0
The library is called pyFSA. I will need to implement some preparsing to turn statements like \d{2,4} into \d\d\d?\d?, but other than that it should suit my needs nicely. Thanks for the input, and if people find libraries that implement this in other languages, by all means include them.
OTHER TIPS
If there were it would not run in a useful amount of time. Comparing regex is a PSPACE problem
http://en.wikipedia.org/wiki/PSPACE-complete
You may have some luck if you can allow extra restrictions on your regex
If, I understand you correctly, you would like to know if the intersection of 2 regular expressions is the empty set or not? I believe that's hard, but I wouldn't be surprised if the complexity was exponential in the length of the regex (though some regexes would obivously be easier than others)
Regardless, here's a Haskell implementation: http://sulzmann.blogspot.com/2008/11/playing-with-regular-expressions.html
And a prolog implementation http://www.let.rug.nl/vannoord/Fsa/
This may be a place to start.