What are the theoretical implications of unbounded lookbehind?
-
02-10-2020 - |
Frage
Most languages allow fixed-length or finite-length lookbehind. One notable exception is .NET, which allows the use of the * operator.
However, .NET regexs can already recognize balanced parentheses using named capture, which is not a regular language. Are regexs still regular with * in lookbehind? Extended answers for subexpressions other than * (for example, additional lookaround!) would also be appreciated.
tl;dr: Do regexs stay regular with * in lookbehind?
Lösung
I believe the answer here: Does lookaround affect which languages can be matched by regular expressions? can be extended to prove that adding * in lookbehind (or even nesting such lookbehinds and lookaheads) does not affect the 'regularness' of the expressions. I haven't put more thought into it though.
Hope that helps!
Andere Tipps
.NET's unbounded lookbehind is merely a refinement of an already non-regular feature: fixed, finite or infinite, lookbehinds have no place in a regular grammar. Nor do lookaheads, capturing groups, backreferences, reluctant quantifiers, possessive quantifiers, atomic groups, conditionals, word boundaries, anchors...
If we had to limit ourselves to theoretically-pure regular expressions, 99.9% of current regex users would have no use for them. Asking if a feature is "regular" is a waste of breath; is it useful? That's all that matters.
Regular expressions are closed under intersection. Add a new symbol & and rewrite the lookbehind: A(?<B)C as (?:AC&.*BC), and we get that lookbehind is regular.
B can include clearly use anything that doesn't go past the A/C boundry. That is, anything except lookahead. What happens if lookbehind may use lookahead, or vice-versa? Start work on .*BC . You're still fine.
So, regular expressions could really add in intersection and infinite-length lookaround (which can include more lookaround to any depth) and it would still be just as efficient.