What are the theoretical implications of unbounded lookbehind?

https://stackoverflow.com/questions/3356458

02-10-2020
|

Frage

Most languages allow fixed-length or finite-length lookbehind. One notable exception is .NET, which allows the use of the * operator.

However, .NET regexs can already recognize balanced parentheses using named capture, which is not a regular language. Are regexs still regular with * in lookbehind? Extended answers for subexpressions other than * (for example, additional lookaround!) would also be appreciated.

tl;dr: Do regexs stay regular with * in lookbehind?

Lösung

I believe the answer here: Does lookaround affect which languages can be matched by regular expressions? can be extended to prove that adding * in lookbehind (or even nesting such lookbehinds and lookaheads) does not affect the 'regularness' of the expressions. I haven't put more thought into it though.

Hope that helps!

Andere Tipps

.NET's unbounded lookbehind is merely a refinement of an already non-regular feature: fixed, finite or infinite, lookbehinds have no place in a regular grammar. Nor do lookaheads, capturing groups, backreferences, reluctant quantifiers, possessive quantifiers, atomic groups, conditionals, word boundaries, anchors...

If we had to limit ourselves to theoretically-pure regular expressions, 99.9% of current regex users would have no use for them. Asking if a feature is "regular" is a waste of breath; is it useful? That's all that matters.

Regular expressions are closed under intersection. Add a new symbol & and rewrite the lookbehind: A(?<B)C as (?:AC&.*BC), and we get that lookbehind is regular.

B can include clearly use anything that doesn't go past the A/C boundry. That is, anything except lookahead. What happens if lookbehind may use lookahead, or vice-versa? Start work on .*BC . You're still fine.

So, regular expressions could really add in intersection and infinite-length lookaround (which can include more lookaround to any depth) and it would still be just as efficient.

Lizenziert unter: CC-BY-SA mit Zuschreibung

Nicht verbunden mit StackOverflow