سؤال

The idea of lazy and greedy are easy to understand, but I have only seen/used *+ once in my regex (in Java) [A]|[^B]*+(?!C) (A,B,C are arbitrary values) simply because it worked when the lazy modifier resulted in a StackOverflow error.

Because of most search engines' inability to search symbols, I can't find any documentation on this. So what exactly does *+ do and how does it do it?

هل كانت مفيدة؟

المحلول

A greedy quantifier matches everything it can and then the pattern backtracks until the match succeeds.

A lazy quantifier forward tracks until the match succeeds.

A possessive quantifier matches everything it can and never backtracks.

The + denotes a possessive quantifier. If can be used as, for example, ++ or *+.

This ability to prevent backtracking means it can stop catastrophic backtracking.

نصائح أخرى

As the other answers point out, *+ is a "possessive quantifier" It matches the previous element as many times as possible, just like a greedy quantifier, but never backtracks.

Why is this useful? Only as a performance optimization. Further, only as a performance optimization when the regex doesn't match. This is an important point to understand about regular expressions: their worst-case performance always occurs when they don't match.

Depending on the regex engine in use, and the details of the regex itself, worst-case performance can sometimes be stunningly bad. For a simple example, take this regex: a*a*a*b, matched against this string: aaaaac.

Confronted with this situation, a standard "NFA" type regex engine will do something like this:

  1. Try matching the 1st a 5 times, the 2nd a zero times, and the 3rd a zero times.
  2. Try matching the b against the c -- it fails.
  3. "Backtrack" and match the 1st a 4 times, the 2nd 1 time, and the 3rd zero times.
  4. Try matching the b again -- it fails.
  5. Backtrack again, try the 1st a 4 times, the 2nd zero times, and the 3rd 1 time.
  6. ...
  7. Backtrack, try the 1st a 3 times, the 2nd 2 times, and the 3rd zero times...

(I think you can fill out the next few hundred steps by yourself.)

If the regex was a*+a*a*b, that would never happen. It would be more like:

  1. Try matching the 1st a 5 times, the 2nd zero times, and the 3rd zero times.
  2. Try matching the b -- it fails.
  3. Since the 1st a is "possessive", once it has matched 5 times, it can never try matching a smaller number of times. And since there are no as left in the string for the other a*s to match, they can only match zero times. There is nothing left to try, so the match as a whole fails.
  4. There is no step 4. You are done. While your friends are waiting for their regexps to finish executing, you can kick back your heels and crack open your cold beverage of choice.

X*+ means X, zero or more times (Possessive)

The possessive quantifiers always eat the entire input string, trying once (and only once) for a match. Unlike the greedy quantifiers, possessive quantifiers never back off, even if doing so would allow the overall match to succeed.

مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top