Question

I am trying to understand the following paragraph (from php.net):

However, if the decimal number following the backslash is less than 10, it is always taken as a back reference, and causes an error only if there are not that many capturing left parentheses in the entire pattern. In other words, the parentheses that are referenced need not be to the left of the reference for numbers less than 10. A "forward back reference" can make sense when a repetition is involved and the subpattern to the right has participated in an earlier iteration.

As I understand it so far, if the number is less than 10, all the left parentheses in the pattern are counted, and if the number is 10 or greater, only the left parentheses before the point where I use the reference (\13, for example) are counted.

For example:

Let's say we have this simple pattern:

'/^(a)(b)(c)(d)\6(e)(f)(g)(h)(i)(j)(k)(l)(m)(n)$/';

This is the string we are testing: abcdfefghikjklmn

As php.net says:

In other words, the parentheses that are referenced need not be to the left of the reference for numbers less than 10

So, as the example shows, the referenced parentheses are to the right and the reference number is less than 10, so why does preg_match return 0?

Could someone please help me understand this paragraph? Thank you all and have a nice day.


Solution

Outside a character class, a backslash followed by a digit greater than 0 (and possibly further digits) is a back reference to a capturing subpattern earlier (i.e. to its left) in the pattern, provided there have been that many previous capturing left parentheses.

However, if the decimal number following the backslash is less than 10, it is always taken as a back reference, and causes an error only if there are not that many capturing left parentheses in the entire pattern. In other words, the parentheses that are referenced need not be to the left of the reference for numbers less than 10. A "forward back reference" can make sense when a repetition is involved and the subpattern to the right has participated in an earlier iteration.

\n explained (n > 0 always):

  • n <= 9: Always a back reference to the nth group (no matter where the capturing group is). Results in an error if there are fewer than n capturing groups.
  • n >= 10: Only a back reference if at least n capturing groups have been opened before it. Otherwise it is read as an octal character escape (for example, \11 is a tab), much like an escape sequence in a double-quoted string.
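The two rules above can be checked with a short PHP sketch. The helper name try_match is made up for this illustration, and the three patterns are not from the question. The first case has the same shape as the pattern in the question: the pattern compiles, but the referenced group has not captured anything yet when the reference is tried, so the match fails and preg_match returns 0.

```php
<?php
// Hypothetical helper: returns 1 (match), 0 (no match) or false (pattern did not compile).
function try_match(string $pattern, string $subject)
{
    // @ hides the E_WARNING that PHP raises when PCRE rejects a pattern.
    return @preg_match($pattern, $subject);
}

// \2 is below 10, so it is a back reference even though group 2 opens later.
// The pattern compiles, but group 2 is still unset when \2 is tried, so nothing matches.
var_dump(try_match('/(a)\2(b)/', 'ab'));   // int(0)

// The whole pattern has only one capturing group, so \2 is a compile error.
var_dump(try_match('/(a)\2/', 'aa'));      // bool(false)

// Fewer than 11 groups precede \11, so it is read as octal 011, a tab character.
var_dump(try_match('/(a)\11/', "a\t"));    // int(1)
```

The patterns are written in single quotes so that PHP passes the backslashes through to PCRE unchanged.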

OTHER TIPS

In other words, the parentheses that are referenced need not be to the left of the reference for numbers less than 10. A "forward back reference" can make sense when a repetition is involved and the subpattern to the right has participated in an earlier iteration.

It means that something like this is valid:

(0\2|(112*))+

As you can see, the backreference to the 2nd capturing group, \2, appears before the second opening parenthesis ( is encountered.

Demo (ignore the error reported by regex101; its analysis does not handle this case)
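As a rough check in PHP, the pattern can be run through preg_match. The ^ and $ anchors and the test strings are additions for this sketch, not part of the pattern above.

```php
<?php
// Same pattern as above, anchored so that the whole subject has to match.
$pattern = '/^(0\2|(112*))+$/';

// First iteration: group 2 captures "11". Second iteration: "0" followed by \2 ("11").
var_dump(preg_match($pattern, '11011'));  // int(1)

// Group 2 has not captured anything when \2 is first tried, so "011" alone fails.
var_dump(preg_match($pattern, '011'));    // int(0)
```

The forward reference only becomes useful on the second and later iterations of the outer group, once group 2 has taken part in an earlier iteration.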

Another valid example:

^(^a|aa\1)*$

This regex matches any string that consists only of the letter a and whose length is a square number, and rejects everything else. It is an example of a backreference placed inside the very pair of parentheses it refers to.
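A small PHP sketch to try this out; the function name is_square_length and the tested range 0 to 30 are choices made for this illustration. Each iteration of the group matches two more characters than the previous one (1, 3, 5, ...), so the total length is always a sum of consecutive odd numbers, which is a square.

```php
<?php
// Hypothetical helper: true if $s consists only of "a" and its length is a square number.
function is_square_length(string $s): bool
{
    return preg_match('/^(^a|aa\1)*$/', $s) === 1;
}

// Collect every length from 0 to 30 for which a run of "a" matches.
$lengths = [];
for ($n = 0; $n <= 30; $n++) {
    if (is_square_length(str_repeat('a', $n))) {
        $lengths[] = $n;
    }
}
echo implode(' ', $lengths), "\n";  // 0 1 4 9 16 25
```

The empty string matches as well, because the group may repeat zero times.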


Licensed under: CC-BY-SA with attribution