The cause seems to be the backtracking limit, which is set higher than what the server can actually handle (this is why you get no error message).
You can limit the backtracking using:
$regex = '%^(?>
[\x09\x0A\x0D\x20-\x7E]++ # ASCII
| (?>[\xC2-\xDF][\x80-\xBF])++ # non-overlong 2-byte
| (?>\xE0[\xA0-\xBF][\x80-\xBF])++ # excluding overlongs
| (?>[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2})++ # straight 3-byte
| (?>\xED[\x80-\x9F][\x80-\xBF])++ # excluding surrogates
| (?>\xF0[\x90-\xBF][\x80-\xBF]{2})++ # planes 1-3
| (?>[\xF1-\xF3][\x80-\xBF]{3})++ # planes 4-15
| (?>\xF4[\x80-\x8F][\x80-\xBF]{2})++ # plane 16
)*+$%xs';
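Since the failure is silent, it can help to check the PCRE error state after each call. The following is only a minimal sketch: the helper name match_or_fail is hypothetical (it is not part of PHP or of the original code), and it relies only on preg_match(), preg_last_error() and the pcre.backtrack_limit ini setting.

```php
<?php
// Hypothetical helper: run preg_match() and turn a silent PCRE failure
// (for example PREG_BACKTRACK_LIMIT_ERROR) into an exception.
function match_or_fail(string $pattern, string $subject): bool
{
    $result = preg_match($pattern, $subject);
    if ($result === false) {
        // preg_last_error() returns one of the PREG_*_ERROR constants.
        throw new RuntimeException('PCRE error code: ' . preg_last_error());
    }
    return $result === 1;
}

// Show the limit currently in effect on this server.
echo 'pcre.backtrack_limit = ', ini_get('pcre.backtrack_limit'), PHP_EOL;

// Simplified ASCII-only pattern, used here just to exercise the helper.
$asciiOnly = '%^[\x20-\x7E]++$%';
var_dump(match_or_fail($asciiOnly, 'plain ASCII')); // bool(true)
```

The same wrapper could be called with the $regex above; a result of false from preg_match() then becomes visible instead of being mistaken for "no match".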
About backtracking, atomic groups and possessive quantifiers:
Backtracking is the mechanism the regex engine uses to explore other possible matches from a position in the string when a subpattern fails.
Let's consider the string aaabcccb and the pattern ^.+cb$:
string      | pattern    | state
------------+------------+--------------------------
aaabcccb    | ^.+cb$     | BEGIN
aaabcccb    | ^.+cb$     | OK
aaabcccb    | ^.+cb$     | FAIL
aaabcccb    | ^.+cb$     | BACKTRACK
aaabcccb    | ^.+cb$     | FAIL
aaabcccb    | ^.+cb$     | BACKTRACK
aaabcccb    | ^.+cb$     | OK
aaabcccb    | ^.+cb$     | OK
aaabcccb    | ^.+cb$     | OK, SUCCEED
------------+------------+--------------------------
This describes the default behavior of the regex engine: the subpattern with the greedy quantifier .+ takes as much as possible (the whole string in this example), but the regex engine must then go back character by character to make the subpattern cb succeed. A greedy quantifier allows this behavior and may give characters back.
You can forbid backtracking using a possessive quantifier. Example with ^.++cb$:
string      | pattern    | state
------------+------------+--------------------------
aaabcccb    | ^.++cb$    | BEGIN
aaabcccb    | ^.++cb$    | OK
aaabcccb    | ^.++cb$    | FAIL
aaabcccb    | ^.++cb$    | NO MATCH
------------+------------+--------------------------
The regex engine can't backtrack into the substring matched by .++, so the whole pattern fails immediately since c is not found.
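The two traces above can be reproduced with a short script. This is only a sketch; the variable names are illustrative.

```php
<?php
$subject = 'aaabcccb';

// Greedy: .+ grabs the whole string, then gives characters back until "cb" fits.
$greedy = preg_match('/^.+cb$/', $subject);

// Possessive: .++ grabs the whole string and never gives anything back.
$possessive = preg_match('/^.++cb$/', $subject);

var_dump($greedy);     // int(1)
var_dump($possessive); // int(0)
```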
An atomic group defines a subpattern in which the regex engine is not allowed to backtrack. In other words, possessive quantifiers and atomic groups are the same feature: (?>a+) <=> a++
Note: keep in mind, however, that the regex engine can always backtrack inside an atomic group as long as the group is not closed: ^(?>.+c)b$ will succeed with the previous string, but ^(?>.+)cb$ will fail.
Once an atomic group is closed, or when you use a possessive quantifier, the matched substring is an atom in the etymological sense (i.e. something that can't be divided). However, the regex engine can always backtrack atom by atom. For example, ^(?>ab)+abc$ will match abababc, whereas ^(?>ab)++abc$ (or ^(?>(?>ab)+)abc$) will fail.
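A small script can confirm the atomic group examples. It is a sketch only; the $tests and $results arrays are illustrative names.

```php
<?php
// Each entry: [pattern, subject].
$tests = [
    ['/^(?>.+c)b$/',      'aaabcccb'], // backtracking happens inside the open group
    ['/^(?>.+)cb$/',      'aaabcccb'], // group already closed: nothing is given back
    ['/^(?>ab)+abc$/',    'abababc'],  // the engine gives back one whole "ab" atom
    ['/^(?>ab)++abc$/',   'abababc'],  // possessive: no atom is given back
    ['/^(?>(?>ab)+)abc$/', 'abababc'], // same as the previous line, written with groups
];

$results = [];
foreach ($tests as [$pattern, $subject]) {
    $results[$pattern] = preg_match($pattern, $subject);
    printf(
        "%-20s %-10s %s\n",
        $pattern,
        $subject,
        $results[$pattern] === 1 ? 'match' : 'no match'
    );
}
```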
One of the main advantages of atomic groups and possessive quantifiers (that is, of forbidding backtracking) is that they reduce the number of steps needed for a pattern to succeed or fail.
Improvements:
Since possessive quantifiers and atomic groups are used everywhere, each substring is matched once and for all, and when a character isn't in one of these groups, the pattern fails immediately.
Another improvement is to add a quantifier to each branch of the alternation. An example with the string zzzzzzzzzzzzza:
With the pattern (?:a|b|c|...x|y|z)+, the regex engine must try each branch of the alternation until it finds the right letter, and it must do so for each letter (13x26 = 338 tests to obtain all the z).
With the pattern (?>a+|b+|c+|...x+|y+|z+)+, the regex engine needs only 26 tests: once it arrives at z+, it obtains all the z at once.
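As an illustration, both alternations can be built programmatically and checked against the example string. This is a sketch, not a benchmark: PHP does not expose the number of internal matching steps, so the script only shows that the two patterns accept the same input and recomputes the rough test count given above. The ^ and $ anchors and all variable names are assumptions added for the example.

```php
<?php
$letters = range('a', 'z');
$subject = str_repeat('z', 13) . 'a';

// (?:a|b|...|z)+ : one letter consumed per iteration of the outer group.
$plain = '/^(?:' . implode('|', $letters) . ')+$/';

// (?>a+|b+|...|z+)+ : each branch consumes a whole run of its letter.
$runs = array_map(function ($c) {
    return $c . '+';
}, $letters);
$atomic = '/^(?>' . implode('|', $runs) . ')+$/';

$plainResult  = preg_match($plain, $subject);
$atomicResult = preg_match($atomic, $subject);

var_dump($plainResult);  // int(1)
var_dump($atomicResult); // int(1)

// Rough count of branch tests for the first pattern: 13 z's times 26 branches.
$roughTests = substr_count($subject, 'z') * count($letters);
echo $roughTests, PHP_EOL; // 338
```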