Question

I'm hitting PREG_BACKTRACK_LIMIT_ERROR when calling preg_replace() with a complicated regular expression, because pcre.backtrack_limit (1,000,000 by default) is too low. Raising it to 10,000,000 makes this particular application work.

My question is: what exactly is the backtrack limit's "unit", loosely defined? Does the 1,000,000 figure correspond to a memory size? If not, what does it signify? I'm trying to work out what a reasonable setting would be for my environment.

Reference on pcre.backtrack_limit: http://us3.php.net/manual/en/pcre.configuration.php#ini.pcre.backtrack-limit

Reference on backtracking: In regular expressions, what is a backtracking / back referencing?


Solution

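From the PCRE source code, this error is returned when match() has been called more than match_limit times during a single match. PHP passes pcre.backtrack_limit to PCRE as match_limit, so by default that means more than 1,000,000 calls: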

/* First check that we haven't called match() too many times, or that we
haven't exceeded the recursive call limit. */

if (md->match_call_count++ >= md->match_limit) RRETURN(PCRE_ERROR_MATCHLIMIT);

That is converted into a "PHP_PCRE_BACKTRACK_LIMIT_ERROR" error here.

According to the pcreapi manpage (see https://serverfault.com/a/408272/140833 ):

Internally, PCRE uses a function called match() which it calls repeatedly (sometimes recursively). The limit set by match_limit is imposed on the number of times this function is called during a match, which has the effect of limiting the amount of backtracking that can take place. For patterns that are not anchored, the count restarts from zero for each position in the subject string.

So the unit is not a memory size. It is a count of calls to match(), which I think amounts to something like "number of backtracking attempts", although I'm not sure the correspondence is exactly 1-to-1.
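One way to get a feel for that "unit" is to measure it. The sketch below bisects pcre.backtrack_limit to find the smallest value at which a given match completes. min_backtrack_limit() is a hypothetical helper, not a PHP function, and the pattern and subjects are my own choices for illustration. Turning the JIT off is also my assumption, so that the interpreter's counter is what gets measured. The exact numbers will differ between PCRE versions.

```php
<?php

// Hypothetical helper (not part of PHP): smallest pcre.backtrack_limit at
// which preg_match() finishes without PREG_BACKTRACK_LIMIT_ERROR.
function min_backtrack_limit(string $pattern, string $subject, int $max = 10000000): ?int
{
    $saved = ini_get('pcre.backtrack_limit');
    $lo = 1;
    $hi = $max;
    $found = null; // stays null if even $max is too low

    while ($lo <= $hi) {
        $mid = intdiv($lo + $hi, 2);
        ini_set('pcre.backtrack_limit', (string) $mid);

        if (preg_match($pattern, $subject) === false
            && preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR) {
            $lo = $mid + 1;   // limit too low: the match was aborted
        } else {
            $found = $mid;    // match ran to completion: try a lower limit
            $hi = $mid - 1;
        }
    }

    ini_set('pcre.backtrack_limit', $saved);
    return $found;
}

// Assumption: disable the JIT so the interpreter's counting is measured.
ini_set('pcre.jit', '0');

// The trailing "z" makes the anchored match fail, but only after every way
// of splitting the run of x's has been tried.
foreach ([8, 10, 12, 14] as $n) {
    $subject = str_repeat('x', $n) . 'yz';
    $limit = min_backtrack_limit('/^(x+x+)+y$/', $subject);
    echo "x count = $n, subject bytes = " . strlen($subject)
        . ", minimum limit = " . var_export($limit, true) . "\n";
}
```

If the limit were a memory size, the minimum would be expected to track the subject's byte length. Here the subject grows by only two bytes per row, while the required limit should grow far faster, which is consistent with a count of matching steps.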

Here's a demo isolating the error case with a simple "Catastrophic Backtracking" regex:

<?php

// Deliberately tiny limit so that the error is easy to trigger.
ini_set('pcre.backtrack_limit', 100);

for ($len = 1000; $len <= 1001; $len++) {

    $x = str_repeat("x", $len);
    $ret = preg_match("/x+x+y/", $x);

    echo "len = " . $len . "\n";
    echo "preg_match = " . $ret . "\n";
    echo "PREG_BACKTRACK_LIMIT_ERROR = " . PREG_BACKTRACK_LIMIT_ERROR . "\n";
    echo "preg_last_error = " . preg_last_error() . "\n";
    echo "\n";
}

Run this code at https://3v4l.org/EpaNC to get this output:

len = 1000
preg_match = 0
PREG_BACKTRACK_LIMIT_ERROR = 2
preg_last_error = 0

len = 1001
preg_match = 
PREG_BACKTRACK_LIMIT_ERROR = 2
preg_last_error = 2
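Note that in the len = 1001 run the preg_match value is blank because the function returned false, which echo prints as an empty string. That is easy to confuse with 0 ("no match"). A minimal sketch of a wrapper that makes the difference explicit follows. preg_match_strict() is a hypothetical name, not a PHP built-in, and the pattern and subject are illustrative choices of mine rather than the ones from the demo.

```php
<?php

// Hypothetical wrapper (not a PHP built-in): turns a silent `false` from
// preg_match() into an exception so it cannot be mistaken for "no match".
function preg_match_strict(string $pattern, string $subject): bool
{
    $result = preg_match($pattern, $subject);

    if ($result === false) {
        // preg_last_error() returns one of the PREG_*_ERROR constants.
        $code = preg_last_error();
        $name = $code === PREG_BACKTRACK_LIMIT_ERROR
            ? 'PREG_BACKTRACK_LIMIT_ERROR'
            : 'PREG error code ' . $code;
        throw new RuntimeException("preg_match() failed: $name", $code);
    }

    return $result === 1;
}

// Tiny limit, as in the demo, so the failure is easy to reproduce.
ini_set('pcre.backtrack_limit', '100');

try {
    // Nested quantifiers plus a trailing "z" force heavy backtracking.
    var_dump(preg_match_strict('/^(x+x+)+y$/', str_repeat('x', 30) . 'yz'));
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

The key point is the strict comparison with false: a loose check such as `if (!$result)` treats an aborted match and a genuine non-match the same way.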

OTHER TIPS

I don't know if this will help, but according to PCRE's source code this error is raised when PCRE returns PCRE_ERROR_MATCHLIMIT. And according to PCRE's changelog, the cause is probably the regex itself, which (as I read it) is probably causing a memory leak.

I'd suggest reviewing your regex as the best way to solve the problem. Otherwise, if you insist on making it work as is, you can do something like this (though I don't recommend it): ini_set('pcre.backtrack_limit', PHP_INT_MAX);

[edit] I believe this setting is all about how much heavy processing PCRE is allowed to do, which is why I suggest reviewing your regex to make it lighter (split it into multiple regexes, make more passes over your data, etc.)
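As one possible illustration of "making it lighter", here is a sketch comparing a pattern with nested quantifiers against a rewrite that uses a possessive quantifier. Both patterns and the subject are my own examples, not taken from the question, and the limit of 1000 is chosen only to make the difference visible.

```php
<?php

ini_set('pcre.backtrack_limit', '1000');

$subject = str_repeat('x', 30) . 'yz';

// Nested quantifiers: "x+x+" inside "(...)+" can split the run of x's in
// exponentially many ways, and all of them are tried before the match fails.
$slow = '/^(x+x+)+y$/';

// Accepts the same strings (two or more x's, then y), but the possessive
// "{2,}+" never gives characters back, so the failure is found quickly.
$fast = '/^x{2,}+y$/';

foreach (['slow' => $slow, 'fast' => $fast] as $label => $pattern) {
    $result = preg_match($pattern, $subject);
    echo $label, ': result = ', var_export($result, true),
        ', hit backtrack limit = ',
        var_export(preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR, true), "\n";
}
```

Atomic groups such as (?>x+) have a similar effect. Whether a rewrite like this is possible depends on the actual regex, so treat it as a direction to explore and not a guaranteed fix.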

Setting ini_set("pcre.backtrack_limit", "5000000"); worked for me. I placed it at the beginning of my mPDF page, and my 276-page document was generated in 1:04 minutes.
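If the higher limit is only needed for one heavy operation, it may be safer to raise it just around that call and restore it afterwards. A minimal sketch, assuming nothing about mPDF itself: with_backtrack_limit() is a hypothetical helper, and the callback here is only a stand-in for the expensive work.

```php
<?php

// Hypothetical helper (not part of PHP or mPDF): run $work with a temporarily
// changed pcre.backtrack_limit, then restore the previous value.
function with_backtrack_limit(int $limit, callable $work)
{
    $previous = ini_get('pcre.backtrack_limit');
    ini_set('pcre.backtrack_limit', (string) $limit);

    try {
        return $work();
    } finally {
        // Runs on both normal return and exceptions.
        ini_set('pcre.backtrack_limit', $previous);
    }
}

$before = ini_get('pcre.backtrack_limit');

$count = with_backtrack_limit(5000000, function () {
    // Stand-in for the expensive call (for example, rendering a large document).
    return preg_match_all('/\d+/', 'a1b22c333');
});

echo "matches = $count\n";
echo "limit restored = "
    . var_export(ini_get('pcre.backtrack_limit') === $before, true) . "\n";
```

This keeps the rest of the request on the default limit, so a runaway regex elsewhere still fails fast.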

Licensed under: CC-BY-SA with attribution