Question

Here's a sample which executes the preg_replace multiple times to find nested/overlapping matches:

$text = '[foo][foo][/foo][/foo]';
//1st:   ^^^^^     ^^^^^^
//2nd:        ^^^^^      ^^^^^^
//3rd: fails

do {
    $text = preg_replace('~\[foo](.*?)\[/foo]~', '[bar]$1[/bar]', $text, -1, $replace_count);
} while ($replace_count);

echo $text; //'[bar][bar][/bar][/bar]'

I'm satisfied with the result and the behavior. However, it seems inefficient to scan through the whole string 3 times as in the example above. Is there any regex magic to do this in a single replace?

Conditions:

  • I can't simply replace ~\[(/)?foo]~ with [$1bar], I need to make sure there is a matching closing [/foo] tag after an opening [foo] tag and replace them both at a time. It doesn't matter whether they're nested or not. Unpaired [foo] and [/foo] should not be replaced.

In JS I could set the Regex object's lastIndex property to the beginning of the match so that it starts matching again from the beginning of the last match. I couldn't find any startIndex option for regex replacing in PHP, and working with substr()ing could also be inefficient. I've looked into whether PCRE has an anchor for "start next match at this position" or similar, but I had no luck.

Is there a better approach?


To clarify on unpaired tags, given the input:

[foo][foo][/foo]

I'm fine with either [bar][foo][/bar] or [foo][bar][/bar] as output. The former is the legacy behavior.

Solution

A full regex solution is not possible for this specific case.

Your solution adapted to match paired tags (in the common sense):

$pattern = '~\[foo]((?>[^[]++|\[(?!/?foo]))*)\[/foo]~';
$result = $text;
do {
    $result = preg_replace($pattern, '[bar]$1[/bar]', $result, -1, $count);
} while ($count);
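
As a rough illustration of how this loop behaves, the sketch below wraps it in a helper and counts the passes. The function name replace_paired_foo_loop and the $passes out-parameter are hypothetical and not part of the answer; the pattern and replacement are the ones shown above.

```php
<?php
// Hypothetical helper: runs the loop above and reports how many passes it took.
function replace_paired_foo_loop(string $text, ?int &$passes = null): string
{
    // Same pattern as above: the content may not contain [foo] or [/foo],
    // so each pass only rewrites the innermost pairs.
    $pattern = '~\[foo]((?>[^[]++|\[(?!/?foo]))*)\[/foo]~';
    $passes = 0;
    do {
        $text = preg_replace($pattern, '[bar]$1[/bar]', $text, -1, $count);
        $passes++;
    } while ($count);
    return $text;
}

echo replace_paired_foo_loop('[foo][foo][/foo][/foo]', $passes), "\n"; // [bar][bar][/bar][/bar]
echo $passes, "\n";                                                    // 3
echo replace_paired_foo_loop('[foo][foo][/foo]'), "\n";                // [foo][bar][/bar]
```

Each level of nesting costs one pass, plus a final pass that finds nothing, so the number of full scans grows with the depth.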

Another way that parses the string only once:

$arr = preg_split('~(\[/?foo])~', $text, -1, PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY);
$stack = array();
foreach ($arr as $key=>$item) {
    if ($item == '[foo]') $stack[] = $key;
    else if ($item == '[/foo]' && !empty($stack)) {
        $arr[array_pop($stack)] = '[bar]';
        $arr[$key] = '[/bar]'; 
    }
}
$result = implode($arr);

The performance of this second script is independent of the nesting depth.
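
To check the behavior on unpaired tags and on text around the tags, the same stack logic can be wrapped in a function. This is only a sketch; the name replace_paired_foo_stack is hypothetical, and the body follows the script above.

```php
<?php
// Hypothetical wrapper around the split-and-stack approach shown above.
function replace_paired_foo_stack(string $text): string
{
    // Split on the tags and keep them as separate items in the result.
    $parts = preg_split('~(\[/?foo])~', $text, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $stack = []; // indexes of opening tags still waiting for a closing tag
    foreach ($parts as $key => $item) {
        if ($item === '[foo]') {
            $stack[] = $key;
        } elseif ($item === '[/foo]' && $stack) {
            // Pair this closing tag with the most recent unmatched opening tag.
            $parts[array_pop($stack)] = '[bar]';
            $parts[$key] = '[/bar]';
        }
    }
    return implode('', $parts);
}

echo replace_paired_foo_stack('a[foo]b[/foo]c'), "\n";          // a[bar]b[/bar]c
echo replace_paired_foo_stack('[foo][foo][/foo]'), "\n";        // [foo][bar][/bar]
echo replace_paired_foo_stack('[/foo][foo]'), "\n";             // [/foo][foo]
```

Opening tags left on the stack at the end, and closing tags seen while the stack is empty, are never touched, which is what keeps unpaired tags unchanged.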

To answer the title question: yes, it is possible to find overlapping matches with a single regex. However, you can't perform a replacement with this kind of pattern. Example:

$pattern = '~(?=(\[foo]((?>[^[]++|\[(?!/?foo])|(?1))*)\[/foo]))~';
preg_match_all($pattern, $text, $matches);

The trick is to wrap the whole pattern in a lookahead that contains a capturing group. Note that the whole match is always an empty string (the paired tags are only available in capture group 1), which is why you can't use this pattern with preg_replace.
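
To make the empty whole match concrete, here is a small sketch that prints what preg_match_all collects for the sample input from the question. The inner lookahead is written (?!/?foo]), as in the paired-tag pattern; the variable names are only illustrative.

```php
<?php
$text = '[foo][foo][/foo][/foo]';
// The lookahead consumes nothing, so a match can start at every opening tag,
// including tags nested inside an earlier match.
$pattern = '~(?=(\[foo]((?>[^[]++|\[(?!/?foo])|(?1))*)\[/foo]))~';
preg_match_all($pattern, $text, $matches);

// $matches[0]: whole matches, always empty strings
// $matches[1]: each paired block, tags included
// $matches[2]: the content between the tags
print_r($matches[0]); // two empty strings
print_r($matches[1]); // '[foo][foo][/foo][/foo]' and '[foo][/foo]'
print_r($matches[2]); // '[foo][/foo]' and ''
```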

Other suggestions

A better way to do this is to find the closing [/foo], then search backwards until you find an opening [foo] (or [foo(space).*]). Replace the matched region with something else, and repeat until no closing tag is found. Do this with plain strpos/stripos or substr rather than regex.

It might be achievable with regex, but I've always done this kind of thing with regular seeks as it's also faster.
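
A minimal sketch of that seek-based idea, under some assumptions: it handles only the plain [foo] tag (not the [foo(space).*] attribute form), the function name replace_paired_foo_seek is hypothetical, and it relies on [bar] and [/bar] having the same lengths as the tags they replace so that positions stay valid after each replacement.

```php
<?php
// Hypothetical sketch: pair each closing tag with the nearest earlier opening tag.
function replace_paired_foo_seek(string $text): string
{
    $open = '[foo]';
    $close = '[/foo]';
    $offset = 0;
    // Walk forward from one closing tag to the next.
    while (($closePos = strpos($text, $close, $offset)) !== false) {
        $offset = $closePos + strlen($close);
        // Look backwards for the last opening tag that has not been replaced yet.
        $openPos = strrpos(substr($text, 0, $closePos), $open);
        if ($openPos === false) {
            continue; // unpaired closing tag: leave it alone
        }
        // Same-length replacements, so $closePos is still correct afterwards.
        $text = substr_replace($text, '[bar]', $openPos, strlen($open));
        $text = substr_replace($text, '[/bar]', $closePos, strlen($close));
    }
    return $text;
}

echo replace_paired_foo_seek('[foo][foo][/foo][/foo]'), "\n"; // [bar][bar][/bar][/bar]
echo replace_paired_foo_seek('[foo][foo][/foo]'), "\n";       // [foo][bar][/bar]
```

The substr() call copies the prefix for every closing tag, which keeps the sketch simple but is not free on long inputs; whether it actually beats the regex versions would need measuring.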

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow