Question

With preg_replace in PHP, I am trying to match a regex pattern multiple times in a string; sometimes there are two matches on one line, sometimes only one.

I have the following string:

 $text = 'Check <a href="link1">text1</a> or <a href="link2">text2</a>
 oh
 well <a href="link3">text3</a>';

I would like it to convert to:

 Check
 text1
 link1
 or
 text2
 link2
 oh
 well
 text3
 link3

I have this:

 $text = preg_replace('/(<a href=")(.+)(">)(.*)(<\/a>)/', "\n$4\n$2\n", $text);

But it only works when each line contains a single match, like:

 $text = 'Check <a href="link1">text1</a> 
 or <a href="link2">text2</a>
 oh
 well <a href="link3">text3</a>'; 

Any help appreciated.

Example with a and b http://www.phpliveregex.com/p/4fU


Solution

Parse the HTML with DOMDocument, iterate over all text nodes inside it, and special-case text nodes whose parent is an anchor:

$text = 'Check <a href="link1">text1</a> or <a href="link2">text2</a>
 oh
 well <a href="link3">text3</a>';

$dom = new DOMDocument;
$dom->loadHTML($text);

$xpath = new DOMXPath($dom);

// Visit every text node in document order.
foreach ($xpath->query('//text()') as $node) {
    if ($node->nodeType == XML_TEXT_NODE) {
        echo $node->textContent, "\n";
        // If the text sits directly inside an <a>, print its href right after it.
        if ($node->parentNode->nodeType == XML_ELEMENT_NODE && $node->parentNode->nodeName == 'a') {
            echo $node->parentNode->getAttribute('href'), "\n";
        }
    }
}
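If you want the result as an array that matches the desired output line for line, a variation of the DOM approach could split each text node on line breaks and trim the pieces. This is only a sketch: the function name `extractTextAndLinks` is hypothetical, and the trimming/splitting is an addition not present in the answer above.

```php
<?php
// Hypothetical helper: returns text chunks and, after each anchor's text, its href.
function extractTextAndLinks(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of printing them for HTML fragments.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $lines = [];
    foreach ($xpath->query('//text()') as $node) {
        // Split on any line break so "\n oh\n well " yields "oh" and "well".
        foreach (preg_split('/\R/', $node->textContent) as $part) {
            $part = trim($part);
            if ($part !== '') {
                $lines[] = $part;
            }
        }
        // Text directly inside <a>: append the link target after the text.
        $parent = $node->parentNode;
        if ($parent instanceof DOMElement && strtolower($parent->nodeName) === 'a') {
            $lines[] = $parent->getAttribute('href');
        }
    }
    return $lines;
}

$text = 'Check <a href="link1">text1</a> or <a href="link2">text2</a>
 oh
 well <a href="link3">text3</a>';

echo implode("\n", extractTextAndLinks($text)), "\n";
```

Splitting inside text nodes is what separates "oh" and "well", which arrive from the parser as a single text node.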

If you treat the input as plain text instead of HTML, you could do it with a regular expression like this:

echo preg_replace('~<a href="([^"]+)">([^<]+)</a>~i', "\n\$2\n\$1\n", $text);

Basically, you use negated character classes ([^"]+ for the href and [^<]+ for the link text) instead of .+ and .*. Those quantifiers are greedy by default, so .+ runs on to the last "> on the line and a single match swallows both anchors. You could make them lazy with .+? and .*? respectively, but a negated character class leads to less backtracking.
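To see the difference concretely, this sketch runs the greedy, lazy and negated-class versions of the pattern against a single line containing two anchors and collects the captured href values. The `$patterns` keys are just labels chosen for illustration.

```php
<?php
// One line that contains two anchors, as in the question.
$line = 'Check <a href="link1">text1</a> or <a href="link2">text2</a>';

$patterns = [
    'greedy'  => '~<a href="(.+)">(.*)</a>~',       // .+ / .* grab as much as possible
    'lazy'    => '~<a href="(.+?)">(.*?)</a>~',     // .+? / .*? grab as little as possible
    'negated' => '~<a href="([^"]+)">([^<]+)</a>~', // cannot run past a quote or a tag
];

$results = [];
foreach ($patterns as $name => $pattern) {
    preg_match_all($pattern, $line, $m);
    $results[$name] = $m[1]; // captured href values
    echo $name, ': ', count($m[0]), ' match(es), hrefs = ', implode(' | ', $m[1]), "\n";
}
```

The greedy version finds only one match, whose "href" capture runs from `link1` all the way to `link2`; the lazy and negated versions both find the two anchors separately.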

Also, you only need capturing groups around two parts of the anchor (the href value and the link text), not all five of them.
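With only two capturing groups you can also pull out the (text, href) pairs directly instead of rewriting the string. A minimal sketch using preg_match_all with PREG_SET_ORDER; the group names `href` and `label` are arbitrary choices for readability, and plain numbered groups would work the same way.

```php
<?php
$text = 'Check <a href="link1">text1</a> or <a href="link2">text2</a>
 oh
 well <a href="link3">text3</a>';

// Two groups: the href value and the link text.
preg_match_all('~<a href="(?<href>[^"]+)">(?<label>[^<]+)</a>~i', $text, $matches, PREG_SET_ORDER);

// PREG_SET_ORDER gives one array per match, so each anchor becomes one pair.
$pairs = [];
foreach ($matches as $match) {
    $pairs[] = [$match['label'], $match['href']];
}
print_r($pairs);
```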

OTHER TIPS

This is not directly related to your problem, but you can add modifiers to a regex pattern after the closing delimiter:

preg_replace('/whatever_my_pattern_do/MODIFIERS',"here I replace", $text);
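As an illustration, here is a sketch of two modifiers: `i` (case-insensitive, used in the answer's `~...~i` pattern) and `U` (ungreedy), which inverts the greediness of quantifiers. With `U`, the question's original pattern happens to match both anchors on one line, because `.+` and `.*` become lazy. The sample strings are made up for the demo.

```php
<?php
$html = '<A HREF="link1">text1</A> or <a href="link2">text2</a>';

// Case-sensitive: the uppercase <A HREF=...> tag is not matched.
$caseSensitive = preg_match_all('~<a href="([^"]+)">~', $html, $m1);
// With i, both tags match.
$caseInsensitive = preg_match_all('~<a href="([^"]+)">~i', $html, $m2);

// U (ungreedy): .+ and .* behave like .+? and .*?, so the question's
// original pattern now finds both anchors on the same line.
$line = 'Check <a href="link1">text1</a> or <a href="link2">text2</a>';
$ungreedy = preg_match_all('/(<a href=")(.+)(">)(.*)(<\/a>)/U', $line, $m3);

echo $caseSensitive, ' ', $caseInsensitive, ' ', $ungreedy, "\n";
```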

All of them are listed on the "Pattern Modifiers" page of the PHP manual.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow