Iterate over every text node in the given HTML and special-case the ones whose parent is an anchor:
$text = 'Check <a href="link1">text1</a> or <a href="link2">text2</a>
oh
well <a href="link3">text3</a>';
$dom = new DOMDocument;
$dom->loadHTML($text);
$xpath = new DOMXPath($dom);
// Visit every text node in document order.
foreach ($xpath->query('//text()') as $node) {
    if ($node->nodeType == XML_TEXT_NODE) {
        echo $node->textContent, "\n";
        // If the text sits directly inside an <a>, print its href as well.
        if ($node->parentNode->nodeType == XML_ELEMENT_NODE && $node->parentNode->nodeName == 'a') {
            echo $node->parentNode->getAttribute('href'), "\n";
        }
    }
}
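If you would rather get the pairs back as data than echo them, the same traversal can be wrapped in a function. This is only a minimal sketch: the helper name collectTextAndLinks() is made up for illustration, and it assumes that libxml warnings about the HTML fragment can safely be ignored.

```php
<?php
// Hypothetical helper (not part of the answer above): returns a list of
// [text, href-or-null] pairs instead of echoing them.
function collectTextAndLinks(string $html): array
{
    $dom = new DOMDocument;
    // Assumption: parse warnings for fragment input are not interesting here.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $pairs = [];
    foreach ($xpath->query('//text()') as $node) {
        $parent = $node->parentNode;
        // Only text directly inside an <a> gets an href; everything else gets null.
        $href = ($parent instanceof DOMElement && $parent->nodeName === 'a')
            ? $parent->getAttribute('href')
            : null;
        $pairs[] = [$node->textContent, $href];
    }
    return $pairs;
}

$pairs = collectTextAndLinks('Check <a href="link1">text1</a> or <a href="link2">text2</a>');
// Keep only the entries that came from an anchor.
$links = array_values(array_filter($pairs, fn($pair) => $pair[1] !== null));
```

Returning an array keeps the formatting decision (newlines, brackets, whatever you need) out of the traversal itself.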
If you would rather treat the input as plain text, you would do it with a regular expression like this:
echo preg_replace('~<a href="([^"]+)">([^<]+)</a>~i', "\n\$2\n\$1", $text);
Basically you use negated character classes for the href value and the tag contents instead of plain .+ and .* because those quantifiers are greedy by default. You can make them lazy by writing .+? and .*? respectively, but a negated character class leads to less backtracking.
Also, you only need capture groups for two parts of the anchor (the href value and the link text), not all five of them.
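To see the difference between the three variants, here is a small comparison. It is a sketch only: the shortened sample string and the [text|href] replacement format are chosen purely for illustration.

```php
<?php
$text = 'Check <a href="link1">text1</a> or <a href="link2">text2</a>';

// Greedy .+ grabs as much as it can, so one match swallows both anchors.
$greedy = preg_replace('~<a href="(.+)">(.+)</a>~i', '[$2|$1]', $text);

// Lazy .+? stops at the first place where the rest of the pattern can match.
$lazy = preg_replace('~<a href="(.+?)">(.+?)</a>~i', '[$2|$1]', $text);

// Negated character classes cannot run past a quote or a tag boundary.
// Only two capture groups are needed: the href value and the link text.
$negated = preg_replace('~<a href="([^"]+)">([^<]+)</a>~i', '[$2|$1]', $text);

echo $greedy, "\n";   // Check [text2|link1">text1</a> or <a href="link2]
echo $lazy, "\n";     // Check [text1|link1] or [text2|link2]
echo $negated, "\n";  // Check [text1|link1] or [text2|link2]
```

The lazy and negated versions give the same result on this input. The negated one simply gets there without having to try and retry ever longer matches.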