Here is the code I have been working on:
<?php
$content_old = <<<'EOM'
<p> </p>
<p>lol<strong>test</strong></p>
<p><strong>This is a header</strong></p>
<p>Content content blah blah blah.</p>
EOM;
$content = preg_replace("/<p[^>]*>[\s| ]*<\/p>/", '', $content_old);
$doc = new DOMDocument;
$doc->loadHTML($content);
$xp = new DOMXPath($doc);
// Promote short, bold-only paragraphs to <h2> headings
foreach ($xp->query('//p/strong') as $node) {
    $parent = $node->parentNode;
    // The <strong> must hold all of the paragraph's text and be at most 8 words long
    if ($parent->textContent == $node->textContent &&
        str_word_count($node->textContent) <= 8) {
        $header = $doc->createElement('h2');
        $parent->parentNode->replaceChild($header, $parent);
        $header->appendChild($doc->createTextNode( $node->textContent ));
    }
}
// calling saveXML() on the whole document is not good enough, because the output
// would also include the doctype and the <html>/<body> wrappers added by loadHTML()
$xp = new DOMXPath($doc);
$everything = $xp->query("body/*"); // retrieves all elements inside the body tag
$output = '';
if ($everything->length > 0) { // check whether anything was found
    foreach ($everything as $thing) {
        $output .= $doc->saveXML($thing) . "\n";
    }
}
echo "--- ORIGINAL --\n\n";
echo $content_old;
echo "\n\n--- UPDATED ---\n\n";
echo $output;
When I run the script, this is the output that I get:
--- ORIGINAL --
<p> </p>
<p>lol<strong>test</strong></p>
<p><strong>This is a header</strong></p>
<p>Content content blah blah blah.</p>
--- UPDATED ---
<p>lol<strong>test</strong></p>
<h2>This is a header</h2>
<p>Content content blah blah blah.</p>
Update #1
It's worth noting that if there are other tags inside the <p><strong> element (for example, <p><strong><a>), the entire <p> will still be replaced, which was not my intention.
This is easily fixed by changing the if to this:
if ($parent->textContent == $node->textContent &&
    str_word_count($node->textContent) <= 8 &&
    $node->childNodes->item(0)->nodeType == XML_TEXT_NODE) {
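To see the effect of that extra condition in isolation, here is a minimal sketch. The helper name isHeaderCandidate, its $maxWords parameter, and the sample markup are my own assumptions for illustration and are not part of the script above. It also adds a null check so that an empty <strong></strong> does not trigger an error when the first child is read.

```php
<?php
// Hypothetical helper (not in the original script): decides whether a
// <strong> node should be promoted to a heading.
function isHeaderCandidate(DOMNode $strong, int $maxWords = 8): bool
{
    $parent = $strong->parentNode;
    $first  = $strong->firstChild;

    return $parent !== null
        && $first !== null                                // skip an empty <strong></strong>
        && $first->nodeType === XML_TEXT_NODE             // first child must be plain text, not a tag
        && $parent->textContent === $strong->textContent  // <strong> holds all of the paragraph's text
        && str_word_count($strong->textContent) <= $maxWords;
}

// Sample input (assumed): one plain bold paragraph, one bold paragraph wrapping a link.
$doc = new DOMDocument;
$doc->loadHTML(
    '<p><strong>This is a header</strong></p>' .
    '<p><strong><a href="#">Linked text</a></strong></p>'
);
$xp = new DOMXPath($doc);

$results = [];
foreach ($xp->query('//p/strong') as $node) {
    $results[] = isHeaderCandidate($node);
}
var_dump($results); // first entry is true, second is false
```

Like the condition above, this only inspects the first child of the <strong>, so something like <strong>text <a>link</a></strong> would still pass. Checking every child node would be stricter if that case matters.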
Update #2
It's also worth noting that the original createElement call would cause issues if the content inside the <p><strong> element contained characters that need to be escaped in HTML (for example, &).
The old code was:
$header = $doc->createElement('h2', $node->textContent);
$parent->parentNode->replaceChild($header, $parent);
The new code (which works correctly) is:
$header = $doc->createElement('h2');
$parent->parentNode->replaceChild($header, $parent);
$header->appendChild($doc->createTextNode( $node->textContent ));
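As a small self-contained check of the createTextNode approach, the sketch below builds a heading from text containing an ampersand. The sample string 'Tom & Jerry' is made up for illustration. The text node takes the string literally, and the escaping happens only when the node is serialized.

```php
<?php
$doc  = new DOMDocument;
$text = 'Tom & Jerry'; // sample text (assumed), contains a character that needs escaping

// Create the element empty, then attach the text as a separate text node.
$header = $doc->createElement('h2');
$header->appendChild($doc->createTextNode($text));
$doc->appendChild($header);

// Serialize only the heading node; the ampersand is escaped on output.
$markup = $doc->saveXML($header);
echo $markup, "\n"; // <h2>Tom &amp; Jerry</h2>
```

The node's textContent still reads back as the raw string, so later comparisons against textContent are unaffected.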