When you load an HTML fragment with DOMDocument, the parser turns it into a complete, "valid" HTML document: it adds a DOCTYPE and any missing structural tags such as html and body, and it closes unclosed tags. Calling saveHTML() with no argument then serializes that whole document, DOCTYPE included. If I remember correctly, the user-contributed notes in the PHP manual list several tricks for avoiding this.
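A minimal sketch of that behaviour, and of one common way around it, follows. The fragment is a made-up stand-in for the imported post content, and the approach (serializing only the children of the body element by passing a node to saveHTML()) is one option among several, not necessarily the trick the manual notes describe.

```php
<?php
// Hypothetical fragment standing in for imported post content.
$fragment = '<p>Hello<br>world</p>';

// Collect libxml parse warnings quietly instead of using the @ operator.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($fragment);
libxml_clear_errors();

// Whole-document serialization: includes the DOCTYPE and the wrapper tags.
$full = $doc->saveHTML();

// Serialize only the children of <body> to get the fragment back.
$inner = '';
$body = $doc->getElementsByTagName('body')->item(0);
if ($body !== null) {
    foreach ($body->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
}

echo $full, "\n---\n", $inner, "\n";
```

Passing a node to saveHTML() requires PHP 5.3.6 or later. Another option, assuming PHP 5.4+ and libxml 2.7.8+, is to pass LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD as the second argument to loadHTML(), though that can behave unexpectedly when the fragment has several top-level nodes.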
Why is the Doctype being printed on my page?
Question
I've imported the content from a Blogger account into a WordPress blog.
I've had to apply some XPath and regex to remove some nasty formatting.
global $post;
$html = mb_convert_encoding($content, 'HTML-ENTITIES', "UTF-8");
$doc = new DOMDocument();
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
foreach($xpath->query('//br[not(preceding::text())]') as $node) {
    $node->parentNode->removeChild($node);
}
$nodes = $xpath->query('//a[string-length(.) = 0]');
foreach($nodes as $node) {
    $node->parentNode->removeChild($node);
}
$nodes = $xpath->query('//*[not(text() or node() or self::br)]');
foreach($nodes as $node) {
    $node->parentNode->removeChild($node);
}
remove_filter('the_content', 'wpautop');
$content = $doc->saveHTML();
$content = ltrim($content, '<br>');
$content = strip_tags($content, '<br> <a> <iframe>');
$content = preg_replace(array('/(<br\s*\/?>\s*){1,}/'), array('<br/><br/>'), $content);
$content = str_replace(' ', ' ', $content);
$content = "<p>".implode("</p>\n\n<p>", preg_split('/\n(?:\s*\n)+/', $content))."</p>";
return $content;
For some reason, though, a stray DOCTYPE is being printed inside my page, and I don't know why.
<p>!DOCTYPE html PUBLIC “-//W3C//DTD HTML 4.0 Transitional//EN” “http://www.w3.org/TR/REC-html40/loose.dtd”>
<br/>
<br/>When the battle is on between contestants in a talent show, it gets really competitive when down to the last four. X-FactorUSAcontestant Marcus Canty knows this all too well as this is the stage he was voted off of the show earlier this year.
<br/>
<br/>
</p>
Could someone point me in a direction as to why this is happening?
Solution
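The DOCTYPE comes from calling saveHTML() on the whole document. It probably survives strip_tags() because ltrim()'s second argument is a list of characters, not a literal prefix: ltrim($content, '<br>') strips the leading "<" from "<!DOCTYPE", so what remains is no longer a tag. That matches the "!DOCTYPE" text in the printed output. The sketch below shows one possible fix under those assumptions. Both helper functions are hypothetical names introduced for illustration, and the input string is a made-up sample.

```php
<?php
// Hypothetical helper (not from the original post): serialize only what sits
// inside <body>, so the DOCTYPE and wrapper tags never reach the output.
function body_inner_html(DOMDocument $doc): string
{
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $html = '';
    foreach ($body->childNodes as $child) {
        $html .= $doc->saveHTML($child);
    }
    return $html;
}

// Hypothetical helper: remove leading <br> tags as whole tags.
// ltrim($html, '<br>') would instead strip any leading '<', 'b', 'r', '>' characters.
function strip_leading_br(string $html): string
{
    return (string) preg_replace('/^(?:\s*<br\s*\/?>)+/i', '', $html);
}

// The character-list behaviour that breaks the DOCTYPE tag:
$broken = ltrim('<!DOCTYPE html>', '<br>'); // '!DOCTYPE html>'

// Made-up sample input standing in for the imported post content.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<br><br><p>When the battle is on</p>');
libxml_clear_errors();

$content = strip_leading_br(body_inner_html($doc));
echo $content, "\n";
```

In the question's code, this would replace the `$content = $doc->saveHTML();` and `ltrim()` lines. The later strip_tags(), preg_replace(), and paragraph-splitting steps can stay as they are.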