Question

I am parsing HTML in PHP, and since I have no control over the original content, I want to strip it of styling and unnecessary tags while still keeping the content and a short list of tags, namely:

p, img, iframe (and maybe a couple of others)

I know I can remove a given tag (see the code I am using for this below), but since I don't necessarily know which tags there could be, and I don't want to maintain a huge list of possibilities, I would like to strip everything except my allowed list.

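// Unwrap $from: move all of its children in front of it, then remove it,
// so the element disappears but its content stays in the document.
function DOMRemove(DOMNode $from) {
    // A while loop (instead of do/while) also handles empty elements,
    // where firstChild is null.
    while ($from->firstChild !== null) {
        $from->parentNode->insertBefore($from->firstChild, $from);
    }

    $from->parentNode->removeChild($from);
}

$dom = new DOMDocument;
$dom->loadHTML($html);

$nodes = $dom->getElementsByTagName('span');
// getElementsByTagName() returns a live list, so walk it backwards
// while removing nodes from the document.
for ($i = $nodes->length - 1; $i >= 0; $i--) {
    DOMRemove($nodes->item($i));
}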
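The same unwrapping idea as DOMRemove can be turned around into an allow-list: walk the whole tree and unwrap every element whose tag is not on the list. A minimal sketch, assuming a hypothetical helper named stripDisallowed (not a PHP built-in) and assuming that inline style attributes are the only styling you want to drop; class and other attributes are kept.

```php
<?php
// Hypothetical helper (not a built-in): recursively unwrap every element whose
// tag name is not in $allowed, keeping its children, and drop the style
// attribute from the elements that are kept.
function stripDisallowed(DOMNode $node, array $allowed): void
{
    // Snapshot the children: childNodes is live and changes as nodes move.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if (!$child instanceof DOMElement) {
            continue; // text and comment nodes are left untouched
        }

        // Clean the subtree first so nested disallowed tags are unwrapped too.
        stripDisallowed($child, $allowed);

        if (in_array(strtolower($child->tagName), $allowed, true)) {
            $child->removeAttribute('style'); // keep the tag, lose inline styling
            continue;
        }

        // Same idea as DOMRemove: hoist the children, then delete the element.
        while ($child->firstChild !== null) {
            $node->insertBefore($child->firstChild, $child);
        }
        $node->removeChild($child);
    }
}

$dom = new DOMDocument();
$dom->loadHTML('<div style="color:red"><p style="margin:0">Hello <span>world</span></p><img src="a.png"></div>');

// loadHTML() wraps fragments in <html><body>, so clean from <body> down.
$body = $dom->getElementsByTagName('body')->item(0);
stripDisallowed($body, ['p', 'img', 'iframe']);

$out = '';
foreach ($body->childNodes as $child) {
    $out .= $dom->saveHTML($child);
}
echo $out, "\n"; // <p>Hello world</p><img src="a.png">
```

Recursing before deciding whether to unwrap means a disallowed element's children are already clean when they are hoisted into its parent, so each node is visited only once.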

Solution

As cpattersonv1 mentioned above, you can simply use strip_tags() for the job. Its optional second argument lists the tags to keep.

<?php

// strip all tags except the ones listed (p, img, iframe)
$html_result = strip_tags($html, '<p><img><iframe>');

?>
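Note that strip_tags() only removes tags: attributes on the tags you keep, including style, pass through unchanged. A minimal sketch of a second pass that removes the remaining style attributes with DOMDocument and DOMXPath, assuming inline style is the only styling you need to drop:

```php
<?php
$html = '<div><p style="color:red">Hi <b>there</b></p><iframe src="x.html" style="border:0"></iframe></div>';

// strip_tags() drops <div> and <b>, but style="..." on <p> and <iframe> survives.
// (Since PHP 7.4 the allow-list may also be an array: ['p', 'img', 'iframe'].)
$stripped = strip_tags($html, '<p><img><iframe>');

// Second pass: remove every style attribute that is left.
$dom = new DOMDocument();
$dom->loadHTML($stripped); // the fragment is wrapped in <html><body>
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//*[@style]') as $element) {
    $element->removeAttribute('style');
}

// Serialize only the children of <body>, not the generated wrapper.
$out = '';
foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $child) {
    $out .= $dom->saveHTML($child);
}
echo $out, "\n"; // <p>Hi there</p><iframe src="x.html"></iframe>
```

The XPath result is a static list, so removing attributes while iterating over it is safe.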
Licensed under: CC-BY-SA with attribution