You should remove any stray whitespace and rogue attributes, which in practice means pretty much all attributes, especially the on* event attributes such as onClick and onBlur. There are too many ways to sneak an XSS attack into HTML, and a hand-rolled filter that tries to strip them all out would not be maintainable, so if you want to let users input HTML, use htmlpurifier. It is easy to initialize in your code and has lots of options.
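As a rough idea of what that initialization looks like, here is a minimal sketch. It assumes HTML Purifier was installed through Composer (package `ezyang/htmlpurifier`) so that `vendor/autoload.php` exists next to the script. The whitelist of tags and the sample input are example values, not a recommendation for your site.

```php
<?php
// Assumption: HTML Purifier installed via Composer; adjust the path, or
// require HTMLPurifier.auto.php from a manual download instead.
require_once __DIR__ . '/vendor/autoload.php';

$config = HTMLPurifier_Config::createDefault();
// Example whitelist: only these tags and attributes survive.
$config->set('HTML.Allowed', 'p,br,a[href],img[src|alt]');
// Only http(s) URLs are accepted in href/src, so javascript: is dropped.
$config->set('URI.AllowedSchemes', ['http' => true, 'https' => true]);
// Disable the definition cache so no writable cache directory is needed.
$config->set('Cache.DefinitionImpl', null);

$purifier = new HTMLPurifier($config);

// Example input only.
$dirty = 'Hello <script>alert("XSS");</script>'
       . '<img src="http://example.com/a.jpg" onload="evil()" alt="">';
$clean = $purifier->purify($dirty);

echo $clean;
```

The whitelist approach is the point here: you list what is allowed and everything else is discarded, rather than trying to enumerate every dangerous construct.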
A simple alternative would be to extract the src of each img, build a string of clean img tags from those values, use strip_tags() to remove all HTML from the text, and then concatenate your images onto the text. You lose the positioning of the images, though.
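A minimal sketch of that simple alternative might look like the following. The function name `text_with_images` is made up for illustration, and restricting src to http(s) URLs is my assumption about what you would want to accept. Note that strip_tags() keeps the text inside script tags, so the sketch drops those blocks first.

```php
<?php
// Hypothetical helper: returns the plain text followed by the cleaned images.
function text_with_images(string $html): string
{
    if ($html === '') {
        return '';
    }

    // Parse leniently; user HTML is rarely valid, so collect warnings quietly.
    $dom = new DOMDocument;
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    // Rebuild each image from its src alone; every other attribute is lost.
    $images = '';
    foreach ($dom->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        // Assumption: only plain http(s) URLs are acceptable image sources.
        if (preg_match('~^https?://~i', $src)) {
            $images .= '<img src="' . htmlspecialchars($src, ENT_QUOTES, 'UTF-8') . '">' . "\n";
        }
    }

    // strip_tags() would keep the script body as text, so remove it first.
    $text = preg_replace('~<script\b.*?</script>~is', '', $html);
    $text = trim(strip_tags($text));

    // Images end up after the text, so their original position is lost.
    return $text . "\n" . $images;
}

echo text_with_images(
    'Hi <script>alert(1)</script>there '
    . '<img src="http://example.com/a.jpg" onload="x()"> <b>bold</b>'
);
```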
Or, to keep the images where they are, walk the DOM and whitelist attributes, so something like:
<?php
$html = <<<DEMO
After a fair <script>alert('XSS');</script>few ...
winning the 2006 Production Privateer Championship.<br />
<div style="background-image: url(javascript:alert('XSS'))"></div>
<br />
<img src="http://i2.photobucket.com/albums/y18/moo0484/scan0001.jpg" border="0" class="tcattdimglink" onload="NcodeImageResizer.createOn(this);" alt="" /><br />
<br />
text here
<img src="http://i2.photobucket.com/albums/y18/moo0484/01072007065.jpg" border="0" class="tcattdimglink" onload="NcodeImageResizer.createOn(this);" alt="" /><br />
more txt here
DEMO;
$dom = new DOMDocument;
// collect libxml parse warnings internally instead of silencing them with @
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
if (false === ($elements = $xpath->query("//*"))) die('Error');

foreach ($elements as $element) {
    //remove script tags
    if ($element->nodeName == 'script') {
        $element->parentNode->removeChild($element);
        continue; //node is gone, nothing more to do with it
    }

    //remove empty tags but not images, or wrappers that contain an image
    if ($element->nodeName != 'img'
        && $element->nodeValue == ''
        && $element->getElementsByTagName('img')->length == 0) {
        $element->parentNode->removeChild($element);
        continue;
    }

    //remove all attributes except img src and a href
    for ($i = $element->attributes->length; --$i >= 0;) {
        $attr = $element->attributes->item($i);
        $name = $attr->name;
        $isUrlAttr = ('img' === $element->nodeName && 'src' === $name)
            || ('a' === $element->nodeName && 'href' === $name);
        //keep only http(s) URLs so javascript: and data: URIs are dropped too
        if ($isUrlAttr && preg_match('~^https?://~i', $attr->value)) {
            continue;
        }
        $element->removeAttribute($name);
    }
}

//put dom together and remove the doctype, html and body wrappers
echo preg_replace('~<(?:!DOCTYPE|/?(?:html|body))[^>]*>\s*~i', '', $dom->saveHTML());
/*
<p>After a fair few ...
winning the 2006 Production Privateer Championship.</p>
<img src="http://i2.photobucket.com/albums/y18/moo0484/scan0001.jpg">
text here
<img src="http://i2.photobucket.com/albums/y18/moo0484/01072007065.jpg">
more txt here
*/
Though really, just look into using htmlpurifier. Also, the 1990s are calling and they want their BBCode back; use Markdown instead. ;p
Good luck