Question

I have some really messy HTML with lots of spans and other tags.

I'm trying to keep only <span style="font-weight: bold"> while removing all the other span tags.

I have this so far:

$content = strip_tags($content, '<br><quote><code><pre><ul><li><ol><span>');

I want to remove <span> from that list, because it keeps every span in the document. I only want the spans with font-weight in them. How can I do this?


Solution

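strip_tags can't do this: it filters by tag name only, so once <span> is allowed, every span survives with all of its attributes.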
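A quick sketch that makes this concrete, using a made-up sample string: the disallowed <p> is stripped, but both spans come through untouched, so there is no way to tell strip_tags to keep only the bold one.

```php
<?php
// Sample input: one non-bold span, one bold span, and a <p> that is not allowed.
$html = '<p>x</p><span style="color: red">a</span><span style="font-weight: bold">b</span>';

// strip_tags() only looks at tag names: <p> is stripped, but both spans
// are kept verbatim, style attributes included.
$out = strip_tags($html, '<span>');
echo $out, PHP_EOL;
// prints: x<span style="color: red">a</span><span style="font-weight: bold">b</span>
```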

Take a look at HTML Purifier. It's designed exactly for this use case. You can give it a whitelist of tags and attributes to allow. It also has basic CSS parsing, allowing you to whitelist and blacklist CSS properties.

In this case, you'd probably do something like:

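// Not tested, but should work. Assumes HTML Purifier is installed (HTMLPurifier.auto.php is its autoloader).
require_once 'HTMLPurifier.auto.php';
$configuration = HTMLPurifier_Config::createDefault();
$configuration->set('HTML.Allowed', 'br,quote,code,pre,ul,li,ol,span[style]');
$configuration->set('CSS.AllowedProperties', 'font-weight');
$content = (new HTMLPurifier($configuration))->purify($content);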

Now, you're still going to be left with some extra span tags: a span whose style had no font-weight loses its style attribute, but the span element itself stays. You've said you simply want those gone, and that's a bit stickier. You'll want a DOM manipulation tool to find each useless span, capture its contents, remove the span, then insert the contents where the span was. phpQuery (suggested in another answer) and Simple HTML DOM should both do the trick. PHP's own DOM extension can also do it, but it's going to be much more of a bear.
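For reference, here is a rough sketch of that unwrap step using only PHP's built-in DOM extension, the more laborious option mentioned above, so it needs no third-party library. It is an illustration under assumptions, not a drop-in solution: `unwrapNonBoldSpans` is a hypothetical helper name, and "bold" is assumed to mean a style attribute containing `font-weight: bold` or `font-weight: 700` (whitespace variations such as `font-weight:bold;` also match).

```php
<?php
// Sketch: unwrap every <span> that is not bold, keeping its contents in place.
// Assumption: "bold" = style contains font-weight: bold or font-weight: 700.
function unwrapNonBoldSpans(string $html): string
{
    $doc = new DOMDocument();
    // Messy HTML makes libxml emit warnings; collect them quietly instead.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a full page with an explicit UTF-8 charset so
    // non-ASCII text is not misread as ISO-8859-1.
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
        . $html . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // getElementsByTagName() returns a live list, so copy it before editing the tree.
    $spans = iterator_to_array($doc->getElementsByTagName('span'));
    foreach ($spans as $span) {
        $style = strtolower($span->getAttribute('style'));
        if (preg_match('/font-weight\s*:\s*(bold|700)\b/', $style)) {
            continue; // keep bold spans untouched
        }
        // Move each child in front of the span, then remove the now-empty span.
        $parent = $span->parentNode;
        while ($span->firstChild !== null) {
            $parent->insertBefore($span->firstChild, $span);
        }
        $parent->removeChild($span);
    }

    // Serialize only the body's children, not the wrapper added above.
    $body = $doc->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$html = '<span style="color: red">a</span><span style="font-weight: bold">b</span>'
      . '<span style="font-weight: bold"><span style="color: blue">c</span></span>';
echo unwrapNonBoldSpans($html), PHP_EOL;
```

Spans are processed in document order, so a non-bold span nested inside a bold one is unwrapped while the bold outer span is kept.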

Licensed under: CC-BY-SA with attribution