How to keep <p><img … /></p> with XPATH?

https://stackoverflow.com/questions/7860747

11-02-2021

Question

I use XPath to remove untidy HTML tags. This code:

$nodeList = $xpath->query("//*[normalize-space(.)='' and not(self::br)]");
foreach ($nodeList as $node) {
    $node->parentNode->removeChild($node);
}

will remove unwanted input like this,

<p><em><br /></em></p>
<p><span style="text-decoration: underline;"><em><br /></em></span></p>

but it also removes an img tag like the one below, which I want to keep,

<p><img title="picture summit" src="images/32913430_127001_e.jpg" alt="picture summit" width="590" height="366" /></p>

How can I keep the img tag when filtering with XPath?

Solution

Use:

//p[not(descendant::*[self::img or self::br]) and normalize-space()='']
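As a minimal sketch of how this expression could be plugged into the removal loop from the question: the sample markup and the `removeMatching()` helper below are assumptions made for illustration, not part of the original answer. Note that, as written, the expression skips paragraphs containing a `<br />` as well as those containing an `<img>`, so a paragraph such as `<p><em><br /></em></p>` is left in place.

```php
<?php
// Hypothetical helper: remove every node matched by $expr and
// return how many nodes were removed.
function removeMatching(DOMDocument $doc, string $expr): int
{
    $xpath = new DOMXPath($doc);
    // Snapshot the matches into an array before mutating the tree.
    $matches = iterator_to_array($xpath->query($expr));
    foreach ($matches as $node) {
        $node->parentNode->removeChild($node);
    }
    return count($matches);
}

// Assumed sample input: two paragraphs from the question plus an empty one.
$html = '<div>'
      . '<p><em><br /></em></p>'
      . '<p><img src="images/32913430_127001_e.jpg" alt="picture summit" /></p>'
      . '<p> </p>'
      . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$expr = "//p[not(descendant::*[self::img or self::br]) and normalize-space()='']";
$removed = removeMatching($doc, $expr);

// Count what is left after the cleanup.
$check = new DOMXPath($doc);
$remainingParagraphs = $check->query('//p')->length;
$remainingImages = $check->query('//img')->length;

echo "removed: $removed, paragraphs left: $remainingParagraphs, images left: $remainingImages\n";
```

With this sample, only the whitespace-only paragraph should be removed, while the image paragraph and the `<br />` paragraph stay.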

Other tips

Maybe you could use an XPath 1.0 expression like the one below to remove unwanted paragraphs:

//p[count(text())=0 and count(img)=0]
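One thing to keep in mind is that `text()` and `img` in this expression only look at direct children of the paragraph, so a paragraph whose text or image is wrapped in another element would also match. The sketch below compares it with a variation that checks descendants instead. The sample markup and the variation are assumptions for illustration, not part of the original suggestion.

```php
<?php
// Assumed sample input covering direct and nested content.
$html = '<div>'
      . '<p><em><br /></em></p>'                                  // empty: should go
      . '<p><img src="images/32913430_127001_e.jpg" alt="" /></p>' // direct image
      . '<p><span><img src="images/32913430_127001_e.jpg" alt="" /></span></p>' // nested image
      . '<p><em>some text</em></p>'                                // nested text
      . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Expression from the suggestion: inspects direct children only.
$childOnly = $xpath->query('//p[count(text())=0 and count(img)=0]')->length;

// Hypothetical variation: inspects the whole subtree of each paragraph.
$anyDepth = $xpath->query("//p[normalize-space()='' and not(.//img)]")->length;

echo "child-only matches: $childOnly, any-depth matches: $anyDepth\n";
```

Here the child-only expression should match three paragraphs (including the nested image and the nested text), while the variation should match only the empty one.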

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow