Question

<div>
     <a>abc</a>
     xyz
</div>

Given the HTML structure above, $divElement->nodeValue returns 'abc xyz', but I want only 'xyz'. $divElement->getAttribute('value') returns an empty string.

How can I get 'xyz' without removing the <a> element?


Solution

Just iterate over the child nodes of the <div> and concatenate all the text nodes:

http://3v4l.org/fnTAF

$dom = new DOMDocument;
$dom->loadHTML(<<<HTML
<div>
     <a>abc</a>
     xyz
</div>
HTML
);
$div = $dom->getElementsByTagName("div")->item(0);
var_dump($div->childNodes->length); // just to debug
$txt = "";
// Visit only the direct children of the <div>, so the text inside <a> is skipped.
foreach ($div->childNodes as $child)
{
    if ($child->nodeType == XML_TEXT_NODE) // XML_TEXT_NODE === 3
    {
        $txt .= $child->nodeValue;
    }
}
var_dump(trim($txt)); // trim() drops the surrounding whitespace and line breaks

A nodeType of 3 (the XML_TEXT_NODE constant) means a text node. The corresponding nodeName is #text.
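
As an alternative to checking nodeType by hand, the same selection can probably be expressed with XPath. This is only a sketch, not part of the original answer: it assumes DOMXPath is available (it ships with PHP's DOM extension) and uses the relative expression text() to pick the direct text-node children of the <div>.

```php
<?php
// Sketch: select only the direct text-node children of the <div> with XPath.
$dom = new DOMDocument;
$dom->loadHTML("<div>\n     <a>abc</a>\n     xyz\n</div>");
$div = $dom->getElementsByTagName("div")->item(0);

$xpath = new DOMXPath($dom);
$txt = "";
// "text()" evaluated relative to $div matches its own text nodes,
// not the text nested inside the <a> element.
foreach ($xpath->query("text()", $div) as $textNode) {
    $txt .= $textNode->nodeValue;
}
$txt = trim($txt); // drop the indentation and line breaks around the text
var_dump($txt);    // string(3) "xyz"
```

The <a> element stays in the document untouched; only the way the text is collected changes.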

OTHER TIPS

Your <div> actually has three children: one text node, one <a> node and another text node. At least that's what the XML standard says.
The first text node contains the whitespace between <div> and <a>. The second one contains your xyz, along with the whitespace around it.

If you inspect $divElement->childNodes, I believe you should see both text nodes next to the <a> element, and you can tell them apart by their nodeType or nodeName.
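
To check this on your own setup, a small loop like the following can list each child of the <div>. It is a minimal sketch: the inline HTML string and the $names array are illustrative assumptions, and whether the leading whitespace-only text node appears may depend on the HTML parser behind loadHTML, so no particular output is claimed here.

```php
<?php
// Sketch: list every direct child of the <div> with its name, type and value.
$dom = new DOMDocument;
$dom->loadHTML("<div>\n     <a>abc</a>\n     xyz\n</div>");
$divElement = $dom->getElementsByTagName("div")->item(0);

$names = []; // hypothetical helper array, collected only for inspection
foreach ($divElement->childNodes as $child) {
    $names[] = $child->nodeName;
    // json_encode() makes whitespace and line breaks in the value visible.
    printf(
        "%-6s type=%d value=%s\n",
        $child->nodeName,
        $child->nodeType,
        json_encode($child->nodeValue)
    );
}
```

Text nodes show up with the name #text and type 3, while the <a> element is reported with type 1 (XML_ELEMENT_NODE).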

Licensed under: CC-BY-SA with attribution