
I'm new to DOM parsing in PHP.
I have an HTML file that I'm trying to parse. It has a bunch of DIVs like this:

<div id="interestingbox"> 
   <div id="interestingdetails" class="txtnormal">
      ...
   </div>
</div>

<div id="interestingbox"> 
   ...
</div>

I'm trying to get the contents of the many div boxes using PHP. How can I use the DOM parser to do this?




First, you can't use the same id on two different divs; that's what classes are for. Every id must be unique within the document.
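To see what duplicate ids do in practice: XPath itself does not enforce uniqueness, so a query on the id still returns every match, while libxml may report the duplicate as a parser warning. A minimal sketch with made-up sample markup, collecting any warnings via libxml_use_internal_errors() instead of printing them:

```php
<?php
// Collect libxml parser warnings (e.g. about a duplicate id) instead of printing them.
libxml_use_internal_errors(true);

// Sample markup (made up): two divs share id="interestingbox", which is invalid HTML.
$html = '<div id="interestingbox">first</div>'
      . '<div id="interestingbox">second</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// Whatever warnings the parser produced; the exact messages depend on the libxml version.
$warnings = libxml_get_errors();
libxml_clear_errors();
echo count($warnings), " parser warning(s)\n";

// XPath does not care about id uniqueness, so both divs still match.
$xpath = new DOMXPath($doc);
$matches = $xpath->query("//div[@id='interestingbox']");

foreach ($matches as $div) {
    echo $div->textContent, "\n";
}
```

Because getElementById() is designed to return a single element, classes remain the right tool when several boxes share a role.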

Here is code to get the contents of the div with id="interestingbox":

$html = '
<div id="interestingbox"> 
   <div id="interestingdetails" class="txtnormal">
   </div>
</div>

<div id="interestingbox2"><a href="#">a link</a></div>
';

$dom_document = new DOMDocument();
$dom_document->loadHTML($html);

// use DOMXPath to navigate the HTML with the DOM
$dom_xpath = new DOMXPath($dom_document);

// if you want to get the div with id=interestingbox
$elements = $dom_xpath->query("*/div[@id='interestingbox']");

// query() returns false (not null) when the expression is malformed
if ($elements !== false) {
  foreach ($elements as $element) {
    echo "\n[". $element->nodeName. "]";

    $nodes = $element->childNodes;
    foreach ($nodes as $node) {
      echo $node->nodeValue. "\n";
    }
  }
}


This prints [div] followed by the text of each of the matched div's child nodes.
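If you want the markup inside each matched div rather than its text, one option is to serialize each child with DOMDocument::saveHTML($node) (the node argument needs PHP 5.3.6 or later). A minimal sketch; the innerHtml() helper name and the sample markup are hypothetical:

```php
<?php
// Hypothetical helper: return the markup inside $node (its "inner HTML")
// by serializing each child node separately and concatenating the results.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

libxml_use_internal_errors(true); // keep parser warnings about real-world HTML quiet

$doc = new DOMDocument();
$doc->loadHTML('<div id="interestingbox"><p>Hello <b>world</b></p></div>');

$xpath = new DOMXPath($doc);
foreach ($xpath->query("//div[@id='interestingbox']") as $box) {
    echo innerHtml($box), "\n";
}
```

Serializing the children one by one leaves out the outer div's own tags, which is usually what "the contents of the div" means.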

Example with classes:

$html = '
<div class="interestingbox"> 
   <div id="interestingdetails" class="txtnormal">
   </div>
</div>

<div class="interestingbox"><a href="#">a link</a></div>
';

// the same as before; just change the XPath expression


$elements = $dom_xpath->query("*/div[@class='interestingbox']");


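This prints a [div] line for each of the two matching divs, followed by the text of its child nodes (for the second div, "a link").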
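Note that [@class='interestingbox'] compares the whole attribute string, so it misses a div with class="interestingbox wide". A common XPath 1.0 idiom matches a single class token instead. A minimal sketch with made-up sample markup:

```php
<?php
libxml_use_internal_errors(true); // keep parser warnings quiet

// Sample markup (made up): one exact match, one multi-class match, one near miss.
$doc = new DOMDocument();
$doc->loadHTML(
    '<div class="interestingbox">one</div>'
    . '<div class="interestingbox wide">two</div>'
    . '<div class="notinterestingbox">three</div>'
);

$xpath = new DOMXPath($doc);

// Pad the whitespace-normalized class list with spaces so that ' interestingbox '
// only matches a whole class token, not a substring such as "notinterestingbox".
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' interestingbox ')]";

$texts = [];
foreach ($xpath->query($query) as $div) {
    $texts[] = $div->textContent;
}

echo implode(', ', $texts), "\n";
```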

Refer to the DOMXPath page in the PHP manual for more details.


I got this to work using simplehtmldom (Simple HTML DOM) as a starting point:

require_once 'simple_html_dom.php'; // the Simple HTML DOM library

$html = file_get_html('');
foreach ($html->find('div[id=interestingbox]') as $result) {
    echo $result->innertext;
}
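For comparison, a rough equivalent without Simple HTML DOM, using only PHP's bundled DOM extension and no XPath: walk every div with getElementsByTagName() and keep those whose id attribute matches. The sample markup is made up:

```php
<?php
libxml_use_internal_errors(true); // keep parser warnings quiet

$doc = new DOMDocument();
$doc->loadHTML('<div id="interestingbox">first</div><div id="other">skip</div>');

$found = [];
foreach ($doc->getElementsByTagName('div') as $div) {
    // getAttribute() returns '' when the attribute is missing, so this is safe for every div.
    if ($div->getAttribute('id') === 'interestingbox') {
        $found[] = trim($div->textContent);
    }
}

print_r($found);
```

This avoids an extra library, at the cost of writing the attribute test by hand.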

Very nice function from

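function innerXML($node)
{
    $doc  = $node->ownerDocument;

    // collect copies of the node's children in a document fragment
    $frag = $doc->createDocumentFragment();
    foreach ($node->childNodes as $child)
    {
        $frag->appendChild($child->cloneNode(true));
    }

    // serialize only the fragment, i.e. the node's inner markup
    return $doc->saveXML($frag);
}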


$dom = new DOMDocument();
$dom->loadHTML('
<html>
<body>
<table>
<tr>
    <td id="foo">
        The first bit of Data I want
        <br />The second bit of Data I want
        <br />The third bit of Data I want
    </td>
</tr>
</table>
</body>
</html>
');

$xpath = new DOMXPath($dom);

$node = $xpath->evaluate("/html/body//td[@id='foo']");

$dataString = innerXML($node->item(0));
// saveXML() may write the tag as <br/> rather than <br />, so split on any <br> form
$dataArr = preg_split('#<br\s*/?>#i', $dataString);

$dataUno  = trim($dataArr[0]);
$dataDos  = trim($dataArr[1]);
$dataTres = trim($dataArr[2]);

echo "firstdata = $dataUno<br />seconddata = $dataDos<br />thirddata = $dataTres<br />";
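As an alternative to serializing the cell and splitting the resulting string, you can read the td's direct text nodes with the XPath text() step, which sidesteps the question of how <br> gets serialized. A minimal sketch using the same sample table:

```php
<?php
libxml_use_internal_errors(true); // keep parser warnings quiet

$doc = new DOMDocument();
$doc->loadHTML('<html><body><table><tr><td id="foo">
    The first bit of Data I want
    <br />The second bit of Data I want
    <br />The third bit of Data I want
</td></tr></table></body></html>');

$xpath = new DOMXPath($doc);

$parts = [];
// text() selects only the direct text children of the <td>, skipping the <br> elements.
foreach ($xpath->query("//td[@id='foo']/text()") as $textNode) {
    $text = trim($textNode->nodeValue);
    if ($text !== '') { // ignore whitespace-only text nodes, if any
        $parts[] = $text;
    }
}

[$first, $second, $third] = $parts;
echo "firstdata = $first\nseconddata = $second\nthirddata = $third\n";
```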

WebExtractor: it can parse pages using CSS, regex, and XPath selectors.

See the package and its tests for examples:

use WebExtractor\DataExtractor\DataExtractorFactory;
use WebExtractor\DataExtractor\DataExtractorTypes;
use WebExtractor\Client\Client;

$factory = DataExtractorFactory::getFactory();
$extractor = $factory->createDataExtractor(DataExtractorTypes::CSS);
$client = new Client;
$content = $client->get('');
$extractor->setContent($content);
$h1 = $extractor->setSelector('h1')->extract();

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow