How to scrape the HTML contents of one div by id using PHP

https://stackoverflow.com/questions/18123923

23-06-2022

Question

A page on another of my domains, from which I'd like to scrape a single div, contains:

<div id="thisone">
    <p>Stuff</p>
</div>

<div id="notthisone">
    <p>More stuff</p>
</div>

Using this PHP...

<?php
    $page = file_get_contents('http://thisite.org/source.html');
    $doc = new DOMDocument();
    $doc->loadHTML($page);
    foreach ($doc->getElementsByTagName('div') as $node) {
        echo $doc->saveHtml($node), PHP_EOL;
    }
?>

...gives me every div on http://thisite.org/source.html, markup included. However, I only want to pull through the div with an id of "thisone", and using:

foreach ($doc->getElementById('thisone') as $node) {

doesn't bring up anything.

Solution

$doc->getElementById('thisone'); // returns a single element with the id "thisone"

Try $node = $doc->getElementById('thisone'); and then print it with echo $doc->saveHTML($node); (a DOMElement cannot be echoed directly).

On a side note, you can use phpQuery for a jQuery-like syntax: pq("#thisone")
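
phpQuery is a third-party library, so it may not always be available. As a rough sketch of the same selector-style lookup using only PHP's bundled DOM extension, the CSS selector "#thisone" can be written as an XPath expression. This swaps in DOMXPath rather than phpQuery, and the inline $html string is a hypothetical stand-in for the fetched page:

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = <<<HTML
<div id="thisone">
    <p>Stuff</p>
</div>
<div id="notthisone">
    <p>More stuff</p>
</div>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);

// XPath counterpart of the CSS selector "#thisone".
$xpath = new DOMXPath($doc);
$matches = $xpath->query('//*[@id="thisone"]');

// query() returns a DOMNodeList; item(0) is null when nothing matched.
$node = $matches->item(0);
$result = $node !== null ? $doc->saveHTML($node) : '';

echo $result, PHP_EOL;
```

For a plain id lookup getElementById is simpler; XPath mostly pays off when the condition is more involved, such as matching on a class or a nested structure.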

OTHER TIPS

$doc->getElementById('thisone') returns a single DOMElement, not an array, so you can't iterate through it.

Just do:

$node = $doc->getElementById('thisone');
echo $doc->saveHtml($node), PHP_EOL;

See the PHP manual (http://php.net/manual/en/domdocument.getelementbyid.php): getElementById returns a single element or NULL, not an array, so you can't iterate over it.

Instead, do this:

<?php
    $page = file_get_contents('example.html');
    $doc = new DOMDocument();
    $doc->loadHTML($page);
    $node = $doc->getElementById('thisone');
    echo $doc->saveHTML($node), PHP_EOL;
?>

Running php edit.php gives something like this:

<div id="thisone">
      <p>Stuff</p>
  </div>
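
Since getElementById can return NULL, it may be worth guarding against a missing id before calling saveHTML. The following is a minimal sketch, not the original poster's code: extractElementById is a hypothetical helper name, the inline $html is sample data, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
/**
 * Hypothetical helper: returns the outer HTML of the element with the
 * given id, or null when the document has no such element.
 */
function extractElementById(string $html, string $id): ?string
{
    // Collect parser warnings about imperfect markup instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Discard the collected warnings and restore the earlier setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $node = $doc->getElementById($id);
    if ($node === null) {
        return null; // no element carries this id
    }

    $out = $doc->saveHTML($node);
    return $out === false ? null : $out;
}

// Sample markup mirroring the page from the question.
$html = '<div id="thisone"><p>Stuff</p></div>'
      . '<div id="notthisone"><p>More stuff</p></div>';

$found   = extractElementById($html, 'thisone');
$missing = extractElementById($html, 'nosuchid');

echo $found ?? 'not found', PHP_EOL;
echo $missing ?? 'not found', PHP_EOL;
```

Returning null for a missing id lets the caller decide how to react, rather than passing NULL on to saveHTML.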

Licensed under: CC-BY-SA with attribution
