Question

I have a remote webpage whose contents I need to fetch, like so:

$src = file_get_contents('http://example.com/comp.html');
$text = new DOMDocument;
@$text->loadHTML($src); // I read in a separate post that the '@' suppresses warnings
$text->preserveWhiteSpace = false;

The comp.html page looks like this:

<html>
<div id = "test1">
<img src = "http://example.com/monkey"/>
</div>
<div id = "test2">
<img src = "http://example.com/apples"/>
</div>
</html>

I want to get the image source for the div with id 'test2' and ignore 'test1', since it will not match the condition. I then want to take the img src string in test2, 'http://example.com/apples', and extract only whatever comes after '.com/'. For example, if the src is 'http://example.com/oranges', I should get the value 'oranges'. Finally, I want to store this value in a variable.

To do this, I have something like the following after the code above:

$text2 = $text->getElementsByTagName('img');
foreach ($text2 as $image) {
    $image->getAttribute('src');
 // My question is after this, how would I proceed?
}

Solution

You can proceed like this:

$dom = new DOMDocument;
$dom->loadHTML($src); // $src is the HTML string fetched in the question
foreach ($dom->getElementsByTagName('div') as $dtag) {
    if ($dtag->getAttribute('id') === 'test2') {
        foreach ($dtag->getElementsByTagName('img') as $itag) {
            // basename() returns the last path segment of the URL
            $value = basename($itag->getAttribute('src'));
            echo $value; // prints "apples"
        }
    }
}
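
As an alternative to the nested loops above, here is a minimal sketch that selects the img src with an XPath query and returns the last path segment so it can be stored in a variable. The function name lastSegmentOfImageInDiv and the inline sample HTML are hypothetical, made up for illustration; it also assumes libxml_use_internal_errors() is an acceptable replacement for the '@' operator, and it runs the URL through parse_url() before basename() so that a query string, if any, is not included in the result.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the last
// path segment of the first <img> src found inside the <div> with the given
// id, or null if nothing matches.
function lastSegmentOfImageInDiv(string $html, string $divId): ?string
{
    $dom = new DOMDocument;

    // Collect libxml parse warnings internally instead of silencing with '@'.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // $divId is interpolated directly into the query; this is only safe for
    // a fixed, trusted id such as 'test2'.
    $nodes = $xpath->query("//div[@id='$divId']//img/@src");
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // Keep only the path part of the URL, then take its last segment.
    $path = parse_url($nodes->item(0)->nodeValue, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return null;
    }
    $name = basename($path);

    return $name === '' ? null : $name;
}

// Sample markup mirroring comp.html from the question.
$html = <<<HTML
<html>
<div id="test1">
<img src="http://example.com/monkey"/>
</div>
<div id="test2">
<img src="http://example.com/apples"/>
</div>
</html>
HTML;

$value = lastSegmentOfImageInDiv($html, 'test2'); // the value is now in a variable
```

With the remote page from the question, you would pass the result of file_get_contents() as $html. Check first that it is not false, since file_get_contents() returns false on failure.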
Licensed under: CC-BY-SA with attribution