How do you access Simple DOM selectors?

https://stackoverflow.com/questions/9039266

20-04-2021

Question

I can access some of the 'class' items with

$ret = $html->find('articleINfo');

and then print the first key of the returned array.

However, there are other tags I need, such as <span id="firstArticle_0">, and I cannot seem to find them. I tried:

$ret = $html->find('#span=id[ etc ]');

In some cases something is returned, but it is either not an array or an array with empty keys.

Unfortunately I cannot use var_dump to inspect the object, because var_dump produces 1000 pages of unreadable output. The HTML looks like this:

<div id="articlething"> 
    <p class="byline">By Lord Byron and <a href="www.marriedtothesea.com">Alister Crowley</a></p> 
    <p> 
    <span class="location">GEORGIA MOUNTAINS, Canada</span> | 
    <span class="timestamp">Fri Apr 29, 2011 11:27am EDT</span> 
    </p> 
</div> 
<span id="midPart_0"></span>
<span class="mainParagraph"><p><span class="midLocation">TUSCALOOSA, Alabama</span> - Who invented cheese? Everyone wants to know. They held a big meeting. Tom Cruise is a scientologist. </p>
</span>
<span id="midPart_1"></span><p>The president and his family visited Chuck-e-cheese in the morning </p>
<span id="midPart_2"></span><p>In Russia, 900 people were lost in the balls.</p>
<span id="midPart_3">

Solution

Simple HTML DOM can be used easily to find a span with a specific class.

If you want all spans with class="location", then:

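// load the Simple HTML DOM library (adjust the path to your copy)
require_once 'simple_html_dom.php';

// create HTML DOM ($iUrl is the URL of the page to scrape)
$html = file_get_html($iUrl);

// get all <span class="location"> elements (find() returns an array)
$aObj = $html->find('span[class=location]');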

Then do something like:

foreach($aObj as $key=>$oValue)
{
   echo $key.": ".$oValue->plaintext."<br />";
}

It worked for me with your example. My output was:

label=span, class=location: Found 1

0: GEORGIA MOUNTAINS, Canada
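The answer above covers class selectors, but the question also asks about spans identified by id. In Simple HTML DOM the same attribute syntax applies, for example find('span[id=midPart_0]', 0) or find('#midPart_0', 0), where the second argument returns a single element (or null) instead of an array, and echo $e->outertext prints just that element's markup, which is far easier to read than a var_dump of the whole document. The sketch below performs the same lookups with PHP's built-in DOM extension (DOMDocument and DOMXPath) rather than Simple HTML DOM, so it runs without the library. The sample HTML is a trimmed, slightly simplified copy of the question's markup, and the variable names are illustrative.

```php
<?php
// Sketch using PHP's built-in DOM extension, NOT Simple HTML DOM.
// Hypothetical sample: trimmed, simplified copy of the question's markup.
$sample = <<<HTML
<div id="articlething">
  <p class="byline">By Lord Byron</p>
  <p>
  <span class="location">GEORGIA MOUNTAINS, Canada</span> |
  <span class="timestamp">Fri Apr 29, 2011 11:27am EDT</span>
  </p>
</div>
<span id="midPart_0"></span><p><span class="midLocation">TUSCALOOSA, Alabama</span> - Who invented cheese?</p>
HTML;

$dom = new DOMDocument();
// Scraped HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($sample);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Roughly equivalent to Simple HTML DOM's find('span[class=location]'):
// collect the trimmed text of every matching span.
$locations = [];
foreach ($xpath->query('//span[@class="location"]') as $node) {
    $locations[] = trim($node->textContent);
}

// Roughly equivalent to find('span[id=midPart_0]', 0): first match, or null.
$marker = $xpath->query('//span[@id="midPart_0"]')->item(0);

// Instead of var_dump, print only the matched element's markup.
echo $dom->saveHTML($marker), "\n";
```

Querying by attribute in XPath (@id, @class) mirrors the [attribute=value] form used in the answer, so the same pattern extends to any tag and attribute.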

Hope that helps. Please stick with it: Simple HTML DOM is great for what it does and easy to use once you get the hang of it. Keep trying and you will build up a set of examples that you reuse over and over. I've scraped some pretty crazy pages, and they get easier and easier.

OTHER TIPS

Try phpQuery instead. It worked very well for me and is extremely easy to use: http://code.google.com/p/phpquery/

The PHP Simple HTML DOM Parser documentation says little about reading Open Graph meta tags. Here's what seems to work for me:

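<?php
// load the Simple HTML DOM library (adjust the path to your copy)
require_once 'simple_html_dom.php';

// grab the contents of the page ($url is the page to inspect)
$summary = file_get_html($url);

// Collect image candidates (for example)
$img = array();

// If the page has og:image meta tags, it's easy. find() returns an empty
// array when nothing matches, so one call is enough: the loop simply
// does nothing on pages without og:image.
foreach ($summary->find('meta[property=og:image]') as $e) {
  $img[] = $e->attr['content'];
}
?>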
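For comparison, the same og:image lookup can be sketched with PHP's built-in DOMXPath instead of Simple HTML DOM, assuming the page HTML is already in a string. The $page markup and the example.com URLs below are hypothetical. Selecting the content attribute directly in the XPath expression returns attribute nodes, so there is no separate step of reading attributes from each element.

```php
<?php
// Sketch using PHP's built-in DOM extension, NOT Simple HTML DOM.
// Hypothetical page head; in practice this would be the fetched HTML.
$page = <<<HTML
<html><head>
<meta property="og:title" content="Example">
<meta property="og:image" content="https://example.com/a.jpg">
<meta property="og:image" content="https://example.com/b.jpg">
</head><body></body></html>
HTML;

$dom = new DOMDocument();
// Keep parser warnings about imperfect markup out of the output.
libxml_use_internal_errors(true);
$dom->loadHTML($page);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Collect the content attribute of every og:image meta tag.
// The query yields no nodes (and $img stays empty) if the page has none.
$img = [];
foreach ($xpath->query('//meta[@property="og:image"]/@content') as $attr) {
    $img[] = $attr->value;
}
```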

Licensed under: CC-BY-SA with attribution
