Question

I'm working on a PHP script to parse an HTML web page. I'm using file_get_contents to fetch the URL and get the page contents.

Here's the code:

$links = $row['links'];
$result = file_get_contents($links);
$html_content = str_replace("<a id='rowTitle1' class", "<a id='rowTitle1' class",$result);
print $html_content;

Here's the HTML output:

<li class="zc-ssl-pg" id="row1-1" style="">
<span id="row1Time" class="zc-ssl-pg-time">6:00 PM</span>
<a id="rowTitle1" class="zc-ssl-pg-title" href='http://www.mysite.com'>The Middle</a>
<a class="zc-ssl-pg-ep" href='http://www.mysite.com'>"Thanksgiving IV"</a>

Can you please tell me how I can get the values of the row1Time, rowTitle1 and zc-ssl-pg-ep elements inside the row1-1 list item from the content I fetched with file_get_contents?


Solution

Regular expressions are not the right tool for parsing HTML. A DOM parser, such as PHP's DOMDocument, is the right tool for that job:

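libxml_use_internal_errors(true); // don't print warnings about imperfect real-world markup
$dom = new DOMDocument();
$dom->loadHTML($result);
echo $dom->getElementById('row1Time')->nodeValue . "<br>";
echo $dom->getElementById('rowTitle1')->nodeValue . "<br>";
echo $dom->getElementsByTagName('a')->item(1)->nodeValue; // second <a> in the whole document; the episode link only in the sample shown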


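This is still somewhat fragile: the episode element has no id, so it is fetched by position (the second <a> element in the document), and that breaks as soon as the page's structure changes. If the markup stays the same, it will work.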
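One way to make the lookup less positional is to scope the queries to the `<li id="row1-1">` element and select each value by its class name with DOMXPath. The sketch below hard-codes the question's sample markup (with a closing `</li>` added, which is an assumption) so it runs without a network request; in the real script that string would be the `$result` returned by file_get_contents. `textByClass` is a hypothetical helper name, not part of PHP's DOM API.

```php
<?php
// Assumption: the question's sample markup stands in for the fetched page.
$result = <<<HTML
<li class="zc-ssl-pg" id="row1-1" style="">
<span id="row1Time" class="zc-ssl-pg-time">6:00 PM</span>
<a id="rowTitle1" class="zc-ssl-pg-title" href='http://www.mysite.com'>The Middle</a>
<a class="zc-ssl-pg-ep" href='http://www.mysite.com'>"Thanksgiving IV"</a>
</li>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($result);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Scope every later query to the <li id="row1-1"> element.
$li = $xpath->query('//li[@id="row1-1"]')->item(0);
if ($li === null) {
    exit("row1-1 not found\n");
}

// Hypothetical helper: trimmed text of the first element below $context
// whose space-separated class list contains $class (XPath 1.0 idiom).
function textByClass(DOMXPath $xpath, DOMNode $context, string $class): ?string
{
    $expr = './/*[contains(concat(" ", normalize-space(@class), " "), " ' . $class . ' ")]';
    $node = $xpath->query($expr, $context)->item(0);
    return $node === null ? null : trim($node->textContent);
}

$time    = textByClass($xpath, $li, 'zc-ssl-pg-time');
$title   = textByClass($xpath, $li, 'zc-ssl-pg-title');
$episode = textByClass($xpath, $li, 'zc-ssl-pg-ep');

echo $time, "\n", $title, "\n", $episode, "\n";
```

Matching on the class list rather than on position is what makes this sturdier than `item(1)`, although it will still break if the site renames its classes.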

OTHER TIPS

$links = $row['links'];
$result = file_get_contents($links);
// $html_content = str_replace("<a id='rowTitle1' class", "<a id='rowTitle1' class",$result); // that's useless: it replaces the string with itself!

// Match against $result directly, since $html_content is no longer set.
if (preg_match('/<span id="row1Time" class="zc-ssl-pg-time">([^<]+)<\/span>/', $result, $matches)) {
    $row1Time = $matches[1];
}

// The href uses single quotes, so they must be escaped inside the single-quoted PHP string.
if (preg_match('/<a id="rowTitle1" class="zc-ssl-pg-title" href=\'http:\/\/www\.mysite\.com\'>([^<]+)<\/a>/', $result, $matches)) {
    $rowTitle1 = $matches[1];
}

print $result;
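Both approaches pass whatever file_get_contents returns straight to a parser. On failure it returns false, which PHP converts to an empty string: preg_match then simply finds nothing, and DOMDocument::loadHTML rejects the empty input. A minimal guard, written as a hypothetical helper `fetch_page()` that is not part of either answer, might look like this:

```php
<?php
// Hypothetical helper: fetch a page and fail loudly instead of
// handing `false` on to DOMDocument or preg_match.
function fetch_page(string $url): string
{
    // file_get_contents() returns false on failure (bad URL, network error,
    // missing file); "@" hides the accompanying warning so we can throw instead.
    $html = @file_get_contents($url);
    if ($html === false) {
        throw new RuntimeException("Could not fetch $url");
    }
    return $html;
}

// Usage sketch with the question's own variable:
// $result = fetch_page($row['links']);
```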
Licensed under: CC-BY-SA with attribution