Grab, cache and parse remote XML feed, validation checks in PHP
20-09-2019
Question
Currently, I'm grabbing a remote site's XML feed and saving a local copy on my server to be parsed in PHP.
The problem is: how do I add checks in PHP to see whether feed.xml is valid and, if so, use it?
And if it is invalid (sometimes the remote XML feed comes back as a blank feed.xml), how do I serve a backup valid copy of feed.xml from the previous grab/save?
Code grabbing feed.xml:
<?php
/**
* Initialize the cURL session
*/
$ch = curl_init();
/**
* Set the URL of the page or file to download.
*/
curl_setopt($ch, CURLOPT_URL,
'http://domain.com/feed.xml');
/**
* Create a new file
*/
$fp = fopen('feed.xml', 'w');
/**
* Ask cURL to write the contents to a file
*/
curl_setopt($ch, CURLOPT_FILE, $fp);
/**
* Execute the cURL session
*/
curl_exec ($ch);
/**
* Close cURL session and file
*/
curl_close ($ch);
fclose($fp);
?>
So far I only have this to load it:
$xml = @simplexml_load_file('feed.xml') or die("feed not loading");
Thanks.
Solution
If it's not essential that cURL writes directly into the file, you can check the XML before overwriting your local feed.xml:
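<?php
/**
 * Initialize the cURL session
 */
$ch = curl_init();
/**
 * Set the URL of the page or file to download,
 * and have curl_exec() return the body as a string.
 */
curl_setopt($ch, CURLOPT_URL, 'http://domain.com/feed.xml');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$xml = curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
/**
 * Only overwrite the local copy if the download succeeded, is not blank,
 * and parses as XML; otherwise the previous feed.xml stays as the backup.
 * Compare with !== false: an empty root element is a falsy SimpleXMLElement.
 */
if ($xml !== false && $status === 200 && trim($xml) !== ''
    && @simplexml_load_string($xml) !== false) {
    file_put_contents('feed.xml', $xml, LOCK_EX);
}
?>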
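One weakness of writing straight into feed.xml is that opening it with 'w' truncates the file first, so a request reading it at that moment can see an empty or partial copy. A minimal sketch, assuming the script may create files in the same directory as feed.xml: the hypothetical save_feed_if_valid() writes to a temporary file first and then rename()s it over the old copy, which on POSIX filesystems replaces the file in one step.

```php
<?php
/**
 * Hypothetical helper: replace the cached feed at $path with $xml, but only
 * if $xml is non-blank and parses as XML. Returns true if the cache was
 * updated, false if the previous copy was left untouched.
 */
function save_feed_if_valid(string $xml, string $path): bool
{
    // Blank and malformed downloads both fail here. Compare with === false:
    // an empty root element parses to a SimpleXMLElement that is falsy.
    if (trim($xml) === '' || @simplexml_load_string($xml) === false) {
        return false;
    }
    // Write to a temporary file in the same directory as $path ...
    $tmp = tempnam(dirname($path), 'feed');
    if ($tmp === false) {
        return false;
    }
    // tempnam() creates the file with mode 0600; make it readable by others.
    chmod($tmp, 0644);
    // ... then rename() it over the old copy in one step.
    if (file_put_contents($tmp, $xml) === false || !rename($tmp, $path)) {
        @unlink($tmp);
        return false;
    }
    return true;
}
```

Called as save_feed_if_valid($xml, 'feed.xml') after the download, a blank or broken response simply returns false and the last good feed.xml keeps being served.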
OTHER TIPS
How about this? No need to use cURL if you just need to retrieve a document: simplexml_load_file() accepts a URL directly, provided allow_url_fopen is enabled.
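// Try the remote feed first; @ suppresses libxml warnings on a bad or blank feed.
$feed = @simplexml_load_file('http://domain.com/feed.xml');
// Compare with !== false: an empty root element is a falsy SimpleXMLElement.
if ($feed !== false)
{
    // $feed is valid, save it as the new backup
    $feed->asXML('feed.xml');
}
elseif (file_exists('feed.xml'))
{
    // $feed is not valid, grab the last backup
    $feed = simplexml_load_file('feed.xml');
}
else
{
    die('No available feed');
}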
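The @ operator hides why a feed was rejected. If you want to log the reason before falling back to the backup, libxml can collect the parse errors for you. A minimal sketch; parse_feed() and its return shape are hypothetical:

```php
<?php
/**
 * Hypothetical helper: parse $xml and report why parsing failed, instead of
 * hiding the reason with @. Returns [SimpleXMLElement|null, string[]].
 */
function parse_feed(string $xml): array
{
    // The remote feed sometimes comes back blank; treat that as an error.
    if (trim($xml) === '') {
        return [null, ['empty document']];
    }
    // Collect libxml errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();
    $feed = simplexml_load_string($xml);
    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
    // Restore the caller's previous setting.
    libxml_use_internal_errors($previous);
    return [$feed === false ? null : $feed, $messages];
}
```

When the first element is null, you could pass the messages to error_log() and then load the saved feed.xml instead.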
In a class I put together, I have a function that checks whether the remote file exists and responds in a timely manner:
/**
 * Check to see if remote feed exists and responding in a timely manner
 */
private function remote_file_exists($url) {
    $ret = false;
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true); // check the connection; return no content
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 1); // timeout after 1 second
    curl_setopt($ch, CURLOPT_TIMEOUT, 2); // The maximum number of seconds to allow cURL functions to execute.
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.0; da; rv:1.9.0.11) Gecko/2009060215 Firefox/3.0.11');
    // do request
    $result = curl_exec($ch);
    // if request is successful
    if ($result === true) {
        $statusCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        if ($statusCode === 200) {
            $ret = true;
        }
    }
    curl_close($ch);
    return $ret;
}
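A HEAD check followed by a separate download costs two round trips, and the feed can still come back blank on the second one. A minimal sketch, assuming you are happy to apply the same 1-second connect and 2-second total timeouts to the download itself; fetch_feed() is a hypothetical standalone function, not part of the class above:

```php
<?php
/**
 * Hypothetical helper: download $url as a single GET with short timeouts.
 * Returns the response body, or false on a network error, a non-200
 * status, or a blank body.
 */
function fetch_feed(string $url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 1);    // give up connecting after 1 second
    curl_setopt($ch, CURLOPT_TIMEOUT, 2);           // give up on the whole transfer after 2 seconds
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    if ($body === false || $status !== 200 || trim($body) === '') {
        return false;
    }
    return $body;
}
```

A caller would still validate the returned body as XML before overwriting feed.xml, and keep serving the existing copy whenever fetch_feed() returns false.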
The full class contains fallback code to make sure we always have something to work with.
A blog post explaining the full class is here: http://weedygarden.net/2012/04/simple-feed-caching-with-php/
The code is here: https://github.com/erunyon/FeedCache
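Since the question is also about caching, one common addition is to skip the download entirely while the local copy is recent. A minimal sketch, assuming a simple time-based policy is acceptable (this is not necessarily how FeedCache does it); cache_is_fresh() and the $maxAge value are hypothetical:

```php
<?php
/**
 * Hypothetical helper: is the cached copy at $path younger than $maxAge
 * seconds? If so, serve it without contacting the remote server at all.
 */
function cache_is_fresh(string $path, int $maxAge): bool
{
    // Clear PHP's stat cache so a file rewritten earlier in this request is re-checked.
    clearstatcache(true, $path);
    if (!is_file($path)) {
        return false;
    }
    $mtime = filemtime($path);
    return $mtime !== false && (time() - $mtime) < $maxAge;
}
```

For example, checking cache_is_fresh('feed.xml', 900) before fetching limits requests to the remote site to roughly one every 15 minutes, and the validation and fallback logic only runs when the cached copy has expired.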