How to pre-load data from many external XML pages to display their info on a website in near real-time?

StackOverflow https://stackoverflow.com/questions/17539139

Question

I am trying to build a script that will pull data from many (50+) different external XML pages, parse it into a table using PHP, and do it behind the scenes using a cron job, so the collected data can be displayed on my site with no loading delay for users.

The purpose of this script is to assemble a live feed of a Steam Community group's member roster, showing currently online members and the game they are playing. It does this by first checking the group's XML page to get an updated member list, then using that info, it checks each individual member's XML file to get their current online and game status.

I have been successful up to a point. The data gets loaded and displayed correctly, with no loading delay, about 80% of the time. However, the other 20% of the time, users are unable to load the website from the point where the script's output is included onward. The page loads everything up to that point, hangs for a couple of minutes, and then works properly after a refresh. I've been unable to replicate the conditions for the hang-up; it just happens randomly every so often.

I suspect that it is the cron job running the script (at 3 minute intervals) that causes the delay, but that is really outside my area of (already limited) understanding.

Is there a better way to do what I'm looking for? Or any idea what's causing the intermittent hang-ups?

Thanks in advance for any help!

<?php
// Open the feed file that the site includes; 'w' truncates it immediately.
$myFile = "steamfeed.php";
$fh = fopen($myFile, 'w');

// Fetch the group's member list and collect every steamID64.
$xml = simplexml_load_file('http://steamcommunity.com/groups/sundered/memberslistxml/?xml=1');
$members = $xml->xpath('//steamID64');

foreach ($members as $steamID64) {
    // Fetch each member's profile XML to read their online and in-game status.
    $xml2 = simplexml_load_file('http://steamcommunity.com/profiles/'.$steamID64.'/?xml=1');

    if ($xml2->onlineState != 'offline') {
        $steam_game = substr($xml2->inGameInfo->gameName, 0, 25);

        // Build one block of HTML for this member and append it to the feed file.
        $stringData = '<table width="280px" cellspacing="0" cellpadding="0" valign="top" style="vertical-align:text-top;"><tr><td style="background-image:url(\'http://www.thesunderedguard.com/images/statusbg.gif\');" width="288px" height="30px"><table width="100%"><tr><td width="50%" height="30px" style="text-align:left;"><a href="http://steamcommunity.com/profiles/'.$steamID64.'/" target="_blank" style="color:#CDCDCD;">'.$xml2->steamID.'</a></td><td width="50%"><a href="'.$xml2->inGameInfo->gameLink.'" target="_blank">'.$steam_game.'</a></td></tr></table></td></tr></table>';
        fwrite($fh, $stringData);
    }
}
fclose($fh);
?>

Solution

The problem is that while the cron job is running and fetching the information, you are locking the steamfeed.php file, so when someone visits your site at the same time the cron is running, they have "to wait" until the job is done. What I recommend is having a temporary file where you write all the content returned from the XMLs, and then, once it is done, moving that content over to the file that you use in production.
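
In outline, that could look something like the sketch below. It is only a sketch: buildFeedHtml() is a hypothetical stand-in for the existing fetch-and-format loop from the question, and the filenames are examples.

<?php
// Sketch only: build the whole feed first, write it to a temporary file,
// then rename() it over the production file. On the same filesystem the
// rename is atomic, so visitors never include a half-written steamfeed.php.
$target  = 'steamfeed.php';
$tmpFile = $target . '.tmp';

file_put_contents($tmpFile, buildFeedHtml()); // all the slow XML fetching happens here
rename($tmpFile, $target);                    // swap the finished file into place
?>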

Hope this works!

OTHER TIPS

Your problem is that when your cron job starts, it immediately erases the current file. Anybody who visits your site after that has happened will see a blank page until the cron job completes its task.

You need to create the new content file in the background before erasing the old one. This might be as simple as creating a temporary file, writing your content to it, then renaming the files and deleting the old one.

This might still cause a problem while the files are being renamed. You could consider using a symbolic link instead, changing the file it links to on each run of the cron job, and tidying up the old versions from time to time.
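
A rough sketch of that symlink variant is below, again assuming a hypothetical buildFeedHtml() helper for the fetch-and-format loop and that the site includes steamfeed.php as a symbolic link rather than a regular file.

<?php
// Sketch only: each cron run writes a fresh timestamped file, then atomically
// repoints steamfeed.php by renaming a new symlink over the old one, so
// readers never see a half-written feed.
$link    = 'steamfeed.php';
$newFile = 'steamfeed_' . date('YmdHis') . '.html';

file_put_contents($newFile, buildFeedHtml()); // write the new version in full
symlink($newFile, $link . '.new');            // new link under a temporary name
rename($link . '.new', $link);                // atomic swap of the link itself
// Older steamfeed_*.html files can be cleaned up from time to time.
?>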

I'm sure there are other ways...

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow