Looping and file_get_contents in PHP
-
18-09-2019 - |
Question
I have written a page that will scan a site and then extract certain code from the source. That part is working successfully, however I want to run this over multiple pages and dump the details into a database. I am stuggling to get the loop working, this is what I currently have:
date_default_timezone_set("australia/sydney");
$host = 'http://www.tabonline.com.au/';
$day = date(d);
$month = date(m);
$year = date(Y);
$slash = '/';
$mtgraces = '/mtgraces.html';
//Gallops Meetings on Todays racing page
$content = file_get_contents($host . $year . "/". $month . "/" . $day . $mtgraces);
preg_match_all('#<a[^<>]+href\s*=\s*[\'"](.R[0-9]+.html*)[\'"]#i', $content, $matches);
foreach ($matches[1] as $url) $links[] = "$host$year$slash$month$slash$day$slash$url";
//get the runners from each page
for($c=0; $c<count($links); $c++)
$racepage = file_get_contents($links[$i]);
preg_match_all('#<td align="right" height="18"><font color="\#ffffff">[0-9]{1,2}</font></td>#', $racepage, $number);
preg_match_all('#<font color="\#00ffff">[0-9]{1,3}</font>#', $racepage, $rating);
preg_match_all('#<B>[\w]+([\s][A-Z]+)?</B>#', $racepage, $location);
preg_match_all('#<B>[\w]+\s[0-9]+</B>#', $racepage, $locationcode);
//strip tags for storage in DB
$number_data = implode(",", $number[0]);
$dbnumber = strip_tags($number_data);
$final_number = explode(",", $dbnumber);
$rating_data = implode(",", $rating[0]);
$dbrating = strip_tags($rating_data);
$final_rating = explode(",", $dbrating);
$location_data = implode(",", $location[0]);
$dblocation = strip_tags($location_data);
$final_location = explode(",", $dblocation);
$locationcode_data = implode(",", $locationcode[0]);
$dblocationcode = strip_tags($locationcode_data);
$final_locationcode = explode(",", $dblocationcode);
//Insert into database
$data = array();
for($i=0; $i<count($final_number); $i++)
{
$data[] = "('" . $final_location[0] . "', '" . $final_locationcode[0] . "', '" . $final_number[$i] . "', '" . $final_rating[$i] . "')";
}
if(count($queries) == 0)
{
# Nothing passed
# exit
}
$query = "insert into ratings(location, location_code, tab_no, rating) values " . implode(", ", $data);
$hostname = "%hostname%"; // eg. mysql.yourdomain.com (unique)
$username = "%username%"; // the username specified when setting-up the database
$password = "%password"; // the password specified when setting-up the database
$database = "%database"; // the database name chosen when setting-up the database (unique)
mysql_connect($hostname,$username,$password);
mysql_select_db($database) or die("Unable to select database");
mysql_query($query) OR die(mysql_error())
At the moment the output for this is giving me the correct contents of the last page in the list of sites (the $links
variable). Ultimately I want it to loop through the whole $links
variable and then import that data, using the $query
variable, into a database so I can do further analysis on it.
I hope this makes sense and you can see the error in my ways.
Solution 3
I have managed to figure it out!
I needed to put the whole lot in the for loop, so it looks like this:
for($c=0; $c<count($links); $c++)
{
$racepage = file_get_contents($links[$c]);
preg_match_all('#<td align="right" height="18"><font color="\#ffffff">[0-9]{1,2}</font></td>#', $racepage, $number);
preg_match_all('#<font color="\#00ffff">[0-9]{1,3}</font>#', $racepage, $rating);
preg_match_all('#<B>[\w]+([\s][A-Z]+)?</B>#', $racepage, $location);
preg_match_all('#<B>[\w]+\s[0-9]+</B>#', $racepage, $locationcode);
//strip tags for storage in DB
$number_data = implode(",", $number[0]);
$dbnumber = strip_tags($number_data);
$final_number = explode(",", $dbnumber);
$rating_data = implode(",", $rating[0]);
$dbrating = strip_tags($rating_data);
$final_rating = explode(",", $dbrating);
$location_data = implode(",", $location[0]);
$dblocation = strip_tags($location_data);
$final_location = explode(",", $dblocation);
$locationcode_data = implode(",", $locationcode[0]);
$dblocationcode = strip_tags($locationcode_data);
$final_locationcode = explode(",", $dblocationcode);
//Insert into database
$data = array();
for($i=0; $i<count($final_number); $i++)
{
$data[] = "('" . $final_location[0] . "', '" . $final_locationcode[0] . "', '" . $final_number[$i] . "', '" . $final_rating[$i] . "')";
}
if(count($queries) == 0)
{
# Nothing passed
# exit
}
$query = "insert into ratings(location, location_code, tab_no, rating) values " . implode(", ", $data);
$hostname = "%HOSTNAME"; // eg. mysql.yourdomain.com (unique)
$username = "%username%"; // the username specified when setting-up the database
$password = "%password%"; // the password specified when setting-up the database
$database = "%database%"; // the database name chosen when setting-up the database (unique)
mysql_connect($hostname,$username,$password);
mysql_select_db($database) or die("Unable to select database");
mysql_query($query) OR die(mysql_error());
}
Thank you all for your help, it seems like a great community that is here. I am sure to keep an eye on it for more fixes.
OTHER TIPS
Hmm... There are a few issues in here...
for($c=0; $c<count($links); $c++)
This loop is executing just the next line:
$racepage = file_get_contents($links[$i]);
However, $i isn't defined, I suspect you want $c. Also, you need to place some braces around various parts... Now, this is untested, but I think you want something like:
date_default_timezone_set("australia/sydney");
$host = 'http://www.tabonline.com.au/';
$day = date(d);
$month = date(m);
$year = date(Y);
$slash = '/';
$mtgraces = '/mtgraces.html';
//Gallops Meetings on Todays racing page
$content = file_get_contents($host . $year . "/". $month . "/" . $day . $mtgraces);
preg_match_all('#<a[^<>]+href\s*=\s*[\'"](.R[0-9]+.html*)[\'"]#i', $content, $matches);
foreach ($matches[1] as $url) $links[] = "$host$year$slash$month$slash$day$slash$url";
//get the runners from each page
$final_number = array();
$final_rating = array();
$final_location = array();
$final_locationcode = array();
for($c=0; $c<count($links); $c++)
{
$racepage = file_get_contents($links[$c]);
preg_match_all('#<td align="right" height="18"><font color="\#ffffff">[0-9]{1,2}</font></td>#', $racepage, $number);
preg_match_all('#<font color="\#00ffff">[0-9]{1,3}</font>#', $racepage, $rating);
preg_match_all('#<B>[\w]+([\s][A-Z]+)?</B>#', $racepage, $location);
preg_match_all('#<B>[\w]+\s[0-9]+</B>#', $racepage, $locationcode);
//strip tags for storage in DB
$number_data = implode(",", $number[0]);
$dbnumber = strip_tags($number_data);
$final_number[] = explode(",", $dbnumber);
$rating_data = implode(",", $rating[0]);
$dbrating = strip_tags($rating_data);
$final_rating[] = explode(",", $dbrating);
$location_data = implode(",", $location[0]);
$dblocation = strip_tags($location_data);
$final_location[] = explode(",", $dblocation);
$locationcode_data = implode(",", $locationcode[0]);
$dblocationcode = strip_tags($locationcode_data);
$final_locationcode[] = explode(",", $dblocationcode);
}
//Insert into database
$data = array();
for($i=0; $i<count($final_number); $i++)
$data[] = "('" . $final_location[0] . "', '" . $final_locationcode[0] . "', '" . $final_number[$i] . "', '" . $final_rating[$i] . "')";
if(count($queries) != 0)
{
$query = "insert into ratings(location, location_code, tab_no, rating) values " . implode(", ", $data);
$hostname = "%hostname%"; // eg. mysql.yourdomain.com (unique)
$username = "%username%"; // the username specified when setting-up the database
$password = "%password"; // the password specified when setting-up the database
$database = "%database"; // the database name chosen when setting-up the database (unique)
mysql_connect($hostname,$username,$password);
mysql_select_db($database) or die("Unable to select database");
mysql_query($query) OR die(mysql_error())
}
$final_number is something you get from a racepage link right? You are using it to as $i<count($final_number)
. Instead i think you should use $i<count($links)
there as what you want to insert is a row for each link. What you can do is move the:
$data[] = "('" . $final_location[0] . "', '" . $final_locationcode[0] . "', '" . $final_number[$i] . "', '" . $final_rating[$i] . "')";
...line to the bottom of for($c=0; $c<count($links); $c++)
line which would make you code look like this starting from that point, (notice $data=array() is defined before the loop):
$data = array();
for($c=0; $c<count($links); $c++)
{
$racepage = file_get_contents($links[$c]);
preg_match_all('#<td align="right" height="18"><font color="\#ffffff">[0-9]{1,2}</font></td>#', $racepage, $number);
preg_match_all('#<font color="\#00ffff">[0-9]{1,3}</font>#', $racepage, $rating);
preg_match_all('#<B>[\w]+([\s][A-Z]+)?</B>#', $racepage, $location);
preg_match_all('#<B>[\w]+\s[0-9]+</B>#', $racepage, $locationcode);
//strip tags for storage in DB
$number_data = implode(",", $number[0]);
$dbnumber = strip_tags($number_data);
$final_number[] = explode(",", $dbnumber);
$rating_data = implode(",", $rating[0]);
$dbrating = strip_tags($rating_data);
$final_rating[] = explode(",", $dbrating);
$location_data = implode(",", $location[0]);
$dblocation = strip_tags($location_data);
$final_location[] = explode(",", $dblocation);
$locationcode_data = implode(",", $locationcode[0]);
$dblocationcode = strip_tags($locationcode_data);
$final_locationcode[] = explode(",", $dblocationcode);
$data[] = "('" . $final_location[0] . "', '" . $final_locationcode[0] . "', '" . $final_number[0] . "', '" . $final_rating[0] . "')";
}
if(count($queries) != 0)
{
$query = "insert into ratings(location, location_code, tab_no, rating) values " . implode(", ", $data);
$hostname = "%hostname%"; // eg. mysql.yourdomain.com (unique)
$username = "%username%"; // the username specified when setting-up the database
$password = "%password"; // the password specified when setting-up the database
$database = "%database"; // the database name chosen when setting-up the database (unique)
mysql_connect($hostname,$username,$password);
mysql_select_db($database) or die("Unable to select database");
mysql_query($query) OR die(mysql_error())
}
I think there are some problems with this code still.
Edit:I also noticed that on this line
$number_data = implode(",", $number[0]);
Wouldn't $number[0]
be a string, it couldn't be an array because $number
is an array of matched strings so $number[0]
would be the whole matched string. This would apply to 'number_data', 'rating_data', 'location_data' and 'locationcode_data' so you can
$number_data = strip_tags($number[0]);
and then when creating the insert data:
$data[] = "('" . $final_location . "', '" . $final_locationcode . "', '" . $final_number . "', '" . $final_rating . "')";