Question

I have written a page that scans a site and extracts certain code from the source. That part works; however, I want to run it over multiple pages and dump the details into a database. I am struggling to get the loop working. This is what I currently have:

date_default_timezone_set("Australia/Sydney");

$host = 'http://www.tabonline.com.au/';
$day = date('d');
$month = date('m');
$year = date('Y');
$slash = '/';
$mtgraces = '/mtgraces.html';

//Gallops Meetings on Todays racing page
$content = file_get_contents($host . $year . "/". $month . "/" . $day . $mtgraces);
preg_match_all('#<a[^<>]+href\s*=\s*[\'"](.R[0-9]+.html*)[\'"]#i', $content, $matches);
foreach ($matches[1] as $url) $links[] =  "$host$year$slash$month$slash$day$slash$url";

//get the runners from each page

for($c=0; $c<count($links); $c++)

$racepage = file_get_contents($links[$i]);
preg_match_all('#<td align="right" height="18"><font color="\#ffffff">[0-9]{1,2}</font></td>#', $racepage, $number);
preg_match_all('#<font color="\#00ffff">[0-9]{1,3}</font>#', $racepage, $rating);
preg_match_all('#<B>[\w]+([\s][A-Z]+)?</B>#', $racepage, $location);
preg_match_all('#<B>[\w]+\s[0-9]+</B>#', $racepage, $locationcode);

//strip tags for storage in DB

$number_data = implode(",", $number[0]);
$dbnumber = strip_tags($number_data);
$final_number = explode(",", $dbnumber);

$rating_data = implode(",", $rating[0]);
$dbrating = strip_tags($rating_data);
$final_rating = explode(",", $dbrating);

$location_data = implode(",", $location[0]);
$dblocation = strip_tags($location_data);
$final_location = explode(",", $dblocation);

$locationcode_data = implode(",", $locationcode[0]);
$dblocationcode = strip_tags($locationcode_data);
$final_locationcode = explode(",", $dblocationcode);

//Insert into database

 $data = array(); 
for($i=0; $i<count($final_number); $i++)
{
    $data[] = "('" . $final_location[0] . "', '" . $final_locationcode[0] . "', '" . $final_number[$i] . "', '" . $final_rating[$i] . "')";
}

if(count($data) == 0)
{
    # Nothing passed
    # exit
}


$query = "insert into ratings(location, location_code, tab_no, rating) values " . implode(", ", $data); 


$hostname = "%hostname%";   // eg. mysql.yourdomain.com (unique)
$username = "%username%";   // the username specified when setting-up the database
$password = "%password%";   // the password specified when setting-up the database
$database = "%database%";   // the database name chosen when setting-up the database (unique)
mysql_connect($hostname,$username,$password);
mysql_select_db($database) or die("Unable to select database");

mysql_query($query) OR die(mysql_error());

At the moment this outputs the correct contents of only the last page in the list of sites (the $links array). Ultimately I want it to loop through every URL in $links and import that data into a database, using the $query variable, so I can do further analysis on it.

I hope this makes sense and that you can see the error of my ways.
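A minimal sketch of the URL-building and link-extraction step from the question, wrapped in functions so it can be checked without hitting the network. It assumes the meetings page links to race pages with relative hrefs such as SR1.html (a hypothetical file name, not confirmed by the question), and uses date('Y/m/d') in place of the separate $year, $month, $day and $slash variables. The regex is the question's own; note that its unescaped dots match any single character.

```php
<?php
// Sketch only: the race-page href format (e.g. "SR1.html") is a hypothetical example.
date_default_timezone_set('Australia/Sydney');

// URL of the meetings page for a given day, e.g. .../2011/05/07/mtgraces.html
function meetingUrl(string $host, int $timestamp): string
{
    return $host . date('Y/m/d', $timestamp) . '/mtgraces.html';
}

// Absolute URLs of the race pages linked from a meetings page's HTML.
function raceLinks(string $host, int $timestamp, string $html): array
{
    // Same pattern as the question; each unescaped "." matches any single character
    preg_match_all('#<a[^<>]+href\s*=\s*[\'"](.R[0-9]+.html*)[\'"]#i', $html, $matches);

    $base = $host . date('Y/m/d', $timestamp) . '/';
    $links = array();
    foreach ($matches[1] as $url) {
        $links[] = $base . $url;
    }
    return $links;
}

// Usage (needs network access, so not run here):
// $ts = time();
// $links = raceLinks($host, $ts, file_get_contents(meetingUrl($host, $ts)));
```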


Solution 3

I have managed to figure it out!

I needed to put the whole lot in the for loop, so it looks like this:

for($c=0; $c<count($links); $c++)
{
    $racepage = file_get_contents($links[$c]);
    preg_match_all('#<td align="right" height="18"><font color="\#ffffff">[0-9]{1,2}</font></td>#', $racepage, $number);
    preg_match_all('#<font color="\#00ffff">[0-9]{1,3}</font>#', $racepage, $rating);
    preg_match_all('#<B>[\w]+([\s][A-Z]+)?</B>#', $racepage, $location);
    preg_match_all('#<B>[\w]+\s[0-9]+</B>#', $racepage, $locationcode);

    //strip tags for storage in DB

    $number_data = implode(",", $number[0]);
    $dbnumber = strip_tags($number_data);
    $final_number = explode(",", $dbnumber);

    $rating_data = implode(",", $rating[0]);
    $dbrating = strip_tags($rating_data);
    $final_rating = explode(",", $dbrating);

    $location_data = implode(",", $location[0]);
    $dblocation = strip_tags($location_data);
    $final_location = explode(",", $dblocation);

    $locationcode_data = implode(",", $locationcode[0]);
    $dblocationcode = strip_tags($locationcode_data);
    $final_locationcode = explode(",", $dblocationcode);

    //Insert into database

    $data = array();
    for($i=0; $i<count($final_number); $i++)
    {
        $data[] = "('" . $final_location[0] . "', '" . $final_locationcode[0] . "', '" . $final_number[$i] . "', '" . $final_rating[$i] . "')";
    }

    if(count($number[0]) == 0)
    {
        # No runners found on this page: skip the insert
        continue;
    }

    $query = "insert into ratings(location, location_code, tab_no, rating) values " . implode(", ", $data);

    $hostname = "%hostname%";   // eg. mysql.yourdomain.com (unique)
    $username = "%username%";   // the username specified when setting-up the database
    $password = "%password%";   // the password specified when setting-up the database
    $database = "%database%";   // the database name chosen when setting-up the database (unique)
    mysql_connect($hostname,$username,$password);
    mysql_select_db($database) or die("Unable to select database");

    mysql_query($query) OR die(mysql_error());
}
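Why moving everything inside the loop fixed it: in PHP, a for statement without braces repeats only the single statement that follows it, and everything after that runs once, after the loop has finished. This small, self-contained demo uses hypothetical page names in place of fetched HTML:

```php
<?php
// Hypothetical stand-ins for the fetched race pages.
$pages = array('page1', 'page2', 'page3');

// Braceless: only the assignment repeats; the processing line runs once, after the loop.
$racepage = null;
$processed = array();
for ($c = 0; $c < count($pages); $c++)
    $racepage = $pages[$c];      // the only statement the loop repeats
$processed[] = $racepage;        // runs once, with whatever the last iteration assigned

// Braced: every statement inside the block runs once per page.
$processedBraced = array();
for ($c = 0; $c < count($pages); $c++) {
    $racepage = $pages[$c];
    $processedBraced[] = $racepage;
}
```

With the braceless version only the last page reaches the processing step, which is consistent with getting just the last page's data.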

Thank you all for your help; this seems like a great community, and I will keep an eye on it for more fixes.

OTHER TIPS

Hmm... There are a few issues in here...

for($c=0; $c<count($links); $c++)

Because it has no braces, this loop executes only the next statement:

$racepage = file_get_contents($links[$i]);

However, $i isn't defined; I suspect you want $c. You also need to put braces around the parts that belong inside the loop. This is untested, but I think you want something like:

date_default_timezone_set("Australia/Sydney");


$host = 'http://www.tabonline.com.au/';
$day = date('d');
$month = date('m');
$year = date('Y');
$slash = '/';
$mtgraces = '/mtgraces.html';


//Gallops Meetings on Todays racing page
$content = file_get_contents($host . $year . "/". $month . "/" . $day . $mtgraces);
preg_match_all('#<a[^<>]+href\s*=\s*[\'"](.R[0-9]+.html*)[\'"]#i', $content, $matches);
foreach ($matches[1] as $url) $links[] =  "$host$year$slash$month$slash$day$slash$url";


//get the runners from each page
$final_number = array();
$final_rating = array();
$final_location = array();
$final_locationcode = array();

for($c=0; $c<count($links); $c++)
{
  $racepage = file_get_contents($links[$c]);
  preg_match_all('#<td align="right" height="18"><font color="\#ffffff">[0-9]{1,2}</font></td>#', $racepage, $number);
  preg_match_all('#<font color="\#00ffff">[0-9]{1,3}</font>#', $racepage, $rating);
  preg_match_all('#<B>[\w]+([\s][A-Z]+)?</B>#', $racepage, $location);
  preg_match_all('#<B>[\w]+\s[0-9]+</B>#', $racepage, $locationcode);

  //strip tags for storage in DB
  $number_data = implode(",", $number[0]);
  $dbnumber = strip_tags($number_data);
  $final_number[] = explode(",", $dbnumber);

  $rating_data = implode(",", $rating[0]);
  $dbrating = strip_tags($rating_data);
  $final_rating[] = explode(",", $dbrating);

  $location_data = implode(",", $location[0]);
  $dblocation = strip_tags($location_data);
  $final_location[] = explode(",", $dblocation);

  $locationcode_data = implode(",", $locationcode[0]);
  $dblocationcode = strip_tags($locationcode_data);
  $final_locationcode[] = explode(",", $dblocationcode);
}

//Insert into database: one row per runner, across all pages
$data = array();
for($p=0; $p<count($final_number); $p++)
  for($i=0; $i<count($final_number[$p]); $i++)
    $data[] = "('" . $final_location[$p][0] . "', '" . $final_locationcode[$p][0] . "', '" . $final_number[$p][$i] . "', '" . $final_rating[$p][$i] . "')";


if(count($data) != 0)
{
  $query = "insert into ratings(location, location_code, tab_no, rating) values " . implode(", ", $data);
  $hostname = "%hostname%";   // eg. mysql.yourdomain.com (unique)
  $username = "%username%";   // the username specified when setting-up the database
  $password = "%password%";   // the password specified when setting-up the database
  $database = "%database%";   // the database name chosen when setting-up the database (unique)
  mysql_connect($hostname,$username,$password);
  mysql_select_db($database) or die("Unable to select database");
  mysql_query($query) OR die(mysql_error());
}
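Because each $final_* entry here is itself an array (one per page), the insert step has to walk two levels: pages, then runners. Below is a sketch of that shape as a pure function. The associative keys location, code, numbers and ratings are hypothetical names for the four scraped arrays of a single page, and the sketch assumes numbers and ratings line up index-for-index, as the original code does.

```php
<?php
// Sketch: flatten per-page scrape results into one row per runner.
// Each $page is a hypothetical associative array standing in for one entry of
// $final_location, $final_locationcode, $final_number and $final_rating.
function buildRows(array $pages): array
{
    $rows = array();
    foreach ($pages as $page) {
        // Assumes numbers and ratings line up index-for-index; stop at the shorter list
        $count = min(count($page['numbers']), count($page['ratings']));
        for ($i = 0; $i < $count; $i++) {
            $rows[] = array(
                $page['location'],   // one location per page
                $page['code'],       // one location code per page
                $page['numbers'][$i],
                $page['ratings'][$i],
            );
        }
    }
    return $rows;
}
```

Keeping the rows as plain arrays, rather than pre-quoted SQL fragments, leaves the choice of how to send them to the database open.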

$final_number is something you get from a race page link, right? You are using it as $i<count($final_number). Instead, I think you should use $i<count($links) there, as what you want to insert is a row for each link. What you can do is move the:

$data[] = "('" . $final_location[0] . "', '" . $final_locationcode[0] . "', '" . $final_number[$i] . "', '" . $final_rating[$i] . "')";

...line to the bottom of the for($c=0; $c<count($links); $c++) loop, which would make your code look like this from that point on (notice that $data = array() is defined before the loop):

$data = array();
for($c=0; $c<count($links); $c++)
{
  $racepage = file_get_contents($links[$c]);
  preg_match_all('#<td align="right" height="18"><font color="\#ffffff">[0-9]{1,2}</font></td>#', $racepage, $number);
  preg_match_all('#<font color="\#00ffff">[0-9]{1,3}</font>#', $racepage, $rating);
  preg_match_all('#<B>[\w]+([\s][A-Z]+)?</B>#', $racepage, $location);
  preg_match_all('#<B>[\w]+\s[0-9]+</B>#', $racepage, $locationcode);

  //strip tags for storage in DB
  $number_data = implode(",", $number[0]);
  $dbnumber = strip_tags($number_data);
  $final_number = explode(",", $dbnumber);

  $rating_data = implode(",", $rating[0]);
  $dbrating = strip_tags($rating_data);
  $final_rating = explode(",", $dbrating);

  $location_data = implode(",", $location[0]);
  $dblocation = strip_tags($location_data);
  $final_location = explode(",", $dblocation);

  $locationcode_data = implode(",", $locationcode[0]);
  $dblocationcode = strip_tags($locationcode_data);
  $final_locationcode = explode(",", $dblocationcode);

  // one row per link, built from the first value scraped from this page
  $data[] = "('" . $final_location[0] . "', '" . $final_locationcode[0] . "', '" . $final_number[0] . "', '" . $final_rating[0] . "')";
}

if(count($data) != 0)
{
  $query = "insert into ratings(location, location_code, tab_no, rating) values " . implode(", ", $data);
  $hostname = "%hostname%";   // eg. mysql.yourdomain.com (unique)
  $username = "%username%";   // the username specified when setting-up the database
  $password = "%password%";   // the password specified when setting-up the database
  $database = "%database%";   // the database name chosen when setting-up the database (unique)
  mysql_connect($hostname,$username,$password);
  mysql_select_db($database) or die("Unable to select database");
  mysql_query($query) OR die(mysql_error());
}
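The mysql_* functions used throughout this thread were deprecated in PHP 5.5 and removed in PHP 7.0. One possible replacement, assuming PDO is available, is to build a single multi-row INSERT with ? placeholders plus a flat parameter list, so scraped text is never spliced into the SQL string. The table and column names are the question's; the helper name buildInsert is hypothetical.

```php
<?php
// Sketch: one multi-row INSERT with "?" placeholders and a flat parameter list.
// buildInsert is a hypothetical helper; table and column names come from the question.
function buildInsert(array $rows): ?array
{
    if (count($rows) === 0) {
        return null; // nothing scraped: skip the query rather than send "values" with no rows
    }
    $placeholders = array();
    $params = array();
    foreach ($rows as $row) {
        $placeholders[] = '(?, ?, ?, ?)';   // location, location_code, tab_no, rating
        foreach ($row as $value) {
            $params[] = $value;
        }
    }
    $sql = 'INSERT INTO ratings (location, location_code, tab_no, rating) VALUES '
         . implode(', ', $placeholders);
    return array($sql, $params);
}
```

To run it, something along the lines of $pdo = new PDO('mysql:host=%hostname%;dbname=%database%', $username, $password); followed by $stmt = $pdo->prepare($sql); $stmt->execute($params); should work. Returning null for an empty row list takes over the job of the "Nothing passed" check in the question's code.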

I think there are still some problems with this code.

Edit: I also noticed that on this line

$number_data = implode(",", $number[0]);

Wouldn't $number[0] be a string? It couldn't be an array, because $number is an array of matched strings, so $number[0] would be the whole matched string. The same applies to $number_data, $rating_data, $location_data and $locationcode_data, so you can write

$number_data = strip_tags($number[0]);

and then when creating the insert data:

$data[] = "('" . $final_location . "', '" . $final_locationcode . "', '" . $final_number . "', '" . $final_rating . "')";
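For reference, here is a small check on a hypothetical fragment shaped like the race-page markup. With the default PREG_PATTERN_ORDER flag, preg_match_all() stores an array of every full-pattern match in $matches[0] (and one array per capture group in $matches[1] and onward). So $number[0] holds an array of strings, which is why the original implode() call accepts it. Applying strip_tags() to each element with array_map() gives the stripped values directly, without the implode/strip_tags/explode round trip.

```php
<?php
// Hypothetical fragment in the same shape as the race-page markup.
$racepage = '<td align="right" height="18"><font color="#ffffff">1</font></td>'
          . '<td align="right" height="18"><font color="#ffffff">12</font></td>';

preg_match_all('#<td align="right" height="18"><font color="\#ffffff">[0-9]{1,2}</font></td>#', $racepage, $number);

// With PREG_PATTERN_ORDER (the default), $number[0] lists every full-pattern match
$isArray = is_array($number[0]);

// Strip the tags from each match individually
$final_number = array_map('strip_tags', $number[0]);
```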
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow