Question

I'm trying to build a FaH stats scraper. Every hour the newly updated stats list is pulled to my server via cron and wget into this file: http://chrislabs.info/statsFile.txt. The script reads it into $page using file_get_contents.

Then, for a list of unique team numbers (the fourth column), I'm trying to regex all the rows containing that team number using the code below:

foreach($teamArr as $team){
    $pattern = "/(.*[ascii])\t([0-9]*)\t.*[0-9]\t$team/";
    preg_match_all($pattern, $page, $matches);
    echo "<pre>";
    print_r($matches);
    echo "</pre>";
}

However, this isn't finding all the matches in $page and I'm at a loss now as to what to fix. I've changed the pcre.* INI settings to go up to 1GB.

You can look at the output here http://chrislabs.info/FoldingStats_MYSQL.php

Solution

Try to use this:

$pattern = '~^(?:\S++\t){3}' . $team . '$~m';
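As a minimal sketch of how this pattern might slot into the loop from the question: the sample rows below are invented, the column order (name, score, work units, team) is an assumption based on the question, and preg_quote() is an extra precaution that the original one-liner does not include.

```php
<?php
// Invented sample data: name, score, work units, team (tab-separated).
$page = "alice\t100\t5\t42\n"
      . "bob\t200\t7\t7\n"
      . "carol\t300\t9\t42\n";
$teamArr = ['42', '7'];

$rowsByTeam = [];
foreach ($teamArr as $team) {
    // With the m modifier, ^ and $ anchor to each line, so the team number
    // must be the whole fourth column (team 7 will not match team 70).
    // preg_quote() guards against regex metacharacters in $team.
    $pattern = '~^(?:\S++\t){3}' . preg_quote($team, '~') . '$~m';
    preg_match_all($pattern, $page, $matches);
    $rowsByTeam[$team] = $matches[0]; // full matching lines
}

print_r($rowsByTeam);
```

Note that \S++ assumes no field contains whitespace. If user names can contain spaces, [^\t\n]++ may be a safer choice for the columns, and if the file has Windows line endings the trailing \r would likely stop $ from matching.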

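Your pattern didn't find all matches because [ascii] is a character class that matches exactly one character from the set a, s, c, i. A row can therefore match only if one of those letters sits right before the tab (in practice, at the end of the name column). If you want to match any lowercase letter, use [a-z] (or [a-zA-Z] for both cases); the POSIX class for any ASCII character is written [[:ascii:]].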
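The difference between these character classes can be checked directly. This is only an illustrative sketch; the test characters are arbitrary.

```php
<?php
$results = [
    // [ascii] is just the literal letters a, s, c, i (i listed twice).
    'ascii_set_s'   => preg_match('/^[ascii]$/', 's'),
    'ascii_set_z'   => preg_match('/^[ascii]$/', 'z'),
    // [[:ascii:]] is the POSIX class for any character with code 0-127.
    'posix_ascii_z' => preg_match('/^[[:ascii:]]$/', 'z'),
    // [a-z] covers lowercase only; add A-Z (or the i modifier) for uppercase.
    'lower_Z'       => preg_match('/^[a-z]$/', 'Z'),
    'alpha_Z'       => preg_match('/^[a-zA-Z]$/', 'Z'),
];

var_dump($results);
```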

Another way is to read the file with fgetcsv (using a tab as the delimiter) and discard all records that are not from the team you are looking for.
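A rough sketch of the fgetcsv approach follows. The data is invented and written to a temporary stream only so the example runs on its own; in the real script the handle would presumably come from opening the downloaded stats file.

```php
<?php
// Invented sample data in a temporary stream (stand-in for the stats file).
$handle = fopen('php://temp', 'r+');
fwrite($handle, "alice\t100\t5\t42\nbob\t200\t7\t7\ncarol\t300\t9\t42\n");
rewind($handle);

$team = '42';
$teamRows = [];

// Length 0 means no line-length limit; "\t" is the field delimiter.
while (($row = fgetcsv($handle, 0, "\t", '"', '\\')) !== false) {
    // Blank or short lines have no fourth column, so isset() skips them.
    if (isset($row[3]) && $row[3] === $team) {
        $teamRows[] = $row;
    }
}
fclose($handle);

print_r($teamRows);
```

One caveat: fgetcsv treats the double quote as an enclosure character, so names containing quotes may be parsed differently from a plain split on tabs.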

You can also use explode() twice, first on \n to split the page into lines and then on \t to split each line into fields, and check $item[3] for your team.
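The double explode() approach could look roughly like this. The sample data is invented, and the rtrim() call is an added assumption to cope with possible Windows line endings.

```php
<?php
// Invented sample data: name, score, work units, team (tab-separated).
$page = "alice\t100\t5\t42\nbob\t200\t7\t7\ncarol\t300\t9\t42\n";
$team = '42';
$teamRows = [];

foreach (explode("\n", $page) as $line) {
    // Strip a trailing carriage return, then split the line into fields.
    $item = explode("\t", rtrim($line, "\r"));
    // Empty or malformed lines have no fourth field and are skipped.
    if (isset($item[3]) && $item[3] === $team) {
        $teamRows[] = $item;
    }
}

print_r($teamRows);
```

If rows are needed for every team, grouping by $item[3] in a single pass would avoid rescanning the whole page once per team.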

Licensed under: CC-BY-SA with attribution