Question

I know of one method where you can do this:

$url = "http://www.google.com/search?q=test";
$str = file_get_contents($url);
preg_match("/tt\d{7}/", $str, $matches);
print $matches[0];

But this reads the whole file and then scans for the match. Is there any way I can reduce the time taken for the matching process above?


Solution

If you know where inside the webpage you need to look (i.e. only within the first 3000 characters or so), you can use the maxlen parameter of file_get_contents to limit the read:

$str = file_get_contents($url, false, null, -1, 3000);
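Putting that together with the question's regex, a minimal sketch might look like this (the URL and the tt-plus-seven-digits pattern are taken from the question; the 3000-byte limit is an assumption you should tune to your page):

```php
<?php
// Hypothetical example URL from the question.
$url = "http://www.google.com/search?q=test";

// Read at most 3000 bytes: context = null, offset = -1 (no seek), maxlen = 3000.
$str = file_get_contents($url, false, null, -1, 3000);

// Match an IMDb-style title ID: "tt" followed by exactly 7 digits.
if ($str !== false && preg_match('/tt\d{7}/', $str, $matches)) {
    echo $matches[0];
} else {
    echo "not found";
}
```

Note that maxlen only limits how much PHP buffers; depending on the transport, the server may still send more data than you read.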

UPDATE

If you don't know where in the page the match will appear and you want to minimize the amount of data transferred, I worked up a nice solution for you :))

$url = "www.google.com";
$step = 3000;
$found = false;

$addr = gethostbyname($url);

$client = stream_socket_client("tcp://$addr:80", $errno, $errorMessage);

if ($client === false) {
    throw new UnexpectedValueException("Failed to connect: $errorMessage");
}

fwrite($client, "GET /search?q=test HTTP/1.0\r\nHost: $url\r\nAccept: */*\r\n\r\n");

$str = "";
while(!feof($client)){
    $str .= stream_get_contents($client, $step, -1);

    if(preg_match("/tt\d{7}?/", $str, $matches)){
        $found = true;
        break;
    }
}

fclose($client);


if($found){
    echo $matches[0];
} else {
    echo "not found";
}

EXPLANATION:
Set the $step variable to the number of bytes to read on each iteration, and change "search?q=test" to your desired query (IMDb titles, judging by your regex? :) ). It will do the job wonderfully.

You can also echo $str after the while loop to see exactly how much was read before the requested string was found.
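If you'd rather not speak HTTP by hand, the same early-abort idea can be sketched with cURL: CURLOPT_WRITEFUNCTION is called for each chunk received, and returning fewer bytes than it was given aborts the transfer. This is an alternative I'm suggesting, not the original answer's method; the URL and pattern are the question's examples:

```php
<?php
$buffer = '';
$match  = null;

$ch = curl_init('http://www.google.com/search?q=test');
curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, $chunk) use (&$buffer, &$match) {
    $buffer .= $chunk;

    // Abort the download as soon as the pattern appears in the data so far.
    if (preg_match('/tt\d{7}/', $buffer, $m)) {
        $match = $m[0];
        return 0; // returning a short count makes curl_exec stop the transfer
    }
    return strlen($chunk); // keep going
});
curl_exec($ch);
curl_close($ch);

echo $match !== null ? $match : 'not found';
```

curl_exec will report an error (CURLE_WRITE_ERROR) when aborted this way; that's expected here, since the abort is how we stop early.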

I believe this was what you were looking for.
