Question

I know of one method where you can do this:

$url = "http://www.google.com/search?q=test";
$str = file_get_contents($url);
preg_match("/tt\d{7}/", $str, $matches);
print $matches[0];

But this reads the whole file and then scans for the match. Is there any way I can reduce the time taken for this matching process?


Solution

If you know where inside the webpage you need to look (e.g. only within the first 3000 characters or so), you can use the maxlen parameter of file_get_contents to limit the read:

file_get_contents($url, false, null, 0, 3000);
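As a quick offline illustration of what the fifth (maxlen) argument does, here is a minimal sketch using a data:// stream in place of the real HTTP URL (the payload and byte count are made up for the demo):

```php
<?php
// maxlen (5th argument) caps how many bytes file_get_contents reads.
// A data:// stream stands in for the HTTP URL so this runs offline.
$stream = "data://text/plain,tt1234567 plus lots of trailing text we never read";
$head = file_get_contents($stream, false, null, 0, 9); // first 9 bytes only
echo $head; // prints "tt1234567"
```

The same call shape works against an http:// URL, as long as allow_url_fopen is enabled.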

UPDATE

If you don't know where to look in the webpage and you want to minimize the amount of data fetched over HTTP, I worked up a solution for you :)

$url = "www.google.com";
$step = 3000;
$found = false;

$addr = gethostbyname($url);

$client = stream_socket_client("tcp://$addr:80", $errno, $errorMessage);

if ($client === false) {
    throw new UnexpectedValueException("Failed to connect: $errorMessage");
}

fwrite($client, "GET /search?q=test HTTP/1.0\r\nHost: $url\r\nAccept: */*\r\n\r\n");

$str = "";
while (!feof($client)) {
    // Read the response in $step-sized chunks, accumulating into $str.
    $str .= stream_get_contents($client, $step);

    // Stop as soon as the pattern shows up in what we have so far.
    if (preg_match("/tt\d{7}/", $str, $matches)) {
        $found = true;
        break;
    }
}

fclose($client);


if($found){
    echo $matches[0];
} else {
    echo "not found";
}

EXPLANATION:
Set the $step variable to the number of bytes to read on each iteration, and change "search?q=test" to your desired query (IMDB titles, judging by your regex? :) ). It will do the job wonderfully.

You can also echo $str after the while loop to see exactly how much was read before the requested string was found.
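The chunk-and-accumulate idea above can be sketched offline, without a socket (the sample data and step size below are invented for the demo; the match is deliberately placed across a chunk boundary to show that accumulating into one buffer keeps it findable):

```php
<?php
// Offline sketch of incremental matching: consume data in $step-sized
// chunks and stop as soon as the pattern appears in the accumulated buffer.
$data = str_repeat("x", 2995) . "tt1234567" . str_repeat("x", 5000);
$step = 3000;
$buf = "";
$found = false;

for ($off = 0; $off < strlen($data); $off += $step) {
    $buf .= substr($data, $off, $step);

    // The match straddles the 3000-byte boundary, so it is only
    // visible once the second chunk has been appended.
    if (preg_match("/tt\d{7}/", $buf, $m)) {
        $found = true;
        break;
    }
}

echo $found ? $m[0] : "not found"; // prints "tt1234567"
```

Only two of the roughly eight chunks get consumed before the loop exits, which is the same saving the socket version achieves against a live server.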

I believe this was what you were looking for.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow