If you know where inside the webpage you need to look (e.g. only within the first 3000 characters or so), you can use the maxlen parameter of file_get_contents() to limit the read:

file_get_contents($url, false, null, 0, 3000);
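Note that file_get_contents() returns false on failure, so it is worth checking the result before using it. A minimal runnable sketch of the maxlen call, using the data:// wrapper as a stand-in for a live URL (the wrapper and sample string are just for illustration):

```php
<?php
// With an http:// URL the same call stops after $maxlen bytes of the body.
$maxlen = 5;
$data = file_get_contents('data://text/plain,HelloWorld', false, null, 0, $maxlen);

if ($data === false) {          // false on any read failure
    exit("read failed\n");
}
echo $data; // Hello
```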
UPDATE
If you don't know where in the page the match will appear and you want to minimize the amount of data downloaded, I worked up a nice solution for you :))
$host = "www.google.com";
$step = 3000;                   // bytes to read per iteration
$found = false;

$addr = gethostbyname($host);
$client = stream_socket_client("tcp://$addr:80", $errno, $errorMessage);

if ($client === false) {
    throw new UnexpectedValueException("Failed to connect: $errorMessage");
}

// HTTP/1.0: the server closes the connection when it has sent the response
fwrite($client, "GET /search?q=test HTTP/1.0\r\nHost: $host\r\nAccept: */*\r\n\r\n");

$str = "";
while (!feof($client)) {
    // offset -1 = keep reading from the current position, no seeking
    $str .= stream_get_contents($client, $step, -1);
    if (preg_match("/tt\d{7}/", $str, $matches)) {
        $found = true;
        break;                  // stop downloading as soon as we have a match
    }
}
fclose($client);

if ($found) {
    echo $matches[0];
} else {
    echo "not found";
}
EXPLANATION:
Set the $step variable to the number of bytes to read per iteration, and change "search?q=test" to your desired query (IMDB title IDs, judging by your regex? :) ). The loop breaks as soon as the pattern matches, so only as much of the response as necessary is downloaded. You can also echo $str after the while loop to see exactly how much was read before the requested string was found.
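The chunked-scan loop can be tried without a live socket. This sketch wraps the same logic in a helper (the function name and the in-memory stream are my own, purely for illustration) so you can watch it stop at the first match:

```php
<?php
// Scan a stream in $step-byte chunks, returning the first regex match
// or null if the stream ends without one.
function find_first_match($stream, string $pattern, int $step): ?string
{
    $buf = "";
    while (!feof($stream)) {
        // offset -1 = read from the current position, no seeking
        $buf .= stream_get_contents($stream, $step, -1);
        if (preg_match($pattern, $buf, $m)) {
            return $m[0];       // found: stop reading immediately
        }
    }
    return null;
}

// Demo on an in-memory stream: the ID sits past the first 3000-byte chunk,
// so the match is only found on the second iteration.
$mem = fopen('php://memory', 'r+');
fwrite($mem, str_repeat('x', 5000) . 'tt0111161' . str_repeat('x', 5000));
rewind($mem);
echo find_first_match($mem, '/tt\d{7}/', 3000); // tt0111161
fclose($mem);
```

Because matching is attempted against the whole accumulated buffer, a match that straddles a chunk boundary is still found.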
I believe this is what you were looking for.