Question

I'm trying to get a simple string with the description of the image I searched with search-by-image. So I set up my search_by_google.php page:

    <?php
    $url = $_REQUEST['url'];

    if (empty($_REQUEST['raw'])) {
        $raw = false;
    } else {
        $raw = true;
    }
    echo fetch_google($url, $raw);

    function fetch_google($u, $raw, $terms = "sample search", $numpages = 1, $user_agent = 'Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0')
    {
        $ch = curl_init();
        $url = 'http://www.google.com/imghp?hl=en&tab=wi';
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
        curl_setopt($ch, CURLOPT_HEADER, TRUE);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
        curl_setopt($ch, CURLOPT_VERBOSE, true);
        curl_setopt($ch, CURLOPT_REFERER, 'http://www.google.com/');
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 120);
        curl_setopt($ch, CURLOPT_TIMEOUT, 120);
        curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
        curl_setopt($ch, CURLOPT_COOKIEFILE, "./cookie.txt");
        curl_setopt($ch, CURLOPT_COOKIEJAR, "./cookie.txt");
        curl_setopt($ch, CURLOPT_VERBOSE, true);
        curl_exec($ch);

        $searched = "";
        for ($i = 0; $i <= $numpages; $i++) {
            $ch = curl_init();
            $url = "http://www.google.com/searchbyimage?hl=en&image_url=" . urlencode($u);
            curl_setopt($ch, CURLOPT_URL, $url);
            curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
            curl_setopt($ch, CURLOPT_HEADER, TRUE);
            curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
            curl_setopt($ch, CURLOPT_VERBOSE, true);
            curl_setopt($ch, CURLOPT_REFERER, 'http://www.google.com/imghp?hl=en&tab=wi');
            curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 120);
            curl_setopt($ch, CURLOPT_TIMEOUT, 120);
            curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
            curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
            curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
            $searched = $searched . curl_exec($ch);
            curl_close($ch);
        }
        if ($raw) {
            return $searched;
        } else {
            $matches = array();
            preg_match('/Best guess for this image:[^<]+<a[^>]+>([^<]+)/', $searched, $matches);
            return (count($matches) > 1 ? $matches[1] : false);
        }
    }
    ?>

I've tried changing all the curl options, but when I go to http://www.mysite.altervista.org/search_by_google.php?url=http://www.mysite.org/asdasd.jpg&raw=false it keeps responding with 302 Moved.


I then changed my code by adding

    curl_setopt ($ch, CURLOPT_HEADER, TRUE);

to the second curl_init() block, and now the response is different (the original screenshot is not available).

EDIT 25/03/2014 19:34

I changed my code as Sabuj Hassan suggested, and the log is now:

HTTP/1.0 302 Found Cache-Control: public, max-age=21600 Date: Tue, 25 Mar 2014 18:30:07 GMT Age: 16 Location: http://www.google.com/search?tbs=sbi:AMhZZisAo2ZcfY19aFUJcEj26M4zKc9ZuxzfsUPzLuUJk-pd-siPwiplqIcGN5tW1XPU16-XFg1EoK7jc5IU3BKoEHYnwZo7RmuhyF5p9qaZwSgq4FKRkNW44JgzTi4Mr8g6ezNMQ6YzaAEQ-uFbPMNzY40NrE3uB7ePm4BGNowF34PiIjLOiVLkWwQ7sRoBVMoVgzBbAP7rDwHee5LyGF8Dq6QOT1TEhsURduPD6exzITyRl77agELdpTFSi-JXDncI6c4KdcuQYSx2LknnIW6nippmpPf3X5OYGn1CFZw13rlFPitLSY0Ang0COuSXKdpBy6B8Dak9QZNZ9VFB4HBRfnMFiyuBvQtyhAg2LeOnRbjnunGB0P1RlwKBF4hRId7wUdTu4Dfab5DQu9hGauLKcd7GcP4g-jQXx_1gymwDdZnPXLzZp1mkjVMX9GFSppj-IRWp3FVVqChsPEzKXdraevuWJFukjUdF87dU_1kLKO23lC8L3kusy05zcq7ZxyF1dHNfQ0vYJeWumtbRosJNuEcqiSyVW_1-bF104HMJLdCA0gr5VyIZolkcZok4W1sgjFYTWvfj6f0proaGE24HSO4Ov2hmhAy9HQUCr3e-KjgqyP4AOtlmI3VsuLu34jKSo0t4tWbb5PVBi1_1oebuv4oisdVdw22a6CRH2tiw8wg6Ya1VgxsXhyj8U7lrQ8cBHVDKlOI6EimXtnELBHyDNQT1Zpsz1hK10GYvFaRNMFd7Rqmg87CLdycgyRV-sYxNWxIu9agNgHTwuU1W-GgeWWcM9noeMwgqMKSGh9lt_1hda3ZWrcA4Y1MeiG55b4ZYvOjcm9t9iIy6LA2S4AjC2X1qZHvJtSqzgfOz8yTuX5jUHqCl0jI1FdOSmqZV1GqQ0uaJfsuchlsWUULfUJBzFiGkAuOqIzU0bpXLNqLHoYPJUPwr66H6jWPFLsWAS9_1GRNj70s30jfbzcS0NUShUvE2meUhlpx-f5M0nmS0zvf-3OQOUkXlYO2VUZ4x9y8G76hHoTkDxqzhhGrgohyFmkUvAWmSkHTBpbP6gek8cyrmBnXuedSV3r2O71G8CUbdHFxfIO8FWlkGj1cUYu60PoKF6hndjZsOlV-dSNXfOTKeC1jPtf5ycXA0s0xLK7_1K0iWxhfmVq62WgQ4O3Prc4b6bcJm8M1Q9xZhhsElisuUyVTN9-dDMNUZ1h0tUe9oGsZYLh9vjEsMokqBXFM_1igHOfgRn4I17Xt8EBMZI9cEjakByjv-g5Pt9tG69RQm765HLhf8VpafvE5Z3BwDpZs4x5uMkVDURT9qcA&hl=en Server: quimby_frontend Content-Length: 1566 Content-Type: text/html; charset=UTF-8 Expires: Wed, 26 Mar 2014 00:30:07 GMT Alternate-Protocol: 80:quic X-Content-Type-Options: nosniff X-Frame-Options: SAMEORIGIN X-XSS-Protection: 1; mode=block
302 Moved

The document has moved here. HTTP/1.0 302 Found Cache-Control: public, max-age=21600 Date: Tue, 25 Mar 2014 18:30:07 GMT Age: 16 Location: http://www.google.com/search?tbs=sbi:AMhZZisAo2ZcfY19aFUJcEj26M4zKc9ZuxzfsUPzLuUJk-pd-siPwiplqIcGN5tW1XPU16-XFg1EoK7jc5IU3BKoEHYnwZo7RmuhyF5p9qaZwSgq4FKRkNW44JgzTi4Mr8g6ezNMQ6YzaAEQ-uFbPMNzY40NrE3uB7ePm4BGNowF34PiIjLOiVLkWwQ7sRoBVMoVgzBbAP7rDwHee5LyGF8Dq6QOT1TEhsURduPD6exzITyRl77agELdpTFSi-JXDncI6c4KdcuQYSx2LknnIW6nippmpPf3X5OYGn1CFZw13rlFPitLSY0Ang0COuSXKdpBy6B8Dak9QZNZ9VFB4HBRfnMFiyuBvQtyhAg2LeOnRbjnunGB0P1RlwKBF4hRId7wUdTu4Dfab5DQu9hGauLKcd7GcP4g-jQXx_1gymwDdZnPXLzZp1mkjVMX9GFSppj-IRWp3FVVqChsPEzKXdraevuWJFukjUdF87dU_1kLKO23lC8L3kusy05zcq7ZxyF1dHNfQ0vYJeWumtbRosJNuEcqiSyVW_1-bF104HMJLdCA0gr5VyIZolkcZok4W1sgjFYTWvfj6f0proaGE24HSO4Ov2hmhAy9HQUCr3e-KjgqyP4AOtlmI3VsuLu34jKSo0t4tWbb5PVBi1_1oebuv4oisdVdw22a6CRH2tiw8wg6Ya1VgxsXhyj8U7lrQ8cBHVDKlOI6EimXtnELBHyDNQT1Zpsz1hK10GYvFaRNMFd7Rqmg87CLdycgyRV-sYxNWxIu9agNgHTwuU1W-GgeWWcM9noeMwgqMKSGh9lt_1hda3ZWrcA4Y1MeiG55b4ZYvOjcm9t9iIy6LA2S4AjC2X1qZHvJtSqzgfOz8yTuX5jUHqCl0jI1FdOSmqZV1GqQ0uaJfsuchlsWUULfUJBzFiGkAuOqIzU0bpXLNqLHoYPJUPwr66H6jWPFLsWAS9_1GRNj70s30jfbzcS0NUShUvE2meUhlpx-f5M0nmS0zvf-3OQOUkXlYO2VUZ4x9y8G76hHoTkDxqzhhGrgohyFmkUvAWmSkHTBpbP6gek8cyrmBnXuedSV3r2O71G8CUbdHFxfIO8FWlkGj1cUYu60PoKF6hndjZsOlV-dSNXfOTKeC1jPtf5ycXA0s0xLK7_1K0iWxhfmVq62WgQ4O3Prc4b6bcJm8M1Q9xZhhsElisuUyVTN9-dDMNUZ1h0tUe9oGsZYLh9vjEsMokqBXFM_1igHOfgRn4I17Xt8EBMZI9cEjakByjv-g5Pt9tG69RQm765HLhf8VpafvE5Z3BwDpZs4x5uMkVDURT9qcA&hl=en Server: quimby_frontend Content-Length: 1566 Content-Type: text/html; charset=UTF-8 Expires: Wed, 26 Mar 2014 00:30:07 GMT Alternate-Protocol: 80:quic X-Content-Type-Options: nosniff X-Frame-Options: SAMEORIGIN X-XSS-Protection: 1; mode=block
302 Moved

The document has moved here.

Solution

Automatic redirect following may be blocked for curl on your server, so I recommend handling the redirect manually, like this:

First, the curl function. You can add other curl options if you like:

    function curl($url, $user_agent, $retry = 0) {
        // Stop after 5 manual redirects to avoid an endless redirect loop.
        if ($retry > 5) {
            print "Maximum 5 retries are done, skipping!\n";
            return "in loop!";
        }

        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
        curl_setopt($ch, CURLOPT_HEADER, TRUE);          // include response headers in the result
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);  // return the response instead of printing it
        curl_setopt($ch, CURLOPT_REFERER, 'http://www.google.com/');
        curl_setopt($ch, CURLOPT_COOKIEFILE, "./cookie.txt");
        curl_setopt($ch, CURLOPT_COOKIEJAR, "./cookie.txt");
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // skips certificate verification
        $result = curl_exec($ch);
        curl_close($ch);

        // handling the follow redirect (header names are case-insensitive)
        if (preg_match('|^Location:\s*(https?://\S+)|mi', $result, $m)) {
            print "Manually doing follow redirect!\n$m[1]\n";
            return curl($m[1], $user_agent, $retry + 1);
        }

        // add another condition here if the location is like Location: /home/products/index.php

        return $result;
    }

And here is how it should be called:

    $response = curl("http://www.google.com/", "Mozilla 5.0");
    print "$response\n";

I parse the redirect target from the Location: header. The link may not start with http:// (for example, when it is a relative path); in that case, add another condition at the marked spot.
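For the case where the Location: value is not a full URL, one possible extra condition is sketched below. The helper names `extract_location()` and `resolve_location()` are hypothetical and not part of the answer's code. The resolution is deliberately simplified: it covers absolute URLs, protocol-relative URLs, and absolute paths, and it resolves other relative paths against the directory of the base URL without normalizing `..` segments.

```php
<?php
// Hypothetical helper: read the Location header from the header section only,
// so a "Location:" string inside the HTML body is never matched.
function extract_location($rawResponse) {
    // Headers end at the first blank line (CRLF CRLF).
    $parts = explode("\r\n\r\n", $rawResponse, 2);
    $headers = $parts[0];
    // Header names are case-insensitive; "m" lets ^ match at each line start.
    if (preg_match('/^Location:[ \t]*(\S+)/mi', $headers, $m)) {
        return $m[1];
    }
    return null;
}

// Hypothetical helper: turn a Location value into an absolute URL.
function resolve_location($baseUrl, $location) {
    // Already absolute: use it as-is.
    if (preg_match('#^https?://#i', $location)) {
        return $location;
    }
    $base   = parse_url($baseUrl);
    $scheme = isset($base['scheme']) ? $base['scheme'] : 'http';
    $host   = isset($base['host']) ? $base['host'] : '';
    $origin = $scheme . '://' . $host . (isset($base['port']) ? ':' . $base['port'] : '');
    // Protocol-relative: //example.com/path
    if (substr($location, 0, 2) === '//') {
        return $scheme . ':' . $location;
    }
    // Absolute path: /home/products/index.php
    if (substr($location, 0, 1) === '/') {
        return $origin . $location;
    }
    // Relative path: resolve against the directory of the base path.
    $path = isset($base['path']) ? $base['path'] : '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $location;
}
```

Inside curl(), the extra condition could then call `extract_location($result)` and, if it returns a value, recurse with `resolve_location($url, $location)` instead of `$m[1]`.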

Other tips

The problem is probably with CURLOPT_FOLLOWLOCATION, which is unavailable if safe_mode or open_basedir is enabled in php.ini.

Try the workaround in this comment: http://au.php.net/manual/ro/function.curl-setopt.php#71313

Just replace curl_exec() with the curl_redir_exec() function provided in that comment.
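Before switching to a manual redirect handler, it may help to confirm that these settings are actually enabled. A minimal sketch follows, assuming a hypothetical helper name `follow_location_blocked()`. `ini_get()` is a standard PHP function; for a setting that does not exist in the running PHP version it returns `false`, which the helper treats as "not enabled". The helper only reports the settings. Whether curl really refuses the option depends on the PHP version in use.

```php
<?php
// Hypothetical helper: decide from two ini values whether
// CURLOPT_FOLLOWLOCATION is likely to be refused.
function follow_location_blocked($openBasedir, $safeMode) {
    // open_basedir counts as enabled when it is a non-empty string.
    $basedirOn = is_string($openBasedir) && $openBasedir !== '';
    // A boolean ini value that is on is reported as "1" or as the literal "On".
    $safeOn = in_array(strtolower((string) $safeMode), array('1', 'on'), true);
    return $basedirOn || $safeOn;
}

// Read the live settings and report the result.
$blocked = follow_location_blocked(ini_get('open_basedir'), ini_get('safe_mode'));
echo $blocked
    ? "FOLLOWLOCATION is probably unavailable; follow redirects manually.\n"
    : "Neither setting is enabled; look for another cause.\n";
```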

Check whether you have hit the maximum number of requests per IP; you may be getting redirected to a CAPTCHA page. AFAIK, using a robot to query Google is against Google's rules.
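One way to check for this is to look at the status code and redirect target of the raw response. The sketch below is only a heuristic. The helper name `looks_rate_limited()` is hypothetical, and the assumption that a block shows up as HTTP 429 or 503, or as a redirect to a path containing `/sorry/`, is based on commonly observed behaviour and not on anything stated in this thread.

```php
<?php
// Hypothetical heuristic: does this raw response (headers + body, as returned
// with CURLOPT_HEADER enabled) look like a rate-limit or CAPTCHA response?
function looks_rate_limited($rawResponse) {
    // Only inspect the header section, which ends at the first blank line.
    $parts = explode("\r\n\r\n", $rawResponse, 2);
    $headers = $parts[0];

    // Status line, e.g. "HTTP/1.0 302 Found".
    $status = 0;
    if (preg_match('#^HTTP/\S+\s+(\d{3})#', $headers, $m)) {
        $status = (int) $m[1];
    }
    // Assumption: 429 (Too Many Requests) or 503 signals throttling.
    if ($status === 429 || $status === 503) {
        return true;
    }
    // Assumption: CAPTCHA redirects point at a /sorry/ path.
    if (preg_match('#^Location:[ \t]*\S*/sorry/#mi', $headers)) {
        return true;
    }
    return false;
}
```

Applied to the log in the question, where the Location: header points at /search?tbs=sbi:..., this heuristic would return false, which suggests an ordinary redirect that simply was not followed.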

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow