Domanda

I'm trying to get a simple string with the description of the image I searched with search-by-image. So I set up my search_by_google.php page:

    <?php
$url = $_REQUEST['url'];

if(empty($_REQUEST['raw'])){
$raw = false;
}
else{
$raw = true;
}
echo fetch_google($url, $raw);

function fetch_google($u, $raw, $terms="sample search",$numpages=1,$user_agent='Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0')
{

    $ch = curl_init();
    $url = 'http://www.google.com/imghp?hl=en&tab=wi';
    curl_setopt ($ch, CURLOPT_URL, $url);
    curl_setopt ($ch, CURLOPT_USERAGENT, $user_agent);
    curl_setopt ($ch, CURLOPT_HEADER, TRUE);
    curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt ($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt ($ch, CURLOPT_VERBOSE,true);
    curl_setopt ($ch, CURLOPT_REFERER, 'http://www.google.com/');
    curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT,120);
    curl_setopt ($ch, CURLOPT_TIMEOUT,120);
    curl_setopt ($ch, CURLOPT_MAXREDIRS,10);
    curl_setopt ($ch, CURLOPT_COOKIEFILE,"./cookie.txt");
    curl_setopt ($ch, CURLOPT_COOKIEJAR,"./cookie.txt");
    curl_setopt ($ch, CURLOPT_VERBOSE,true);
    curl_exec($ch);

$searched="";
for($i=0;$i<=$numpages;$i++)
{
    $ch = curl_init();
    $url="http://www.google.com/searchbyimage?hl=en&image_url=".urlencode($u);
    curl_setopt ($ch, CURLOPT_URL, $url);
    curl_setopt ($ch, CURLOPT_USERAGENT, $user_agent);
    curl_setopt ($ch, CURLOPT_HEADER, TRUE);
    curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt ($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt ($ch, CURLOPT_VERBOSE,true);
    curl_setopt ($ch, CURLOPT_REFERER, 'http://www.google.com/imghp?hl=en&tab=wi');
    curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT,120);
    curl_setopt ($ch, CURLOPT_TIMEOUT,120);
    curl_setopt ($ch, CURLOPT_MAXREDIRS,10);
    curl_setopt ($ch, CURLOPT_COOKIEFILE,"cookie.txt");
    curl_setopt ($ch, CURLOPT_COOKIEJAR,"cookie.txt");
    $searched=$searched.curl_exec ($ch);
    curl_close ($ch);
}
if($raw){
        return $searched;
    }
    else{
        $matches = array();
        preg_match('/Best guess for this image:[^<]+<a[^>]+>([^<]+)/', $searched, $matches);
        return (count($matches) > 1 ? $matches[1] : false);
    }
 }
  ?>

I've changed all the curl options but if I go to http://www.mysite.altervista.org/search_by_google.php?url=http://www.mysite.org/asdasd.jpg&raw=false

It keep me saying 302 Moved

enter image description here

I have changed my code putting

    curl_setopt ($ch, CURLOPT_HEADER, TRUE);

in the second curl_init() and now it gives me this message:

enter image description here

EDIT 25/03/2014 19:34

I changed my code like Sabuj Hassan said and the log now is:

HTTP/1.0 302 Found Cache-Control: public, max-age=21600 Date: Tue, 25 Mar 2014 18:30:07 GMT Age: 16 Location: http://www.google.com/search?tbs=sbi:AMhZZisAo2ZcfY19aFUJcEj26M4zKc9ZuxzfsUPzLuUJk-pd-siPwiplqIcGN5tW1XPU16-XFg1EoK7jc5IU3BKoEHYnwZo7RmuhyF5p9qaZwSgq4FKRkNW44JgzTi4Mr8g6ezNMQ6YzaAEQ-uFbPMNzY40NrE3uB7ePm4BGNowF34PiIjLOiVLkWwQ7sRoBVMoVgzBbAP7rDwHee5LyGF8Dq6QOT1TEhsURduPD6exzITyRl77agELdpTFSi-JXDncI6c4KdcuQYSx2LknnIW6nippmpPf3X5OYGn1CFZw13rlFPitLSY0Ang0COuSXKdpBy6B8Dak9QZNZ9VFB4HBRfnMFiyuBvQtyhAg2LeOnRbjnunGB0P1RlwKBF4hRId7wUdTu4Dfab5DQu9hGauLKcd7GcP4g-jQXx_1gymwDdZnPXLzZp1mkjVMX9GFSppj-IRWp3FVVqChsPEzKXdraevuWJFukjUdF87dU_1kLKO23lC8L3kusy05zcq7ZxyF1dHNfQ0vYJeWumtbRosJNuEcqiSyVW_1-bF104HMJLdCA0gr5VyIZolkcZok4W1sgjFYTWvfj6f0proaGE24HSO4Ov2hmhAy9HQUCr3e-KjgqyP4AOtlmI3VsuLu34jKSo0t4tWbb5PVBi1_1oebuv4oisdVdw22a6CRH2tiw8wg6Ya1VgxsXhyj8U7lrQ8cBHVDKlOI6EimXtnELBHyDNQT1Zpsz1hK10GYvFaRNMFd7Rqmg87CLdycgyRV-sYxNWxIu9agNgHTwuU1W-GgeWWcM9noeMwgqMKSGh9lt_1hda3ZWrcA4Y1MeiG55b4ZYvOjcm9t9iIy6LA2S4AjC2X1qZHvJtSqzgfOz8yTuX5jUHqCl0jI1FdOSmqZV1GqQ0uaJfsuchlsWUULfUJBzFiGkAuOqIzU0bpXLNqLHoYPJUPwr66H6jWPFLsWAS9_1GRNj70s30jfbzcS0NUShUvE2meUhlpx-f5M0nmS0zvf-3OQOUkXlYO2VUZ4x9y8G76hHoTkDxqzhhGrgohyFmkUvAWmSkHTBpbP6gek8cyrmBnXuedSV3r2O71G8CUbdHFxfIO8FWlkGj1cUYu60PoKF6hndjZsOlV-dSNXfOTKeC1jPtf5ycXA0s0xLK7_1K0iWxhfmVq62WgQ4O3Prc4b6bcJm8M1Q9xZhhsElisuUyVTN9-dDMNUZ1h0tUe9oGsZYLh9vjEsMokqBXFM_1igHOfgRn4I17Xt8EBMZI9cEjakByjv-g5Pt9tG69RQm765HLhf8VpafvE5Z3BwDpZs4x5uMkVDURT9qcA&hl=en Server: quimby_frontend Content-Length: 1566 Content-Type: text/html; charset=UTF-8 Expires: Wed, 26 Mar 2014 00:30:07 GMT Alternate-Protocol: 80:quic X-Content-Type-Options: nosniff X-Frame-Options: SAMEORIGIN X-XSS-Protection: 1; mode=block
302 Moved

The document has moved here. HTTP/1.0 302 Found Cache-Control: public, max-age=21600 Date: Tue, 25 Mar 2014 18:30:07 GMT Age: 16 Location: http://www.google.com/search?tbs=sbi:AMhZZisAo2ZcfY19aFUJcEj26M4zKc9ZuxzfsUPzLuUJk-pd-siPwiplqIcGN5tW1XPU16-XFg1EoK7jc5IU3BKoEHYnwZo7RmuhyF5p9qaZwSgq4FKRkNW44JgzTi4Mr8g6ezNMQ6YzaAEQ-uFbPMNzY40NrE3uB7ePm4BGNowF34PiIjLOiVLkWwQ7sRoBVMoVgzBbAP7rDwHee5LyGF8Dq6QOT1TEhsURduPD6exzITyRl77agELdpTFSi-JXDncI6c4KdcuQYSx2LknnIW6nippmpPf3X5OYGn1CFZw13rlFPitLSY0Ang0COuSXKdpBy6B8Dak9QZNZ9VFB4HBRfnMFiyuBvQtyhAg2LeOnRbjnunGB0P1RlwKBF4hRId7wUdTu4Dfab5DQu9hGauLKcd7GcP4g-jQXx_1gymwDdZnPXLzZp1mkjVMX9GFSppj-IRWp3FVVqChsPEzKXdraevuWJFukjUdF87dU_1kLKO23lC8L3kusy05zcq7ZxyF1dHNfQ0vYJeWumtbRosJNuEcqiSyVW_1-bF104HMJLdCA0gr5VyIZolkcZok4W1sgjFYTWvfj6f0proaGE24HSO4Ov2hmhAy9HQUCr3e-KjgqyP4AOtlmI3VsuLu34jKSo0t4tWbb5PVBi1_1oebuv4oisdVdw22a6CRH2tiw8wg6Ya1VgxsXhyj8U7lrQ8cBHVDKlOI6EimXtnELBHyDNQT1Zpsz1hK10GYvFaRNMFd7Rqmg87CLdycgyRV-sYxNWxIu9agNgHTwuU1W-GgeWWcM9noeMwgqMKSGh9lt_1hda3ZWrcA4Y1MeiG55b4ZYvOjcm9t9iIy6LA2S4AjC2X1qZHvJtSqzgfOz8yTuX5jUHqCl0jI1FdOSmqZV1GqQ0uaJfsuchlsWUULfUJBzFiGkAuOqIzU0bpXLNqLHoYPJUPwr66H6jWPFLsWAS9_1GRNj70s30jfbzcS0NUShUvE2meUhlpx-f5M0nmS0zvf-3OQOUkXlYO2VUZ4x9y8G76hHoTkDxqzhhGrgohyFmkUvAWmSkHTBpbP6gek8cyrmBnXuedSV3r2O71G8CUbdHFxfIO8FWlkGj1cUYu60PoKF6hndjZsOlV-dSNXfOTKeC1jPtf5ycXA0s0xLK7_1K0iWxhfmVq62WgQ4O3Prc4b6bcJm8M1Q9xZhhsElisuUyVTN9-dDMNUZ1h0tUe9oGsZYLh9vjEsMokqBXFM_1igHOfgRn4I17Xt8EBMZI9cEjakByjv-g5Pt9tG69RQm765HLhf8VpafvE5Z3BwDpZs4x5uMkVDURT9qcA&hl=en Server: quimby_frontend Content-Length: 1566 Content-Type: text/html; charset=UTF-8 Expires: Wed, 26 Mar 2014 00:30:07 GMT Alternate-Protocol: 80:quic X-Content-Type-Options: nosniff X-Frame-Options: SAMEORIGIN X-XSS-Protection: 1; mode=block
302 Moved

The document has moved here.
È stato utile?

Soluzione

It can happen that following redirection is blocked for your curl at your server. So I'll recommend you to handle the redirection manually. Like this one:

First your curl function. You can add other curl options if you like:

function curl($url, $user_agent, $retry=0){
    if($retry > 5){
        print "Maximum 5 retries are done, skipping!\n";
        return "in loop!";
    }

    $ch = curl_init();
    curl_setopt ($ch, CURLOPT_URL, $url);
    curl_setopt ($ch, CURLOPT_USERAGENT, $user_agent);
    curl_setopt ($ch, CURLOPT_HEADER, TRUE);
    curl_setopt ($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt ($ch, CURLOPT_REFERER, 'http://www.google.com/');
    curl_setopt ($ch, CURLOPT_COOKIEFILE,"./cookie.txt");
    curl_setopt ($ch, CURLOPT_COOKIEJAR,"./cookie.txt");
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    $result = curl_exec($ch);
    curl_close($ch);

    // handling the follow redirect
    if(preg_match("|Location: (https?://\S+)|", $result, $m)){
        print "Manually doing follow redirect!\n$m[1]\n";
        return curl($m[1], $user_agent, $retry + 1);
    }

    // add another condition here if the location is like Location: /home/products/index.php

    return $result;
}

And here is how it should be called:

$response = curl("http://www.google.com/", "Mozilla 5.0");
print "$response\n";

I am parsing the follow link from the Location: header. It can happen that the link is not started with http:// That case add another condition over there.

Altri suggerimenti

The problem is probably with CURLOPT_FOLLOWLOCATION which is unavailable if safe_mode or open_basedir are enabled in php.ini

Try this answer: http://au.php.net/manual/ro/function.curl-setopt.php#71313

Just replace curl_exec() with curl_redir_exec() provided in the comment.

Check if you didn't hit max request per IP. You may be redirected to captcha page. AFAIK it is against google rules to use a robot to query google.

Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top