Question

I'm going to block all bots except the big search engines. One of my blocking methods will be to check the Accept-Language header: if a request has no Accept-Language, the bot's IP address will be blocked until 2037. Googlebot does not send Accept-Language, so I want to verify it with a DNS lookup:

<?php
gethostbyaddr($_SERVER['REMOTE_ADDR']);
?>

Is it OK to use gethostbyaddr? Can someone get past my "gethostbyaddr protection"?

Other tips

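//The function
function is_google() {
    // strpos() returns 0 for a match at the start of the string,
    // so compare against false explicitly instead of relying on truthiness
    return isset($_SERVER['HTTP_USER_AGENT'])
        && strpos($_SERVER['HTTP_USER_AGENT'], "Googlebot") !== false;
}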

Google's recommended way is to do a reverse DNS lookup (gethostbyaddr) to get the associated host name, then resolve that name back to an IP (gethostbyname) and compare it to $_SERVER['REMOTE_ADDR'], because reverse lookups can be faked, too.
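The two-step check described above can be sketched roughly as follows. This is only an illustration, not Google's code: the function name verify_crawler_ip and its two optional callable parameters are hypothetical, added so the logic can be tried without network access. The accepted domains (googlebot.com and google.com) are taken from the answers on this page. Note that gethostbynamel() only returns IPv4 addresses, so this sketch does not cover IPv6 clients.

```php
<?php
// Hypothetical helper: forward-confirmed reverse DNS check for a crawler IP.
// $reverse and $forward default to PHP's resolver functions; they are
// injectable only so the logic can be exercised without real DNS queries.
function verify_crawler_ip(string $ip, ?callable $reverse = null, ?callable $forward = null): bool
{
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    // Step 1: reverse lookup. gethostbyaddr() returns the unmodified IP
    // when there is no PTR record and false on malformed input.
    $host = $reverse($ip);
    if (!is_string($host) || $host === $ip) {
        return false;
    }

    // Step 2: the host name must END in .googlebot.com or .google.com.
    if (!preg_match('/\.(googlebot|google)\.com$/i', $host)) {
        return false;
    }

    // Step 3: forward lookup. gethostbynamel() returns an array of IPv4
    // addresses, or false on failure. The original IP must be among them.
    $addresses = $forward($host);
    return is_array($addresses) && in_array($ip, $addresses, true);
}
```

In a request handler this would be called as verify_crawler_ip($_SERVER['REMOTE_ADDR']). The forward step is what defeats a faked reverse record: whoever controls the PTR record for an IP range can make it say anything, but cannot make a real googlebot.com name resolve to their own address.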

But beware: these lookups take time and can severely slow down your web page (maybe check the user agent first).

See https://webmasters.googleblog.com/2006/09/how-to-verify-googlebot.html

In addition to Cristian's answer:

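function is_valid_google_ip($ip) {

    $hostname = gethostbyaddr($ip); //"crawl-66-249-66-1.googlebot.com"

    // The alternation is grouped so the name must END in .googlebot.com or .google.com
    if (!is_string($hostname) || !preg_match('/\.(googlebot|google)\.com$/i', $hostname)) {
        return false;
    }

    // Forward-confirm: the host name must resolve back to the same IP
    return gethostbyname($hostname) === $ip;
}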


function is_valid_google_request($ip=null,$agent=null){

    if(is_null($ip)){
        $ip=$_SERVER['REMOTE_ADDR'];
    }

    if(is_null($agent)){
        $agent=$_SERVER['HTTP_USER_AGENT'];
    }

    // Cheap user agent check first; the DNS lookup only runs if it matches
    return strpos($agent, 'Google')!==false && is_valid_google_ip($ip);
}

License: CC-BY-SA with attribution