Verifying a Googlebot
-
28-09-2019 - |
문제
I'm going to block all bots except the big search engines. One of my blocking methods will be to check for "language": Accept-Language: If it has no Accept-Language the bot's IP address will be blocked until 2037. Googlebot does not have Accept-Language, I want to verify it with DNS lookup
<?php
gethostbyaddr($_SERVER['REMOTE_ADDR']);
?>
Is it ok to use gethostbyaddr
, can someone pass my "gethostbyaddr protection"?
다른 팁
//The function
function is_google() {
return strpos($_SERVER['HTTP_USER_AGENT'],"Googlebot");
}
The recommended way by Google is to do a reverse dns lookup (gethostbyaddr) in order to get the associated host name AND then resolve that name to an IP (gethostbyname) and compare it to the remote_addr (because reverse lookups can be faked, too).
But beware, end lokups take time and can severely slow down your webpage (maybe check for user agent first).
See https://webmasters.googleblog.com/2006/09/how-to-verify-googlebot.html
In addition to Cristian's answer:
function is_valid_google_ip($ip) {
$hostname = gethostbyaddr($ip); //"crawl-66-249-66-1.googlebot.com"
return preg_match('/\.googlebot|google\.com$/i', $hostname);
}
function is_valid_google_request($ip=null,$agent=null){
if(is_null($ip)){
$ip=$_SERVER['REMOTE_ADDR'];
}
if(is_null($agent)){
$agent=$_SERVER['HTTP_USER_AGENT'];
}
$is_valid_request=false;
if (strpos($agent, 'Google')!==false && is_valid_google_ip($ip)){
$is_valid_request=true;
}
return $is_valid_request;
}