Question

I have investigated this a bit. Will the code below work? It is not easy to test.

RewriteEngine on
HostnameLookups Double  
RewriteCond %{REMOTE_HOST} (\.googlebot\.com) [NC] 
RewriteRule ^(.*)$ /do-something [L,R]

The part I worry about most is:

HostnameLookups Double 

I have read that it works only in httpd.conf, virtual host, and directory contexts (I am not sure what "directory" means if not .htaccess, but .htaccess itself is not listed). Does anyone know more about this?

Solution 2

You can use a condition on the %{HTTP_USER_AGENT} variable:

RewriteEngine on

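# Googlebot's User-Agent begins with "Mozilla/5.0", so match the token anywhere, ignoring case
RewriteCond %{HTTP_USER_AGENT} googlebot [NC]
# Skip the target itself so the redirect does not loop
RewriteCond %{REQUEST_URI} !^/do-something
RewriteRule ^(.*)$ /do-something [L,R]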

Though keep in mind that %{HTTP_USER_AGENT} can be spoofed.

OTHER TIPS

In .htaccess:

Order Allow,Deny

Allow from googlebot.com
Allow from search.msn.com
Allow from crawl.yahoo.net
Allow from baidu.com
Allow from yandex.ru
Allow from yandex.net
Allow from yandex.com

You may also want to add other search engines.

From the Apache docs: http://httpd.apache.org/docs/2.2/mod/mod_authz_host.html#allow

...It will do a reverse DNS lookup on the IP address to find the associated hostname, and then do a forward lookup on the hostname to assure that it matches the original IP address. Only if the forward and reverse DNS are consistent and the hostname matches will access be allowed.
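The Allow/Order syntax above belongs to Apache 2.2. On Apache 2.4 the same host-based check is normally written with Require host, which the mod_authz_host documentation describes as performing the same reverse-then-forward lookup. A minimal sketch, assuming Apache 2.4 and that AllowOverride permits authorization directives in .htaccess (the host list is just the first few entries from above):

```apache
# Apache 2.4 sketch: grant access if the verified client hostname
# ends in any one of these domains
<RequireAny>
    Require host googlebot.com
    Require host search.msn.com
    Require host crawl.yahoo.net
</RequireAny>
```

Any client whose hostname does not match, or whose forward and reverse DNS disagree, is denied.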

# Validate Googlebots
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5\.0\ \(compatible;\ Googlebot/2\.[01];\ \+http://www\.google\.com/bot\.html\)$
RewriteCond %{HTTP:Accept} ^\*/\*$
RewriteCond %{HTTP:Accept-Encoding} ="gzip,deflate"
RewriteCond %{HTTP:Accept-Language} =""
RewriteCond %{HTTP:Accept-Charset} =""
RewriteCond %{HTTP:From} ="googlebot(at)googlebot.com"
RewriteCond %{REMOTE_ADDR} ^66\.249\.(6[4-9]|7[0-9]|8[0-46-9]|9[0-5])\. [OR]
RewriteCond %{REMOTE_ADDR} ^216\.239\.(3[2-9]|[45][0-9]|6[0-3])\.
# Optional reverse-DNS-lookup replacement for IP-address check lines above
# RewriteCond %{REMOTE_HOST} ^crawl(-([1-9][0-9]?|1[0-9]{2}|2[0-4][0-9]|25[0-5])){4}\.googlebot\.com$
RewriteRule ^ - [S=1]
# Block invalid Googlebots
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteRule ^ - [F]

Note that the optional reverse-DNS line will only work on servers which allow the use of reverse-DNS lookups.
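On the question of where lookups can be enabled: the Apache documentation lists the contexts for HostnameLookups as server config, virtual host, and directory. "Directory" there means a <Directory> section inside the server configuration, not a .htaccess file, so the directive cannot be set from .htaccess. A minimal sketch, assuming you can edit httpd.conf (the path below is a placeholder, not taken from the question):

```apache
# httpd.conf or a virtual host file, NOT .htaccess
# "/var/www/example" is a hypothetical document root
<Directory "/var/www/example">
    # Reverse lookup followed by a confirming forward lookup
    HostnameLookups Double
</Directory>
```

On shared hosting without access to the server configuration, the IP-address conditions in the block above are probably the safer choice.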

Further, once this rDNS lookup is triggered, the format of your access log file will change: it will no longer show IP addresses as the first entry on each line, but remote hostnames instead. This can greatly affect your server administration process, and may cause some 'stats' programs to stop reporting server access summaries correctly. Once your server gets into this mode, it stays that way until it is restarted.

If you have server configuration privileges, you can easily change your log file format so that it displays Remote_Addr instead of Remote_Host as the first entry on each line, regardless of whether rDNS is enabled, by changing the first token in the logging format from %h to %a. See the Apache mod_log_config documentation.
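As a rough illustration of that %h to %a change, here is the Common Log Format with only the first token swapped. The nickname common_ip and the log path are placeholders; adapt them to whatever your existing CustomLog line uses:

```apache
# Server config or virtual host: log the client IP address (%a)
# instead of the remote hostname (%h) as the first field
LogFormat "%a %l %u %t \"%r\" %>s %b" common_ip
CustomLog "logs/access_log" common_ip
```

With this in place the first column stays an IP address whether or not hostname lookups are active.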

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow