Question

Msnbot doesn't respect the Disallow rules in my robots.txt.

I have this in robots.txt:

User-agent: Msnbot
Disallow: /

User-Agent: Msnbot/2.0b
Disallow: /

Until now it was pretty slow, but now it is a monster that won't leave my site at all. It crawls all of WordPress and MyBB 24/7.

Should I block IP ranges, or what else can I do to stop all of these content stealers?


Solution

Based on "Block by useragent or empty referer", you could use something like this in your .htaccess:

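Options +FollowSymlinks
RewriteEngine On
RewriteBase /
# Flag requests whose User-Agent starts with "Msnbot" (case-insensitive match)
SetEnvIfNoCase User-Agent "^Msnbot" ban_agent
# Refuse flagged requests (Apache 2.2 syntax; Apache 2.4 needs mod_access_compat)
Deny from env=ban_agent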
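To see which crawlers the pattern above would catch, here is a minimal Python sketch that approximates the `SetEnvIfNoCase` match with the `re` module. Apache uses its own regex engine, so this is only an approximation that should hold for a pattern this simple. The helper name `is_banned` and the two sample User-Agent strings are assumptions for illustration; check your access logs for the values your site actually receives.

```python
import re

# Same pattern as the SetEnvIfNoCase line: anchored at the start of the
# header, matched case-insensitively.
BAN_PATTERN = re.compile(r"^Msnbot", re.IGNORECASE)


def is_banned(user_agent: str) -> bool:
    """Return True if this User-Agent header would set ban_agent."""
    return BAN_PATTERN.search(user_agent) is not None


# Sample header values, assumed for illustration only.
MSNBOT_UA = "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
BINGBOT_UA = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
```

Because the pattern is anchored with `^`, `is_banned(MSNBOT_UA)` is true, while a header that starts with `Mozilla/5.0`, such as `BINGBOT_UA`, is not matched. Dropping the anchor or adding more patterns would be needed to catch crawlers that identify themselves that way.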

Other tips

Here's what you need to do instead:

Code:

User-agent: *
Disallow:

User-agent: MSNbot
Disallow: /

The above code allows all robots except MSNbot.
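As a quick sanity check of these rules, the following sketch feeds them to Python's standard `urllib.robotparser` and asks which agents may fetch a path. The helper name `allowed` and the sample paths are assumptions for illustration. This only shows how a compliant parser reads the file; it cannot force a crawler that ignores robots.txt to obey it.

```python
from urllib.robotparser import RobotFileParser

# The rules from the answer above, verbatim.
ROBOTS_TXT = """\
User-agent: *
Disallow:

User-agent: MSNbot
Disallow: /
"""


def allowed(user_agent: str, path: str) -> bool:
    """Return True if a compliant crawler with this agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)
```

With these rules, `allowed("msnbot/2.0b", "/")` is false, while `allowed("Googlebot/2.1", "/forum/index.php")` is true.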

You can read more about the robots exclusion protocol here.

For example, for Bing:

User-agent: MSNBot
Disallow: /

For Google:

User-agent: googlebot
Disallow: /

If you want to block all bots, use this:

User-agent: *
Disallow: /

Though I was unable to identify specific bots that visit my site and spend 0:00 time per page, I was able to identify the countries where these attacks are coming from.


Since the attacks come mostly from China and the US, I'm going to block those countries from visiting my website entirely, using my .htaccess file. I hope it works.

I only recommend this if you know you want traffic from your own country only, and you're sure you won't lose traffic you want from the countries you ban.
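Country blocking of this kind comes down to checking each visitor's IP address against a list of CIDR ranges. Below is a minimal Python sketch of that check using the standard `ipaddress` module. The ranges are placeholders taken from the reserved documentation blocks, not real country data, and the name `is_blocked` is hypothetical; a real list would come from a country IP list provider.

```python
import ipaddress

# Placeholder ranges from the reserved documentation blocks (RFC 5737).
# These are NOT real country allocations; substitute a real list.
BLOCKED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in BLOCKED_RANGES)
```

For instance, `is_blocked("192.0.2.17")` is true and `is_blocked("203.0.113.5")` is false. Running a check like this over the addresses in your access log before deploying the rules can show how much wanted traffic a ban would cut off.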

Here are the links to the tutorial:

https://www.hostinger.com/tutorials/htaccess/how-to-allow-or-block-visitors-from-specific-countries-using-htaccess

https://www.countryipblocks.net/acl.php

I just implemented this, and I hope it works for me. It seems like a good solution in my case because my Canadian traffic is good, while the US and China traffic all seems to be attacks.

Again, I recommend discretion when using a solution like this.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow