Question

Help! Google indexed a test folder on my website that no one but me was supposed to know about :( How do I stop Google from indexing certain links and folders?


Solution

Use a robots exclusion file (robots.txt), or better yet, password-protect your test areas! Using a robots.txt file to "protect" areas you don't want others to see is a little like hanging a sign on your back door saying "I've left this open, but please don't come in" :)
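To make the back-door analogy concrete: robots.txt is served publicly at the site root, so anyone can read it. The minimal Python sketch below uses a hypothetical robots.txt (the folder names are made up) and pulls out its Disallow lines, which is exactly the list of places you were hoping nobody would look.

```python
# A hypothetical robots.txt that tries to "hide" a test folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /test/
Disallow: /private-drafts/
"""


def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow path listed in a robots.txt file."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


print(disallowed_paths(ROBOTS_TXT))  # ['/test/', '/private-drafts/']
```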

If you sign up for Google Webmaster Tools (now called Google Search Console), you can request removal of a search result once you have made sure the page is no longer accessible to Google's crawler.

OTHER TIPS

The usual way to keep well-behaved crawlers out of parts of your site is a robots.txt file at the root of your site.

Here is an example:

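User-agent: *
Crawl-delay: 5
Disallow: /cgi-bin
Disallow: /css
Disallow: /img
Disallow: /js

Anything that isn't disallowed may be crawled, so no "Allow: /" line is needed. The Disallow lines list the folders I want crawlers to avoid.

Keeping all the rules in a single "User-agent: *" group is safer than splitting them into two groups: RFC 9309 says crawlers should combine groups for the same user agent, but older parsers may honor only the first one. Crawl-delay is non-standard; Google ignores it, although some other crawlers respect it.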
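A quick way to check what a robots.txt file actually blocks is to feed it to a parser. The sketch below uses Python's standard urllib.robotparser to test the same rules written two ways: as two separate "User-agent: *" groups and as one merged group. The "Googlebot" agent string and the test paths are just sample inputs, and other crawlers' parsers (Google's included) may treat the split version differently.

```python
from urllib.robotparser import RobotFileParser

# The same rules written as two "User-agent: *" groups...
SPLIT = """\
User-agent: *
Allow: /
Crawl-delay: 5

User-agent: *
Disallow: /cgi-bin
Disallow: /css
Disallow: /img
Disallow: /js
"""

# ...and as one merged group.
MERGED = """\
User-agent: *
Crawl-delay: 5
Disallow: /cgi-bin
Disallow: /css
Disallow: /img
Disallow: /js
"""


def make_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text directly instead of fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


split = make_parser(SPLIT)
merged = make_parser(MERGED)

for path in ["/index.html", "/css/site.css", "/cgi-bin/form.pl"]:
    print(f"{path:20} split={split.can_fetch('Googlebot', path)} "
          f"merged={merged.can_fetch('Googlebot', path)}")
```

With Python's parser, the merged file blocks /css and /cgi-bin while the split file allows everything, because urllib.robotparser keeps only the first "User-agent: *" group it sees. Note also that rules match by prefix, so "Disallow: /img" blocks /images/ too.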

This does not really protect anything, though, since some crawlers ignore robots.txt.

If you really want to protect those folders, the best way is to put an .htaccess file in each of them that forces authentication.

Beware! You can tell "nice" bots (like Google's) to stay away from certain places, but other bots don't play that nice. So the only way to solve this properly is to restrict access to the areas that are not meant to be public. You could allow only IP addresses you trust, or you could add username/password authentication.
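Apache has its own directives for restricting by IP address; the Python sketch below only illustrates the allowlist decision itself, using the standard ipaddress module. The trusted networks are hypothetical, taken from the reserved documentation ranges 203.0.113.0/24 and 2001:db8::/32, so substitute your own office or VPN ranges.

```python
import ipaddress

# Hypothetical trusted networks, taken from the reserved documentation ranges
# (RFC 5737 for IPv4, RFC 3849 for IPv6). Replace them with your own.
TRUSTED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_trusted(client_ip: str) -> bool:
    """Allow only clients inside a trusted network; deny everything else."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # malformed address: deny by default
    # An IPv4 address is never "in" an IPv6 network, so mixed lists are safe.
    return any(address in network for network in TRUSTED_NETWORKS)


for ip in ["203.0.113.7", "198.51.100.1", "2001:db8::1"]:
    print(ip, "allowed" if is_trusted(ip) else "403 Forbidden")
```

The design choice worth copying is deny-by-default: anything that is not explicitly trusted, including input that doesn't parse as an address, is refused.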

Maybe the right answer is to not put test code on a public web site. Why is it part of your deployment at all?

If you're using Apache:

.htaccess

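AuthType Basic
AuthName "You must log in to access this development web site"
# Full filesystem path to the password file
AuthUserFile //.htpasswd
# Applies to every request method; wrapping this in <Limit GET> would leave POST and other methods unprotected
Require valid-user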

The password file (.htpasswd) then contains one line per user:

name:password

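The password is stored hashed, not in plain text. Apache ships an htpasswd command-line tool that creates these entries, and if you search for "htpasswd" you'll also find plenty of free programmes that hash the password for you.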
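For illustration only, here is a minimal Python sketch of what one .htpasswd line looks like in Apache's "{SHA}" format: the user name, a colon, then "{SHA}" followed by the Base64-encoded SHA-1 digest of the password. Apache accepts this format, but its documentation describes it as insecure (it is unsalted), so for a real site let Apache's own "htpasswd -B" generate a bcrypt hash instead. The function name is hypothetical.

```python
import base64
import hashlib


def htpasswd_sha_line(name: str, password: str) -> str:
    """Build one .htpasswd entry in Apache's (insecure, unsalted) {SHA} format.

    Hypothetical helper for illustration; prefer "htpasswd -B" for real sites.
    """
    if ":" in name:
        # The colon separates the user name from the hash, so it can't appear in the name.
        raise ValueError("user name must not contain ':'")
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    encoded = base64.b64encode(digest).decode("ascii")
    return f"{name}:{{SHA}}{encoded}"


print(htpasswd_sha_line("alice", "password"))
# alice:{SHA}W6ph5Mm5Pz8GgiULbPgzG37mj9g=
```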

TRiG.

Licensed under: CC-BY-SA with attribution