Question

I have a case on my hands where I must be absolutely sure that Google (or Yahoo/Bing, for that matter) does not index specific content, so the more redundancy, the better.

As far as I know, there are three ways to accomplish that; I wonder if there are more (redundancy is key here; the first two are sketched below):

  1. set the robots meta tag to noindex
  2. disallow the affected URL structure in robots.txt
  3. post-load the content via AJAX
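For reference, minimal versions of the first two (the /private/ path here is a placeholder for the affected URL structure):

    <!-- 1. in the <head> of every affected page -->
    <meta name="robots" content="noindex, nofollow">

    # 2. robots.txt at the site root
    User-agent: *
    Disallow: /private/

One caveat worth knowing: if robots.txt blocks crawling, Googlebot never fetches the page and so never sees the noindex tag, and a blocked URL can still appear in results (without content) if other sites link to it. That is why layering these with the server-side methods below matters.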

If those are all the methods, fine, but it would be just dandy if someone had an idea of how to be even more sure. :D

(I know that's a little bit insane, but if the content somehow shows up in Google, it will get really expensive for my company. :'-( )


Solution

Uh, there are a lot more:

a) identify Googlebot (this works similarly for other bots; see http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=80553) and don't show it the content
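A minimal Python sketch of that check, following the reverse-plus-forward DNS verification described at that link (the function name is mine):

    import socket

    def is_real_googlebot(ip):
        """Verify a client claiming to be Googlebot via reverse + forward DNS."""
        try:
            host = socket.gethostbyaddr(ip)[0]  # e.g. 'crawl-66-249-66-1.googlebot.com'
            if not host.endswith(('.googlebot.com', '.google.com')):
                return False
            # the forward lookup of that host must point back to the same IP
            return ip in socket.gethostbyname_ex(host)[2]
        except (socket.herror, socket.gaierror):
            return False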

b) return these pages with an HTTP 404 / HTTP 410 status instead of HTTP 200
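For example, a sketch of (b) assuming Flask (the route path and the render_private_page helper are hypothetical):

    from flask import Flask, request, abort

    app = Flask(__name__)

    @app.route('/private/<path:page>')        # placeholder URL structure
    def private(page):
        ua = request.headers.get('User-Agent', '').lower()
        if any(bot in ua for bot in ('googlebot', 'bingbot', 'slurp')):
            abort(410)                         # "gone" instead of HTTP 200
        return render_private_page(page)       # hypothetical helper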

c) only show these pages to clients with cookies/sessions
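A sketch of (c), again assuming Flask; crawlers never carry your session cookie, so they get a 404:

    from flask import Flask, session, abort

    app = Flask(__name__)
    app.secret_key = 'change-me'               # placeholder

    @app.route('/private/<path:page>')
    def private(page):
        if not session.get('user_id'):         # no session cookie -> no content
            abort(404)
        return render_private_page(page)       # hypothetical helper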

d) render the whole content as an image (and then disallow the image in robots.txt)

e) render the whole content as an image data URL (then no disallow is needed, since there is no separate image URL to crawl)
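A sketch of (e): inline the rendered image as a base64 data URL, so the crawler has nothing separate to fetch:

    import base64

    def image_to_data_url(png_bytes):
        """Embed image bytes directly in the page as a data URL."""
        return 'data:image/png;base64,' + base64.b64encode(png_bytes).decode('ascii')

    # then use the returned string as the src of an <img> tag in your template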

f) use pipes (|) in the URL structure (this works for Google; I don't know about the other search engines)

g) use dynamic URLs that only work for, let's say, five minutes
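One way to sketch (g) is with HMAC-signed, expiring links (SECRET and the five-minute TTL are placeholders):

    import hashlib, hmac, time

    SECRET = b'change-me'                      # placeholder signing key
    TTL = 300                                  # five minutes

    def sign(path):
        expires = int(time.time()) + TTL
        sig = hmac.new(SECRET, f'{path}:{expires}'.encode(), hashlib.sha256).hexdigest()
        return f'{path}?expires={expires}&sig={sig}'

    def verify(path, expires, sig):
        if int(expires) < time.time():
            return False                       # expired: serve a 404/410 instead
        expected = hmac.new(SECRET, f'{path}:{expires}'.encode(), hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, sig)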

and these are just a few off the top of my head ... there are probably more

OTHER TIPS

Well, I suppose you could require some sort of registration/authentication to see the content.

We're using the post-load-content-via-AJAX method at my work, and it works pretty well. You just have to be sure that you're not returning anything if that same AJAX route is hit without the XHR header (see the sketch below). (We're using it in conjunction with authorization, though.)
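For illustration, a Flask-flavored sketch of that check (Flask is my assumption; jQuery sets X-Requested-With on its AJAX calls, but plain requests and crawlers don't):

    from flask import Flask, request, jsonify, abort

    app = Flask(__name__)

    @app.route('/ajax/content')               # placeholder route
    def ajax_content():
        if request.headers.get('X-Requested-With') != 'XMLHttpRequest':
            abort(404)                         # direct hits get nothing
        return jsonify(content=load_content())  # hypothetical loader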

I just don't think there's any way to be completely sure without actually locking the data down behind some sort of authentication. And if it's going to be expensive for your company if it gets out there, then you might want to seriously consider that.

What about blocking search engine IPs, and requests with search engine user agents, in .htaccess?

It might require more maintenance of the IP and user-agent lists, but it will work.
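A minimal mod_rewrite sketch of that idea (the user-agent list and the /private/ path are placeholders you would have to maintain):

    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|slurp|yandex) [NC]
    RewriteRule ^private/ - [F,L]

    # the IP side works the same way, e.g. for a known Googlebot range:
    # RewriteCond %{REMOTE_ADDR} ^66\.249\.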

Licensed under: CC-BY-SA with attribution