Question

I am using CodeIgniter to build my website. Many of my URLs contain multiple path segments and look something like this:

http://www.example.com/user/friend/mack/johnson/1023

My problem is that Google wants to index several different variations of this URL like:

http://www.example.com/user/friend/mack/johnson/
http://www.example.com/user/friend/mack/
http://www.example.com/user/friend/
http://www.example.com/user/

Unfortunately, these URLs don't go anywhere. Is there a way to disallow the sub-folders of an allowed folder within a robots.txt file? I'm guessing it would look something like this:

Disallow: /user/*
Disallow: /user/*/*
Disallow: /user/*/*/*
Allow: /user/*/*/*/*

I am a bit afraid to try this. I really have no idea how the search engines will react.

About the answer:

It seems that the most specific rule is the one that is followed. Therefore:

allow: /item/results/product/*/*/
allow: /item/results/product/*/*/*/$
disallow: /item/results/product/*/*/$
disallow: /item/results/product/*/*/*/*

will allow

/item/results/product/some/thing/12345

and

/item/results/product/some/thing/12345/

but not

/item/results/product/some/thing/

nor

/item/results/product/some/thing/12345/a

Solution

First, a trailing * is superfluous, because robots.txt rules match by prefix. So /user/ and /user/* mean the same thing.

You should be able to write:

Allow: /user/*/*/*/
Disallow: /user/

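This says to allow crawling of paths at least three directory levels below /user/, but disallow anything else that begins with /user/. Where both rules match, crawlers that follow Google's matching rules apply the longer, more specific pattern, which here is the Allow rule.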
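If you are nervous about trying this live, you can check the rules offline first. The following is only a rough Python sketch of the matching behaviour described above, not any crawler's actual code. It assumes Google-style semantics: * matches any run of characters (including slashes), a trailing $ anchors the end of the path, the longest matching pattern wins, and Allow wins a tie. The helper names pattern_to_regex and is_allowed are made up for this example.

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: '*' matches any run of characters (including '/'),
    and a trailing '$' anchors the end of the URL path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))

def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, pattern) pairs. Assumption: the
    longest matching pattern wins, Allow wins a tie, and a path that
    matches no rule is allowed.
    """
    best_len = -1
    allowed = True
    for directive, pattern in rules:
        # re.match anchors at the start, which gives prefix matching.
        if pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed

# The two rules from this answer.
rules = [
    ("Allow", "/user/*/*/*/"),
    ("Disallow", "/user/"),
]

# The URLs from the question.
paths = [
    "/user/friend/mack/johnson/1023",
    "/user/friend/mack/johnson/",
    "/user/friend/mack/",
    "/user/friend/",
    "/user/",
]

for path in paths:
    print(path, "allowed" if is_allowed(rules, path) else "disallowed")
```

Under these assumptions the sketch reports /user/friend/mack/johnson/1023 as allowed and /user/friend/mack/, /user/friend/ and /user/ as disallowed. It also reports /user/friend/mack/johnson/ as allowed, since that path matches the Allow pattern too. Excluding that variant takes an extra rule anchored with $, as in the follow-up in the question.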

Licensed under: CC-BY-SA with attribution