robots.txt: Disallow bots from accessing a given “URL depth”
22-08-2019
Question
I have links with this structure:
- http://www.example.com/tags/blah
- http://www.example.com/tags/blubb
- http://www.example.com/tags/blah/blubb (for all items that match BOTH tags)
I want Google & co. to crawl all URLs that contain ONE tag, but NOT the URLs that contain two or more tags.
Currently I solve this with the HTML meta tag "robots" set to "noindex, nofollow" on those pages.
Is there a robots.txt solution (one that works for at least some search bots), or do I need to stick with "noindex, nofollow" and live with the additional crawl traffic?
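The meta-tag approach described in the question would look something like this on each multi-tag page (a sketch; the exact markup around it is an assumption):

```html
<!-- Placed in the <head> of every page whose URL contains two or more tags,
     e.g. http://www.example.com/tags/blah/blubb -->
<meta name="robots" content="noindex, nofollow">
```

Note that bots still have to fetch the page to see this tag, which is exactly the extra traffic the question is trying to avoid.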
Solution
I don't think you can do it with robots.txt alone. The original standard is pretty narrow: no wildcards, a single file at the site root, and only simple prefix-based Disallow rules.
What about blocking them at the server level instead, based on the request's User-Agent header?
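As a sketch of that server-side idea (assuming nginx; the bot list and the two-tag URL pattern are illustrative assumptions, not exhaustive):

```nginx
# Match /tags/<tag1>/<tag2>..., i.e. URLs with two or more tag segments.
location ~ ^/tags/[^/]+/[^/]+ {
    # Crude User-Agent check for some common crawlers (illustrative only;
    # real bot lists change and User-Agent strings can be spoofed).
    if ($http_user_agent ~* (googlebot|bingbot|yandexbot|baiduspider)) {
        return 403;
    }
}
```

Returning 403 keeps those bots out entirely. A gentler alternative that keeps the pages reachable but unindexed is to send an `X-Robots-Tag: noindex, nofollow` response header for those URLs instead of the in-page meta tag; major search engines honor it the same way.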
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow