robots.txt: Disallow bots from accessing a given “URL depth”
22-08-2019
Question
I have links with this structure:
- http://www.example.com/tags/blah
- http://www.example.com/tags/blubb
- http://www.example.com/tags/blah/blubb (for all items that match BOTH tags)
I want Google & co. to crawl all URLs that contain ONE tag, but NOT the URLs that contain two or more tags.
Currently I solve this with the HTML meta tag "robots" set to "noindex, nofollow" on those pages.
Is there a robots.txt solution (one that works for at least some search bots), or do I need to stick with "noindex, nofollow" and live with the additional crawl traffic?
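The meta-tag approach described in the question would look something like this on each multi-tag page (a sketch; the exact markup around it is an assumption):

```html
<!-- Placed in the <head> of every page whose URL contains two or more tags,
     e.g. http://www.example.com/tags/blah/blubb -->
<meta name="robots" content="noindex, nofollow">
```

Note that bots still have to fetch the page to see this tag, which is exactly the extra traffic the question is trying to avoid.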
Solution
I don't think you can do it with robots.txt alone. The original standard is pretty narrow: no wildcards, a single file at the site root, and only simple prefix-based Disallow rules.
What about blocking them at the server level instead, based on the request's User-Agent header?
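As a sketch of that server-side idea (assuming nginx; the bot list and the two-tag URL pattern are illustrative assumptions, not exhaustive):

```nginx
# Match /tags/<tag1>/<tag2>..., i.e. URLs with two or more tag segments.
location ~ ^/tags/[^/]+/[^/]+ {
    # Crude User-Agent check for some common crawlers (illustrative only;
    # real bot lists change and User-Agent strings can be spoofed).
    if ($http_user_agent ~* (googlebot|bingbot|yandexbot|baiduspider)) {
        return 403;
    }
}
```

Returning 403 keeps those bots out entirely. A gentler alternative that keeps the pages reachable but unindexed is to send an `X-Robots-Tag: noindex, nofollow` response header for those URLs instead of the in-page meta tag; major search engines honor it the same way.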
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow