I have been going through different forums and was wondering if this is correct. I am trying to stop bots from crawling query-string URLs only under specific subpages (e.g. www.website.com/subpage/?query=sample), while making sure /subpage/ itself does not get disallowed. Please correct me if I am wrong.

File: robots.txt

User-agent: *
Disallow: /subpage/*?

Solution

According to what I see here, you are very close:

User-agent: *
Disallow: /subpage/*?*
Allow: /subpage$

You can test this from the comfort of your own browser by using a robots.txt testing add-on or extension.
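
If you prefer not to rely on an extension, here is a minimal sketch for sanity-checking the two rules above. It assumes a hypothetical file name (check_robots.py) and uses the sample paths from the question; it translates Google-style wildcards (* for any run of characters, a trailing $ for end of URL) into regular expressions and reports which paths would be blocked. It only evaluates these two rules and skips the "most specific rule wins" tie-breaking real crawlers apply, so treat it as an approximation rather than an exact crawler simulation.

File: check_robots.py

import re

# Translate a Google-style robots.txt pattern into a regex:
# '*' matches any run of characters, a trailing '$' anchors the end of the URL.
def pattern_to_regex(pattern):
    regex = re.escape(pattern).replace(r"\*", ".*")
    if regex.endswith(r"\$"):
        regex = regex[:-2] + "$"
    return re.compile("^" + regex)

disallow = pattern_to_regex("/subpage/*?*")   # the Disallow rule from the answer
allow = pattern_to_regex("/subpage$")         # the Allow rule from the answer

for path in ["/subpage/", "/subpage/?query=sample", "/subpage/page.html"]:
    blocked = bool(disallow.match(path)) and not allow.match(path)
    print(path, "->", "blocked" if blocked else "allowed")

Running it should show /subpage/ and /subpage/page.html staying allowed while /subpage/?query=sample is blocked.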

Other tips

I do not think you can specify a query string in Disallow. The value you set for Disallow is referred to as a directory in the documentation (not as a URI or URL).

You can, however, achieve your objective by using a sitemap.xml: simply omit the URLs you do not want indexed from the sitemap.
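
For example, a minimal sitemap sketch along those lines, reusing the question's sample domain (the https scheme and file location are assumptions), would list only the canonical page and leave the query-string variants out:

File: sitemap.xml

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- only the canonical page; query-string variants such as /subpage/?query=sample are simply not listed -->
  <url>
    <loc>https://www.website.com/subpage/</loc>
  </url>
</urlset>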

Google Webmaster Tools also gives some granular control over how query-string parameters should be interpreted. Not sure if that serves your purpose.
