I'm trying to figure out how to block crawlers from accessing links like this:

site.com/something-search.html

I want to block all URLs matching /something-*

Can someone help me?


Solution

User-agent: *
Disallow: /something-

This blocks all URLs whose path starts with /something-. For a robots.txt served from http://example.com/robots.txt, that includes, for example:

  • http://example.com/something-
  • http://example.com/something-foo
  • http://example.com/something-foo.html
  • http://example.com/something-foo/bar

The following URLs would still be allowed:

  • http://example.com/something
  • http://example.com/something.html
  • http://example.com/something/
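You can check this matching behavior yourself with Python's standard-library robots.txt parser. A minimal sketch, assuming the two-line robots.txt above (the URLs are the hypothetical examples from this answer):

```python
from urllib import robotparser

# The rule from the answer above
rules = """\
User-agent: *
Disallow: /something-
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Paths starting with /something- are disallowed
blocked = rp.can_fetch("*", "http://example.com/something-foo.html")

# /something.html does not match the /something- prefix, so it stays allowed
allowed = rp.can_fetch("*", "http://example.com/something.html")
```

Here `blocked` is False and `allowed` is True, matching the lists above.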

Other tips

In your robots.txt (note that Disallow takes a path, not a full URL):

User-agent: *
Disallow: /something-(1st link)
.
.
.
Disallow: /something-(last link)

Add an entry for each page that you don't want to be seen!

Though regular expressions are not allowed in robots.txt, some intelligent crawlers can understand them!
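For what it's worth, full regex is never supported, but Google and Bing do document two wildcard characters in robots.txt: `*` (any sequence of characters) and `$` (end of URL). A sketch of a single rule replacing the per-page list above, using a hypothetical pattern:

```
User-agent: Googlebot
# `*` matches any run of characters, `$` anchors the end of the URL,
# so this blocks only .html pages under the /something- prefix
Disallow: /something-*.html$
```

Crawlers that don't support these wildcards may ignore or mis-read such rules, so the plain prefix form from the accepted answer is the safest choice.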

Have a look here

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow