I'm trying to figure out how to block crawlers from accessing links like this:

site.com/something-search.html

I want to block all URLs matching /something-*

Can someone help me?


Solution

User-agent: *
Disallow: /something-

This blocks all URLs whose path starts with /something-. For a robots.txt served from http://example.com/robots.txt, that includes, for example:

  • http://example.com/something-
  • http://example.com/something-foo
  • http://example.com/something-foo.html
  • http://example.com/something-foo/bar

The following URLs would still be allowed:

  • http://example.com/something
  • http://example.com/something.html
  • http://example.com/something/
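You can check this matching behavior yourself with Python's standard-library robots.txt parser. A minimal sketch, assuming the two-line robots.txt above (the URLs are the hypothetical examples from this answer):

```python
from urllib import robotparser

# The rule from the answer above
rules = """\
User-agent: *
Disallow: /something-
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Paths starting with /something- are disallowed
blocked = rp.can_fetch("*", "http://example.com/something-foo.html")

# /something.html does not match the /something- prefix, so it stays allowed
allowed = rp.can_fetch("*", "http://example.com/something.html")
```

Here `blocked` is False and `allowed` is True, matching the lists above.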

Other tips

In your robots.txt (note that Disallow takes a path, not a full URL):

User-agent: *
Disallow: /something-(1st link)
.
.
.
Disallow: /something-(last link)

Add an entry for each page that you don't want to be seen!

Though regular expressions are not allowed in robots.txt, some intelligent crawlers can understand them!
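For what it's worth, full regex is never supported, but Google and Bing do document two wildcard characters in robots.txt: `*` (any sequence of characters) and `$` (end of URL). A sketch of a single rule replacing the per-page list above, using a hypothetical pattern:

```
User-agent: Googlebot
# `*` matches any run of characters, `$` anchors the end of the URL,
# so this blocks only .html pages under the /something- prefix
Disallow: /something-*.html$
```

Crawlers that don't support these wildcards may ignore or mis-read such rules, so the plain prefix form from the accepted answer is the safest choice.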

Have a look here

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow