Question

I'm trying to find out how to block crawlers from accessing links of mine that look like this:

site.com/something-search.html

I want to block all /something-*

Can someone help me?


Solution

User-agent: *
Disallow: /something-

This blocks every URL whose path starts with /something-. For a robots.txt served from http://example.com/robots.txt, the following URLs would be blocked, for example:

  • http://example.com/something-
  • http://example.com/something-foo
  • http://example.com/something-foo.html
  • http://example.com/something-foo/bar

The following URLs would still be allowed:

  • http://example.com/something
  • http://example.com/something.html
  • http://example.com/something/
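The prefix matching above can be checked with Python's standard-library robots.txt parser. This is a minimal sketch: the rule lines are fed to the parser directly rather than fetched, and example.com stands in for your own site:

```python
from urllib import robotparser

# The two rules from the answer above, as robots.txt lines
rules = [
    "User-agent: *",
    "Disallow: /something-",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Paths that start with /something- are blocked
print(rp.can_fetch("*", "http://example.com/something-foo.html"))  # False
print(rp.can_fetch("*", "http://example.com/something-foo/bar"))   # False

# Paths that merely start with /something are still allowed
print(rp.can_fetch("*", "http://example.com/something.html"))      # True
print(rp.can_fetch("*", "http://example.com/something/"))          # True
```

Note that can_fetch only tells you what a rule-following crawler should do; robots.txt is advisory and does not enforce anything.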

Other tips

In your robots.txt:

User-agent: *
Disallow: /something-(1st link)
.
.
.
Disallow: /something-(last link)

Add an entry for each page that you don't want crawled. (Note that Disallow takes a URL path, not a full URL including the domain.)

Though regular expressions are not allowed in robots.txt, some crawlers understand simple wildcard patterns.
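For crawlers that do document wildcard support (Googlebot and Bingbot, among others, honor `*` and the end-of-URL anchor `$`), the pattern could also be written explicitly. This is a sketch; crawlers without wildcard support will treat these characters literally, so the plain prefix rule is the safer choice:

```
User-agent: *
Disallow: /something-*
```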


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow