robots.txt and wildcard at the end of disallow [closed]
19-09-2019
Question
I need to disallow indexing of two pages, one of them dynamic:
site.com/news.php
site.com/news.php?id=__
site.com/news-all.php
What should I write in robots.txt:
User-agent: *
Disallow: /news
or
Disallow: /news*
or
Disallow: /news.php*
Disallow: /news-all.php
Should one use a wildcard at the end or not?
Solution
The Allow and Disallow lines in robots.txt say, "allow (or disallow) anything that starts with".
So:
Disallow: /news.php
is the same as
Disallow: /news.php*
Provided, of course, that the bot reading robots.txt understands wildcards. If the bot doesn't understand wildcards, it will treat the asterisk as part of the actual file name.
An asterisk at the end of the line is superfluous, and potentially hazardous.
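Both points, the prefix matching and the hazard of a literal asterisk, can be checked with Python's standard-library robots.txt parser, which does not understand wildcards (a sketch; site.com stands in for the real host):

```python
import urllib.robotparser

def can_fetch(robots_txt, url, agent="*"):
    # Parse a robots.txt string and ask whether the agent may fetch the URL.
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, url)

# Prefix matching: "Disallow: /news.php" blocks the page and its query variants.
plain = "User-agent: *\nDisallow: /news.php\n"
print(can_fetch(plain, "https://site.com/news.php"))       # False (blocked)
print(can_fetch(plain, "https://site.com/news.php?id=7"))  # False (blocked)

# A parser without wildcard support treats "*" literally, so the starred
# rule no longer matches /news.php at all -- the "hazardous" case.
starred = "User-agent: *\nDisallow: /news.php*\n"
print(can_fetch(starred, "https://site.com/news.php"))     # True (allowed!)
```

The last line shows why the trailing asterisk can actively hurt: for a wildcard-unaware bot the starred rule blocks nothing.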
OTHER TIPS
User-agent: *
Disallow: /news.php?id=*
User-agent: *
Disallow: /news-all.php
EDIT:
The first rule will disallow news.php with an ?id= parameter, but it will still allow plain news.php without one. If you do not want news.php crawled at all, you have to use Disallow: /news.php
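For wildcard-aware crawlers, a rule like /news.php?id=* behaves as described above. A rough sketch of that matching logic (my own illustration, not any crawler's actual code; here * matches any run of characters, a trailing $ anchors the end, and everything else is a prefix match):

```python
import re

def rule_matches(rule, path):
    # Translate a robots.txt rule into a regex: '*' -> '.*', a trailing
    # '$' anchors the end; rules are otherwise prefix matches, which the
    # unanchored re.match already gives us.
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    regex = "^" + "".join(".*" if c == "*" else re.escape(c) for c in body)
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

print(rule_matches("/news.php?id=*", "/news.php?id=7"))  # True
print(rule_matches("/news.php?id=*", "/news.php"))       # False (plain page allowed)
print(rule_matches("/news.php", "/news.php?id=7"))       # True (prefix match)
```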
For sure
Disallow: /news.php
Disallow: /news-all.php
is correct. No stars are needed when you have the full filename. I am still curious, though, whether the
Disallow: /news*
approach can work.
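Given the prefix matching described in the Solution, /news even without the asterisk already covers both pages, though it also catches anything else whose path begins with /news. A quick check with Python's standard-library parser (site.com is a placeholder host):

```python
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.parse("User-agent: *\nDisallow: /news\n".splitlines())

# One prefix rule covers both pages from the question...
print(rp.can_fetch("*", "https://site.com/news.php"))         # False
print(rp.can_fetch("*", "https://site.com/news-all.php"))     # False
# ...but also any other path beginning with /news.
print(rp.can_fetch("*", "https://site.com/newsletter.html"))  # False
```

So the short rule works, but only if no other /news... URLs on the site should stay crawlable.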