Googlebot not respecting Robots.txt [closed]

https://stackoverflow.com/questions/463569

19-08-2019

Question

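For some reason, when I check Google Webmaster Tools' "Analyze robots.txt" to see which URLs are blocked by our robots.txt file, the results are not what I'm expecting. Here is a snippet from the beginning of our file: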

Sitemap: http://[omitted]/sitemap_index.xml

User-agent: Mediapartners-Google
Disallow: /scripts

User-agent: *
Disallow: /scripts
# list of articles given by the Content group
Disallow: http://[omitted]/Living/books/book-review-not-stupid.aspx
Disallow: http://[omitted]/Living/books/book-review-running-through-roadblocks-inspirational-stories-of-twenty-courageous-athletic-warriors.aspx
Disallow: http://[omitted]/Living/sportsandrecreation/book-review-running-through-roadblocks-inspirational-stories-of-twenty-courageous-athletic-warriors.aspx

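Anything in the scripts folder is correctly blocked for both Googlebot and Mediapartners-Google. I can see that both robots are reading the correct directives, because Googlebot reports that the scripts are blocked by line 7 and Mediapartners-Google reports that they are blocked by line 4. Yet none of the other URLs I listed as disallowed under the second User-agent directive are blocked!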

I'm wondering whether my comment or my use of absolute URLs is messing things up...

Any insight is appreciated. Thanks.

Solution

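The reason they are being ignored is that your Disallow entries in robots.txt contain fully qualified URLs, and the specification doesn't allow that. (You should only specify relative paths, or absolute paths starting with /.) Try the following: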

Sitemap: http://[omitted]/sitemap_index.xml

User-agent: Mediapartners-Google
Disallow: /scripts

User-agent: *
Disallow: /scripts
# list of articles given by the Content group
Disallow: /Living/books/book-review-not-stupid.aspx
Disallow: /Living/books/book-review-running-through-roadblocks-inspirational-stories-of-twenty-courageous-athletic-warriors.aspx
Disallow: /Living/sportsandrecreation/book-review-running-through-roadblocks-inspirational-stories-of-twenty-courageous-athletic-warriors.aspx
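To check this explanation locally, the sketch below feeds a trimmed version of both files to Python's standard-library `urllib.robotparser`. That module is not Google's parser, but for plain rules like these it likewise treats each Disallow value as a prefix of the URL's path, so it shows the same effect: a value beginning with `http://` can never match a path such as `/Living/...`. `example.com` is a hypothetical stand-in for the `[omitted]` host, and `/scripts/site.js` is a made-up script URL. The comment line is kept to show that it does not interfere with the rules in this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical host standing in for the [omitted] domain in the question.
HOST = "http://example.com"

# Trimmed copy of the question's file: the article rule uses a full URL.
ORIGINAL = f"""\
User-agent: Mediapartners-Google
Disallow: /scripts

User-agent: *
Disallow: /scripts
# list of articles given by the Content group
Disallow: {HOST}/Living/books/book-review-not-stupid.aspx
"""

# Trimmed copy of the answer's file: the article rule is a plain path.
CORRECTED = """\
User-agent: Mediapartners-Google
Disallow: /scripts

User-agent: *
Disallow: /scripts
# list of articles given by the Content group
Disallow: /Living/books/book-review-not-stupid.aspx
"""

ARTICLE = f"{HOST}/Living/books/book-review-not-stupid.aspx"
SCRIPT = f"{HOST}/scripts/site.js"  # hypothetical script URL


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_blocked(robots_txt: str, agent: str, url: str) -> bool:
    """True if the given user agent may not fetch url under these rules."""
    return not build_parser(robots_txt).can_fetch(agent, url)


if __name__ == "__main__":
    for name, text in (("original", ORIGINAL), ("corrected", CORRECTED)):
        print(name,
              "script blocked:", is_blocked(text, "Googlebot", SCRIPT),
              "article blocked:", is_blocked(text, "Googlebot", ARTICLE))
```

Running the file prints one line per version: `/scripts` is blocked under both, while the article should be reported as blocked only under the corrected rules.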

As for caching, Google tries to fetch a fresh copy of the robots.txt file every 24 hours on average.
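As a rough illustration of why an edit may not take effect immediately, here is a hypothetical crawler-side cache. It reuses the parsed rules until they are older than a maximum age and then fetches the file again. The 24-hour default comes from the figure above. The `CachedRobots` class, the `fetch` callable and the injectable `clock` are assumptions made for this sketch; it is not how Googlebot is actually implemented.

```python
import time
from typing import Callable, Optional
from urllib.robotparser import RobotFileParser

# Assumption: a fixed 24-hour maximum age, taken from the answer's estimate.
MAX_AGE_SECONDS = 24 * 60 * 60


class CachedRobots:
    """Hypothetical cache: reuse parsed rules until they are max_age seconds
    old, then call fetch() again to get fresh robots.txt text."""

    def __init__(self, fetch: Callable[[], str],
                 max_age: float = MAX_AGE_SECONDS,
                 clock: Callable[[], float] = time.time):
        self._fetch = fetch          # returns the robots.txt body as text
        self._max_age = max_age
        self._clock = clock          # injectable so the sketch can be tested
        self._parser: Optional[RobotFileParser] = None
        self._fetched_at = 0.0
        self.fetch_count = 0

    def parser(self) -> RobotFileParser:
        now = self._clock()
        # Re-fetch only when nothing is cached or the copy has expired.
        if self._parser is None or now - self._fetched_at >= self._max_age:
            fresh = RobotFileParser()
            fresh.parse(self._fetch().splitlines())
            self._parser = fresh
            self._fetched_at = now
            self.fetch_count += 1
        return self._parser

    def can_fetch(self, agent: str, url: str) -> bool:
        return self.parser().can_fetch(agent, url)
```

With this design, a rule added to the live file just after a fetch stays invisible to the crawler until the cached copy expires.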

Other Tips

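It's the absolute URLs. The Disallow entries in robots.txt are only supposed to contain relative URIs (paths); the domain is inferred from the domain the robots.txt file was fetched from.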
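The point about the domain being inferred can be made concrete with `urllib.parse`. The sketch below, whose helper names are hypothetical, derives the robots.txt location from a page URL and extracts the path-plus-query portion that Disallow values are compared against. The prefix test is deliberately simplified: it ignores the `*` and `$` wildcards that Google supports.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (same scheme and host)."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def matchable_path(page_url: str) -> str:
    """Return what Disallow values are compared against: the path plus any
    query string, with the scheme and host stripped off."""
    parts = urlsplit(page_url)
    path = parts.path or "/"
    return f"{path}?{parts.query}" if parts.query else path


def is_disallowed_by_prefix(rule: str, page_url: str) -> bool:
    """Simplified prefix match (no '*' or '$' handling). A rule written as a
    full URL never matches, because the compared value starts with '/'."""
    return bool(rule) and matchable_path(page_url).startswith(rule)
```

For a page such as `http://example.com/Living/books/x.aspx` (a hypothetical URL), the crawler would read `http://example.com/robots.txt` and test `/Living/books/x.aspx` against each rule.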

It has been up for at least a week, and Google says it last downloaded the file 4 hours ago, so I'm sure it has the latest version.

Did you make this change to your robots.txt file recently? In my experience, it seems that Google caches things for quite a long time.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow