Would having the following robot.txt work?

User-agent: *
Disallow: /

User-agent: Googlebot-Image
Allow: /

My idea is to prevent Google from crawling my CDN domain while still allowing Google Images to crawl and index my images.


Solution

The file has to be called robots.txt, not robot.txt.

Note that User-agent: * targets all bots (that are not matched by a more specific User-agent record), not only the Googlebot. So if you want to allow other bots to crawl your site, you would want to use User-agent: Googlebot instead.

So this robots.txt would allow "Googlebot-Image" everything, and disallow everything for all other bots:

User-agent: Googlebot-Image
Disallow:

User-agent: *
Disallow: /

(Note that Disallow: with an empty value is equivalent to Allow: /; however, the Allow field is not part of the original robots.txt specification, although some parsers support it, Google's among them.)
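You can sanity-check these rules with Python's standard-library urllib.robotparser, whose matching behavior is close enough to Google's for this simple case. The bot name "SomeOtherBot" and the image path below are just illustrative examples, not part of the original question:

```python
from urllib.robotparser import RobotFileParser

# The robots.txt suggested in the answer: an empty Disallow for
# Googlebot-Image allows it everything; all other bots are blocked.
rules = """\
User-agent: Googlebot-Image
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Googlebot-Image matches its own record, so it may fetch anything.
print(parser.can_fetch("Googlebot-Image", "/images/photo.jpg"))  # True

# Any other crawler falls through to the "*" record and is disallowed.
print(parser.can_fetch("SomeOtherBot", "/images/photo.jpg"))     # False
```

This confirms that the more specific User-agent record takes precedence for Googlebot-Image, while the catch-all record blocks everyone else.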

Licensed under: CC-BY-SA with attribution