Question

My website's Sitemap index file and all of its Sitemaps are gzipped, with names like SiteMapIndex.xml.gz, SiteMap1.xml.gz, and SiteMap2.xml.gz. Should robots.txt and the Sitemap index file reference the gzipped file names or the non-gzipped ones?

For example, should the robots.txt contents look like this?

Sitemap: http://www.mysite.com/SiteMapIndex.xml.gz

or like this (without the .gz)?

Sitemap: http://www.mysite.com/SiteMapIndex.xml

And should the SiteMapIndex.xml contents look like this?

...
<sitemap>
  <loc>http://www.mysite.com/SiteMap1.xml.gz</loc>
  <lastmod>2013-08-20</lastmod>
</sitemap>
<sitemap>
  <loc>http://www.mysite.com/SiteMap2.xml.gz</loc>
  <lastmod>2013-08-20</lastmod>
</sitemap>
...

or like this (without the .gz)?

...
<sitemap>
  <loc>http://www.mysite.com/SiteMap1.xml</loc>
  <lastmod>2013-08-20</lastmod>
</sitemap>
<sitemap>
  <loc>http://www.mysite.com/SiteMap2.xml</loc>
  <lastmod>2013-08-20</lastmod>
</sitemap>
...

Solution

If you want the bot to read the .gz file, put the .gz name in the index. Each <loc> should be the URL at which the file is actually served, so for gzipped Sitemaps that is:

<sitemap>
  <loc>http://www.mysite.com/SiteMap1.xml.gz</loc>
  <lastmod>2013-08-20</lastmod>
</sitemap>
<sitemap>
  <loc>http://www.mysite.com/SiteMap2.xml.gz</loc>
  <lastmod>2013-08-20</lastmod>
</sitemap>

See "Using Sitemap index files" in the sitemaps.org protocol documentation.

The same goes for your robots.txt file: reference the gzipped file name, i.e. the first form in the question (Sitemap: http://www.mysite.com/SiteMapIndex.xml.gz).

See "Specifying the Sitemap location in your robots.txt file" in the same documentation.
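
If the index and robots.txt entry are generated by a script, the point above amounts to writing the file names exactly as they are served, .gz included. Below is a minimal Python sketch of that idea, not a definitive implementation. The helper names build_sitemap_index and robots_sitemap_line are hypothetical, the base URL and file names are taken from the question, and it assumes Python 3.8+ for the xml_declaration argument.

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(base_url, sitemap_files, lastmod):
    """Return sitemap index XML as bytes.

    Each <loc> uses the file name exactly as served, so gzipped
    sitemaps keep their .gz suffix. (Hypothetical helper.)
    """
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for name in sitemap_files:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = f"{base_url}/{name}"
        ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def robots_sitemap_line(base_url, index_file):
    """Return the robots.txt line pointing at the (gzipped) index. (Hypothetical helper.)"""
    return f"Sitemap: {base_url}/{index_file}"


if __name__ == "__main__":
    base = "http://www.mysite.com"  # example host from the question
    index_xml = build_sitemap_index(
        base, ["SiteMap1.xml.gz", "SiteMap2.xml.gz"], "2013-08-20"
    )
    # The index itself is gzipped before being published as SiteMapIndex.xml.gz.
    index_gz = gzip.compress(index_xml)
    print(robots_sitemap_line(base, "SiteMapIndex.xml.gz"))
    print(f"{len(index_gz)} bytes of gzipped index")
```

The key design choice is that nothing strips or appends the .gz suffix: whatever name the file has on the server is the name that goes into <loc> and into the Sitemap: line.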

Licensed under: CC-BY-SA with attribution