Getting Google Sitemap Generator working: "[ERROR] When attempting to access your generated Sitemap ... we failed to read it. "

StackOverflow https://stackoverflow.com/questions/15905665

  •  03-04-2022

Question

I'm trying to get Google Sitemap Generator working.

Here is my (Zend Framework 2) project structure:

/
/...
/public/...
/public/sitemap.xml
/public/urllist.txt
/...
/temp/googlesitemapgen/
/temp/googlesitemapgen/config.xml
/temp/googlesitemapgen/sitemap_gen.py
/...

config.xml

<?xml version="1.0" encoding="UTF-8" ?>
<site
    base_url="http://foo.bar.loc"
    store_into="/var/www/bar/foo/public/sitemap.xml"
    verbose="3"
    suppress_search_engine_notify="0"
>
    <urllist path="/var/www/bar/foo/public/urllist.txt" encoding="UTF-8" />
</site>

urllist.txt

http://foo.bar.loc

When I run the generator script

user@machine:/var/www/bar/foo/temp/googlesitemapgen# python sitemap_gen.py --config=config.xml

An error occurs:

user@machine:/var/www/bar/foo/temp/googlesitemapgen# python sitemap_gen.py --config=config.xml 
sitemap_gen.py:65: DeprecationWarning: the md5 module is deprecated; use hashlib instead
  import md5
Reading configuration file: config.xml
BaseURL is set to: http://foo.bar.loc/
Input: From URLLIST "/var/www/bar/foo/public/urllist.txt"
Opened URLLIST file: /var/www/bar/foo/public/urllist.txt
[WARNING] Discarded URL for not starting with the base_url: http://foo.bar.loc
[WARNING] No URLs were recorded, writing an empty sitemap.
Sorting and normalizing collected URLs.
Writing Sitemap file "/var/www/bar/foo/public/sitemap.xml" with 0 URLs
Notifying search engines.
[ERROR] When attempting to access our generated Sitemap at the following URL:
    http://foo.bar.loc/sitemap.xml
  we failed to read it.  Please verify the store_into path you specified in
  your configuration file is web-accessable.  Consult the FAQ for more
  information.
[WARNING] Proceeding to notify with an unverifyable URL.
Notifying: www.google.com
Notification URL: http://www.google.com/webmasters/sitemaps/ping?sitemap=http%3A%2F%2Ffoo.bar.loc%2Fsitemap.xml
Number of errors: 1
Number of warnings: 3

This error is described in the "Troubleshooting" section of the documentation, but I've already checked base_url and store_into, and both are set correctly.
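Since the [ERROR] is about the script failing to read http://foo.bar.loc/sitemap.xml over HTTP, one way to re-check store_into independently is to map the file path to the URL it should be served at and fetch that URL from the same machine. This is a rough Python sketch, assuming the web server's document root is /var/www/bar/foo/public (inferred from the project structure, not stated in the question); sitemap_url and is_web_accessible are hypothetical helper names, not part of sitemap_gen.py.

```python
import http.client
import posixpath
import urllib.request


def sitemap_url(base_url: str, doc_root: str, store_into: str) -> str:
    """Map a file under the (assumed) document root to the URL it should be served at."""
    rel = posixpath.relpath(store_into, doc_root)
    if rel == ".." or rel.startswith("../"):
        raise ValueError(f"{store_into} is not under the document root {doc_root}")
    return base_url.rstrip("/") + "/" + rel


def is_web_accessible(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET for url answers with a 2xx status, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, HTTPError (4xx/5xx) and timeouts are OSError subclasses;
        # ValueError covers strings that are not valid URLs at all;
        # HTTPException covers malformed HTTP responses.
        return False
```

For example, is_web_accessible(sitemap_url("http://foo.bar.loc", "/var/www/bar/foo/public", "/var/www/bar/foo/public/sitemap.xml")) requests the same URL the log reports. A False result would suggest that foo.bar.loc does not resolve, or is not served from that directory, on the machine running the script.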

Why is this error occurring? Am I doing something wrong? If so, what? How do I get the tool working?

Thx


Solution

You need a urllist.txt that has actual URLs in it. The sitemap generator does not spider/crawl your site for you. It can read Apache logs or reference other generated sitemaps, but by itself it won't crawl.
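In the question's run the urllist did contain one URL, but the log shows it never reached the sitemap: base_url was reported as "http://foo.bar.loc/" (with a trailing slash), the entry "http://foo.bar.loc" was discarded for not starting with it, and 0 URLs were written. A minimal Python sketch of that filtering, assuming it is a plain string-prefix check against the slash-terminated base URL (normalize_base_url and filter_urls are hypothetical helpers, not code from sitemap_gen.py):

```python
def normalize_base_url(base_url: str) -> str:
    """Ensure a trailing slash, matching the 'BaseURL is set to' log line."""
    return base_url if base_url.endswith("/") else base_url + "/"


def filter_urls(base_url: str, urls: list[str]) -> tuple[list[str], list[str]]:
    """Split urls into (kept, discarded) by a prefix check against the normalized base URL.

    Assumption: a plain string-prefix comparison; sitemap_gen.py's real logic may differ.
    """
    base = normalize_base_url(base_url)
    kept: list[str] = []
    discarded: list[str] = []
    for url in urls:
        if url.startswith(base):
            kept.append(url)
        else:
            discarded.append(url)
    return kept, discarded


# The urllist.txt entry from the question, plus the same URL with a trailing slash.
kept, discarded = filter_urls("http://foo.bar.loc", ["http://foo.bar.loc", "http://foo.bar.loc/"])
print("kept:", kept)            # kept: ['http://foo.bar.loc/']
print("discarded:", discarded)  # discarded: ['http://foo.bar.loc']
```

Under this assumed check, only the slash-terminated entry survives, which is consistent with the "No URLs were recorded" warning for the original one-line list.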

See my answer at:

https://webmasters.stackexchange.com/questions/47085/is-there-an-xml-sitemap-generator-with-command-line-interface-for-nginx-on-linux/47105#47105

There I give a command string that generates the URL list for a given site by crawling it.
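The exact command from the linked answer is not reproduced here. As a rough stand-in, this Python sketch crawls a site breadth-first, stays under the start URL, and writes the pages it could load to a file with one URL per line, like urllist.txt. It is a minimal illustration only (no robots.txt handling, no rate limiting, and non-HTML responses are fetched too); crawl, extract_links, fetch, and write_urllist are hypothetical names, and fetch_page can be swapped out for testing.

```python
import http.client
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(page_url: str, html: str) -> list[str]:
    """Resolve each href against page_url and drop any #fragment."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(page_url, href))[0] for href in parser.hrefs]


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download url and decode the body using the declared charset (UTF-8 fallback)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(start_url: str, fetch_page=fetch, max_pages: int = 500) -> list[str]:
    """Breadth-first crawl limited to URLs that start with start_url (slash-terminated)."""
    prefix = start_url if start_url.endswith("/") else start_url + "/"
    seen = {prefix}
    queue = deque([prefix])
    found: list[str] = []
    while queue and len(found) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except (OSError, ValueError, LookupError, http.client.HTTPException):
            # Skip pages that fail to load: 404s, timeouts, bad URLs, unknown charsets.
            continue
        found.append(url)
        for link in extract_links(url, html):
            if link.startswith(prefix) and link not in seen:
                seen.add(link)
                queue.append(link)
    return found


def write_urllist(urls: list[str], path: str) -> None:
    """Write one URL per line, the layout of urllist.txt in the question."""
    with open(path, "w", encoding="utf-8") as out:
        for url in urls:
            out.write(url + "\n")
```

Hypothetical usage for the layout in the question: write_urllist(crawl("http://foo.bar.loc/"), "/var/www/bar/foo/public/urllist.txt"), then run python sitemap_gen.py --config=config.xml again. Every URL the sketch collects starts with the slash-terminated start URL.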

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow