Question

The user gives me a URL that points to either a webpage or a sitemap.

What is the simplest way to check which type (sitemap or webpage) a given URL is?

Thank you!

Solution

Having sought clarification of the question, here is what you'll need to do:

  1. Check that the URL is valid and fetch the contents.
  2. Try to parse the contents as XML and validate them against the XML-based sitemap spec at http://www.sitemaps.org/protocol.html. One way to do this is to define classes that map to urlset and url and deserialize the XML into those types.
  3. If the contents are XML that conforms to the sitemap spec, treat the URL as a sitemap.
  4. If the contents are XML but do not conform to the spec, you may wish to warn the user, or just treat the URL as a webpage.
  5. If the contents are not XML at all, treat the URL as a webpage.
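
The steps above can be sketched roughly as follows. This is a minimal Python illustration, not a full validator: the names fetch and classify_content and the labels "sitemap", "other-xml" and "webpage" are assumptions made up for this example, and the check only looks for a urlset root element containing at least one url/loc entry rather than validating against the complete schema.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace used by the sitemap protocol for <urlset>, <url> and <loc>.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Step 1 (hypothetical helper): download the raw bytes behind the URL."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def classify_content(body: bytes) -> str:
    """Steps 2-5: label already-fetched content.

    Returns "sitemap", "other-xml" (well-formed XML that is not a
    sitemap urlset), or "webpage" (not well-formed XML).
    """
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Not well-formed XML, so treat it as an ordinary webpage.
        return "webpage"

    # Simplified check: a <urlset> root with at least one <url><loc>.
    if root.tag == f"{{{SITEMAP_NS}}}urlset":
        locs = root.findall(f"{{{SITEMAP_NS}}}url/{{{SITEMAP_NS}}}loc")
        if locs:
            return "sitemap"

    # Well-formed XML that does not look like a sitemap; the caller can
    # warn the user or fall back to treating it as a webpage.
    return "other-xml"
```

A caller would use it as classify_content(fetch(url)). Note that a well-formed XHTML page also parses as XML, so it comes back as "other-xml" here and can simply be handled as a webpage.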
Licensed under: CC-BY-SA with attribution