Question

I have a vector of links from which I would like to create a sitemap.xml file (the protocol is described at http://www.sitemaps.org/protocol.html).

I understand the sitemap.xml protocol (it is rather simple), but I'm not sure of the smartest way to build the file with the {XML} package.

A simple example:

 links <- c("http://r-statistics.com",
             "http://www.r-statistics.com/on/r/",
             "http://www.r-statistics.com/on/ubuntu/")

How can "links" be used to construct a sitemap.xml file?


Solution

Is something like this what you are looking for? (It uses the httr package to read each page's Last-Modified header and writes the XML directly with the very useful whisker package.)

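library(whisker)
library(httr)

# The XML declaration must be the very first thing in the file,
# so the template starts on the same line as the opening quote.
# {{loc}} (double braces) lets whisker escape characters such as "&" in URLs.
tpl <- '<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 {{#links}}
   <url>
      <loc>{{loc}}</loc>
      <lastmod>{{{lastmod}}}</lastmod>
      <changefreq>{{{changefreq}}}</changefreq>
      <priority>{{{priority}}}</priority>
   </url>
 {{/links}}
</urlset>
'

links <- c("http://r-statistics.com", "http://www.r-statistics.com/on/r/", "http://www.r-statistics.com/on/ubuntu/")

# Fetch each page and build one sitemap entry from its Last-Modified header.
# This assumes the server actually sends that header.
map_links <- function(l) {
  tmp <- GET(l)
  d <- headers(tmp)[["last-modified"]]

  list(loc = l,
       # parse_http_date() reads the HTTP date independently of the R locale
       lastmod = format(as.Date(parse_http_date(d))),
       changefreq = "monthly",
       priority = "0.8")
}

links <- lapply(links, map_links)

# whisker.render() looks up `links` in the calling environment
sitemap <- whisker.render(tpl)
cat(sitemap)
writeLines(sitemap, "sitemap.xml")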
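
If you do not need the Last-Modified lookup, the same structure can be produced without any packages. The following is only a minimal base R sketch, not the whisker approach from the answer: `xml_escape()` and `build_sitemap()` are hypothetical helper names, `<lastmod>` is left out (it is optional in the protocol), and the `changefreq` and `priority` defaults are assumptions copied from the answer above.

```r
# Escape the characters that XML treats specially.
# "&" is replaced first so the entities added later are not escaped again.
xml_escape <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  x <- gsub(">", "&gt;", x, fixed = TRUE)
  x <- gsub("\"", "&quot;", x, fixed = TRUE)
  gsub("'", "&apos;", x, fixed = TRUE)
}

# Build the sitemap as a character vector (hypothetical helper, not from a package).
# Each element of `entries` holds one complete <url> block.
build_sitemap <- function(links, changefreq = "monthly", priority = "0.8") {
  entries <- sprintf(
    "  <url>\n    <loc>%s</loc>\n    <changefreq>%s</changefreq>\n    <priority>%s</priority>\n  </url>",
    xml_escape(links), changefreq, priority
  )
  c('<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    entries,
    "</urlset>")
}

links <- c("http://r-statistics.com",
           "http://www.r-statistics.com/on/r/",
           "http://www.r-statistics.com/on/ubuntu/")

sitemap <- build_sitemap(links)

# Written to a temporary directory here; change the path to where the file should live.
out_file <- file.path(tempdir(), "sitemap.xml")
writeLines(sitemap, out_file)
```

Because the XML declaration is the first element of the vector, the file starts with it, which the protocol requires.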

Other suggestions

I could not use @jverzani's solution because I wasn't able to create a valid XML file from the cat output, so I wrote an alternative that uses the XML package.

## Input a data.frame with 4 columns: loc, lastmod, changefreq, and priority
## This data.frame is named sm in the code below

library(XML)
doc <- newXMLDoc()
root <- newXMLNode("urlset", doc = doc)
## Default sitemap namespace, plus the optional Google image extension namespace
temp <- newXMLNamespace(root, "http://www.sitemaps.org/schemas/sitemap/0.9")
temp <- newXMLNamespace(root, "http://www.google.com/schemas/sitemap-image/1.1", "image")

## Add one <url> node per row; seq_len() also copes with an empty data.frame
for (i in seq_len(nrow(sm)))
{
  urlNode <- newXMLNode("url", parent = root)
  ## as.character() guards against factor columns
  newXMLNode("loc", as.character(sm$loc[i]), parent = urlNode)
  newXMLNode("lastmod", as.character(sm$lastmod[i]), parent = urlNode)
  newXMLNode("changefreq", as.character(sm$changefreq[i]), parent = urlNode)
  newXMLNode("priority", as.character(sm$priority[i]), parent = urlNode)
}

saveXML(doc, file="sitemap.xml")
rm(doc, root, temp)
browseURL("sitemap.xml")
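
The XML-package code above expects the data.frame `sm` to exist already. One possible way to build it from the `links` vector in the question is sketched below; the `lastmod`, `changefreq`, and `priority` values are placeholder assumptions (today's date and the constants used in the first answer), not values taken from the pages themselves.

```r
links <- c("http://r-statistics.com",
           "http://www.r-statistics.com/on/r/",
           "http://www.r-statistics.com/on/ubuntu/")

# One row per link. The scalar values are recycled across all rows.
# lastmod uses today's date as a stand-in; replace it with real modification dates.
sm <- data.frame(
  loc        = links,
  lastmod    = format(Sys.Date(), "%Y-%m-%d"),
  changefreq = "monthly",
  priority   = "0.8",
  stringsAsFactors = FALSE  # keep the columns as character, not factor
)

str(sm)
```

Keeping `priority` as a string avoids any surprises in how the number is printed when it is written into the XML node.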

Licensed under: CC-BY-SA with attribution