Question

I have a vector of links from which I would like to create a sitemap.xml file (the file protocol is documented here: http://www.sitemaps.org/protocol.html).

I understand the sitemap.xml protocol (it is rather simple), but I'm not sure what the smartest way is to use the {XML} package for it.

A simple example:

 links <- c("http://r-statistics.com",
             "http://www.r-statistics.com/on/r/",
             "http://www.r-statistics.com/on/ubuntu/")

How can "links" be used to construct a sitemap.xml file?
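Because the protocol really is simple, the file can also be produced with base R alone. The sketch below is a minimal illustration under stated assumptions, not the {XML}-package approach asked about: `xml_escape()` and `build_sitemap()` are hypothetical helpers, the output path is an assumption, and `<lastmod>` (optional in the protocol) is left out. The protocol requires data values, including URLs, to be entity-escaped, which is why `&`, `<`, `>`, `"` and `'` are replaced before a URL goes into `<loc>`.

```r
# Hypothetical helpers: a base-R-only sitemap writer (no packages needed).
# Escape the five characters the sitemap protocol requires to be entity-escaped.
xml_escape <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)   # must come first to avoid double-escaping
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  x <- gsub(">", "&gt;", x, fixed = TRUE)
  x <- gsub("\"", "&quot;", x, fixed = TRUE)
  gsub("'", "&apos;", x, fixed = TRUE)
}

# Build the sitemap as a character vector: declaration, root, one block per URL, close.
build_sitemap <- function(links, changefreq = "monthly", priority = "0.8") {
  entries <- sprintf(paste0("  <url>\n",
                            "    <loc>%s</loc>\n",
                            "    <changefreq>%s</changefreq>\n",
                            "    <priority>%s</priority>\n",
                            "  </url>"),
                     xml_escape(links), changefreq, priority)
  c('<?xml version="1.0" encoding="UTF-8"?>',   # must be the very first line
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    entries,
    "</urlset>")
}

links <- c("http://r-statistics.com",
           "http://www.r-statistics.com/on/r/",
           "http://www.r-statistics.com/on/ubuntu/")

sitemap <- build_sitemap(links)
out_file <- file.path(tempdir(), "sitemap.xml")  # assumption: point this at your site root
writeLines(sitemap, out_file)
```

String building keeps the dependency count at zero, at the cost of doing the escaping yourself; an XML library does that escaping for you.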


Solution

Is something like this what you are looking for? It uses the httr package to read each page's Last-Modified header and writes the XML directly with the very useful whisker package.

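library(whisker)
library(httr)

# The XML declaration must be the very first thing in the file, so the
# template starts directly with "<?xml" instead of a newline.
# {{loc}} (double braces) lets whisker escape characters such as "&" in URLs.
tpl <- '<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 {{#links}}
   <url>
      <loc>{{loc}}</loc>
      <lastmod>{{{lastmod}}}</lastmod>
      <changefreq>{{{changefreq}}}</changefreq>
      <priority>{{{priority}}}</priority>
   </url>
 {{/links}}
</urlset>
'

links <- c("http://r-statistics.com",
           "http://www.r-statistics.com/on/r/",
           "http://www.r-statistics.com/on/ubuntu/")

# Fetch each page and turn its Last-Modified header into a YYYY-MM-DD date
map_links <- function(l) {
  tmp <- GET(l)
  d <- headers(tmp)[["last-modified"]]
  # parse_http_date() reads HTTP dates without depending on the session locale;
  # if the server sends no Last-Modified header, fall back to today's date (an assumption)
  lastmod <- if (is.null(d)) Sys.Date() else as.Date(parse_http_date(d))

  list(loc = l,
       lastmod = format(lastmod),
       changefreq = "monthly",
       priority = "0.8")
}

url_data <- lapply(links, map_links)

# Pass the data explicitly and write the rendered XML straight to a file
cat(whisker.render(tpl, list(links = url_data)), file = "sitemap.xml")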
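Parsing the Last-Modified header with `as.Date()` and a `"%a, %d %b %Y ..."` format depends on the session locale, because `%a` and `%b` match weekday and month names in the current language. httr also provides `parse_http_date()` for this; if you would rather avoid that dependency, here is a dependency-free sketch. `http_date_to_w3c()` is a hypothetical helper: it assumes the common `Sun, 06 Nov 1994 08:49:37 GMT` form, matches the month against `month.abb` (always English in base R), and returns `NA` for anything else, including the older RFC 850 and asctime date formats.

```r
# Hypothetical helper: convert an HTTP Last-Modified value such as
# "Sun, 06 Nov 1994 08:49:37 GMT" to the W3C date form "1994-11-06",
# without relying on strptime's locale-dependent %a / %b.
http_date_to_w3c <- function(x) {
  if (length(x) == 0) return(NA_character_)   # header absent (NULL)
  # Expected tokens: weekday, day, month, year, time, zone
  parts <- strsplit(trimws(x[1]), "[ ,]+")[[1]]
  if (length(parts) < 4) return(NA_character_)
  day   <- suppressWarnings(as.integer(parts[2]))
  month <- match(parts[3], month.abb)          # month.abb is "Jan", "Feb", ...
  year  <- suppressWarnings(as.integer(parts[4]))
  if (anyNA(c(day, month, year))) return(NA_character_)
  sprintf("%04d-%02d-%02d", year, month, day)
}
```

Inside `map_links()` it could replace the `as.Date()` call, e.g. `lastmod = http_date_to_w3c(tmp$headers[["last-modified"]])`, with some fallback for the `NA` case.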

Other answers

I could not use @jverzani's solution because I wasn't able to create a valid XML file from the `cat` output, so I wrote an alternative with the XML package.

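## Input: a data.frame named sm with 4 columns: loc, lastmod, changefreq, and priority.
## Illustrative example built from the links in the question (values are placeholders):
links <- c("http://r-statistics.com",
           "http://www.r-statistics.com/on/r/",
           "http://www.r-statistics.com/on/ubuntu/")
sm <- data.frame(loc = links,
                 lastmod = format(Sys.Date()),
                 changefreq = "monthly",
                 priority = "0.8",
                 stringsAsFactors = FALSE)

library(XML)
doc <- newXMLDoc()
root <- newXMLNode("urlset", doc = doc)
# Default sitemap namespace, plus the image namespace (only needed for image sitemaps)
invisible(newXMLNamespace(root, "http://www.sitemaps.org/schemas/sitemap/0.9"))
invisible(newXMLNamespace(root, "http://www.google.com/schemas/sitemap-image/1.1", "image"))

# One <url> element per row; as.character() guards against factor columns
for (i in seq_len(nrow(sm))) {
  urlNode <- newXMLNode("url", parent = root)
  newXMLNode("loc", as.character(sm$loc[i]), parent = urlNode)
  newXMLNode("lastmod", as.character(sm$lastmod[i]), parent = urlNode)
  newXMLNode("changefreq", as.character(sm$changefreq[i]), parent = urlNode)
  newXMLNode("priority", as.character(sm$priority[i]), parent = urlNode)
}

saveXML(doc, file = "sitemap.xml")
browseURL("sitemap.xml")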
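Since `sm` is assembled by hand, it can be worth checking it against the protocol's rules before calling `saveXML()`: `changefreq` must be one of `always`, `hourly`, `daily`, `weekly`, `monthly`, `yearly` or `never`, `priority` must lie between 0.0 and 1.0, and a single sitemap file may hold at most 50,000 URLs. `check_sitemap_df()` is a hypothetical helper sketched under those assumptions; it returns a character vector describing the problems it finds (empty if none) and does not validate `loc` or `lastmod`.

```r
# Allowed <changefreq> values according to the sitemap protocol
valid_freq <- c("always", "hourly", "daily", "weekly",
                "monthly", "yearly", "never")

# Hypothetical pre-flight check for the 4-column data.frame `sm`
check_sitemap_df <- function(sm) {
  required <- c("loc", "lastmod", "changefreq", "priority")
  missing_cols <- setdiff(required, names(sm))
  if (length(missing_cols) > 0)
    return(paste("missing column:", missing_cols))

  problems <- character(0)
  bad_freq <- !as.character(sm$changefreq) %in% valid_freq
  if (any(bad_freq))
    problems <- c(problems, paste("invalid changefreq in row", which(bad_freq)))

  prio <- suppressWarnings(as.numeric(as.character(sm$priority)))
  bad_prio <- is.na(prio) | prio < 0 | prio > 1   # non-numeric counts as invalid
  if (any(bad_prio))
    problems <- c(problems, paste("priority outside 0.0-1.0 in row", which(bad_prio)))

  if (nrow(sm) > 50000)
    problems <- c(problems, "more than 50,000 URLs: split into several sitemap files")
  problems
}
```

A possible use just before writing: `problems <- check_sitemap_df(sm); if (length(problems) > 0) stop(paste(problems, collapse = "\n"))`.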
License: CC-BY-SA with attribution