library(RCurl)
url.exists("http://www.jatma.or.jp/toukei/xls/13_01.xls", .header = TRUE)["Last-Modified"]
# Last-Modified
# "Fri, 06 Dec 2013 05:33:53 GMT"
R: How to cleanly retrieve the attributes of a remote file on the internet?
Question
I can download a file from the internet easily enough using code such as this:
myurl <- "http://www.jatma.or.jp/toukei/xls/13_01.xls"
download.file(myurl, destfile = myfilepath, mode = 'wb')
However, I usually want to check the date the file was last modified before I download it. I can do this very easily in Perl using the LWP::Simple package. I've poked through the documentation for RCurl (which I admit I understand only poorly) and the closest thing I can find is the basicHeaderGatherer function.
library(RCurl)
if (url.exists("http://www.jatma.or.jp/toukei/xls/13_01.xls")) {
  h = basicHeaderGatherer()
  foo <- getURL("http://www.jatma.or.jp/toukei/xls/13_01.xls",
                headerfunction = h$update)
  names(h$value())
  h$value()
}
h$value()[3]
By using the code above I can eventually access the 'Last-Modified' attribute, but not without generating errors as per the output below. How can I clean up my code to avoid this error and access the 'Last-Modified' attribute in a straightforward manner?
(Please note: this answer looks promising but it generates similar error messages to those shown below, so it doesn't resolve this particular issue.)
Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) (from #3) :
embedded nul in string: ' \021ࡱ\032 \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0>\0\003\0 \t\0\006\0\0\0\0\0\0\0\0\0\0\0\001\0\0\09\0\0\0\0\0\0\0\0\020\0\0 \0\0\0\0 \0\0\0\08\0\0\0 \t\b\020\0\0\006\005\0g2 \a \0\002\0\006\006\0\0 \0\002\0 \004 \0\002\0\0\0 \0\0\0\\\0p\0\003\0\0CVC B\0\002\0 \004a\001\002\0\0\0 \001\0\0=\001\002\0$\0 \0\002\0\021\0\031\0\002\0\0\0\022\0\002\0\0\0\023\0\002\0\0\0 \001\002\0\0\0 \001\002\0\0\0=\0\022\0 \017\0xKX/8\0\0\0\
> h$value()[3]
Last-Modified
"Fri, 06 Dec 2013 05:33:53 GMT"
>
Solution
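Once the header string is in hand (for example from the url.exists(..., .header = TRUE)["Last-Modified"] call shown at the top of this page), the remaining step from the question is deciding whether a download is needed. The following is only a minimal base-R sketch of that comparison. The helper names parse_http_date and needs_download are hypothetical, and it assumes the server sends the date in the usual "Fri, 06 Dec 2013 05:33:53 GMT" layout.

```r
# Hypothetical helper: turn an HTTP date such as
# "Fri, 06 Dec 2013 05:33:53 GMT" into a POSIXct value in GMT.
# The month is looked up in the built-in month.abb constant, so the
# result does not depend on the current locale.
parse_http_date <- function(x) {
  x <- unname(as.character(x))
  # Drop the leading weekday ("Fri, ") and split the rest on spaces.
  parts <- strsplit(sub("^[A-Za-z]+, *", "", x), " +")[[1]]
  hms <- as.integer(strsplit(parts[4], ":")[[1]])
  ISOdatetime(year  = as.integer(parts[3]),
              month = match(parts[2], month.abb),
              day   = as.integer(parts[1]),
              hour  = hms[1], min = hms[2], sec = hms[3],
              tz    = "GMT")
}

# Hypothetical helper: TRUE when there is no local copy yet, or when
# the remote Last-Modified time is newer than the local file's mtime.
needs_download <- function(last_modified, destfile) {
  if (!file.exists(destfile)) {
    return(TRUE)
  }
  parse_http_date(last_modified) > file.info(destfile)$mtime
}
```

A call such as needs_download("Fri, 06 Dec 2013 05:33:53 GMT", myfilepath) could then guard the download.file() call from the question, with myfilepath being whatever local path is already in use there.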