Question

Now that the whole world is clamoring to use SSL all the time (a decision that makes a lot of sense), those of us who have used GitHub and related services to store CSV files have a bit of a challenge. The read.csv() function does not support SSL when reading from a URL. To get around this I do a little dance I like to call the SSL kabuki dance: I grab the text file with RCurl, write it to a temp file, then read it with read.csv(). Is there a smoother way of doing this? Are there better workarounds?

Here's a simple example of the SSL kabuki:

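library(RCurl)  # provides getURL(), which can fetch https URLs
# Fetch the CSV as a single character string
myCsv <- getURL("https://gist.github.com/raw/667867/c47ec2d72801cfd84c6320e1fe37055ffe600c87/test.csv")
# Write the string to a temporary file
temporaryFile <- tempfile()
con <- file(temporaryFile, open = "w")
cat(myCsv, file = con)
close(con)

# Parse the temporary file
read.csv(temporaryFile)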

Solution

Yes -- see help(download.file) which is pointed to by read.csv() and all its cousins. The method= argument there has:

method Method to be used for downloading files. Currently download methods "internal", "wget", "curl" and "lynx" are available, and there is a value "auto": see ‘Details’. The method can also be set through the option "download.file.method": see options().

and you then use this option to options():

download.file.method: Method to be used for download.file. Currently download methods "internal", "wget" and "lynx" are available. There is no default for this option, when method = "auto" is chosen: see download.file.

to turn to the external program curl, rather than the RCurl package.

Edit: Looks like I was half right and half wrong. read.csv() and its cousins do not use the selected method; one needs to call download.file() manually (which then uses curl or another selected method). Other functions that do use download.file() (such as package installation or updates) will profit from setting the option, but for JD's original query concerning CSV files over https, an explicit download.file() is needed before calling read.csv() on the downloaded file.
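
The download-then-read pattern described in the edit can be wrapped in a small helper. This is only a minimal sketch: the function name read_csv_https is hypothetical, and the default method = "curl" assumes the external curl program is installed and on the PATH.

```r
# Hypothetical helper: download a CSV to a temporary file, then parse it.
# Assumes the external curl binary is available when method = "curl".
read_csv_https <- function(url, method = "curl", ...) {
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp), add = TRUE)  # remove the temporary file on exit

  # download.file() returns 0 on success and a non-zero code otherwise
  status <- download.file(url, destfile = tmp, method = method, quiet = TRUE)
  if (status != 0) stop("download failed with status ", status)

  # Extra arguments (e.g. stringsAsFactors) are passed through to read.csv()
  read.csv(tmp, ...)
}

# Usage (needs network access, so it is left commented out):
# df <- read_csv_https("https://gist.github.com/raw/667867/c47ec2d72801cfd84c6320e1fe37055ffe600c87/test.csv")
```

Passing the method explicitly keeps the helper independent of whatever the "download.file.method" option happens to be set to.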

OTHER TIPS

No need to write it to a file - just use textConnection()

require(RCurl)
myCsv <- getURL("https://gist.github.com/raw/667867/c47ec2d72801cfd84c6320e1fe37055ffe600c87/test.csv")
WhatJDwants <- read.csv(textConnection(myCsv))
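
As a variation on the same idea, read.csv() also accepts the downloaded string directly through its text argument, so the explicit textConnection() call can be dropped. A small offline sketch, where the literal string is a made-up stand-in for what getURL() would return:

```r
# Stand-in for the string returned by getURL(); the values are illustrative.
myCsv <- "a,b\n1,2\n2,3\n"

# Option 1: wrap the string in a text connection, as in the tip above.
viaConnection <- read.csv(textConnection(myCsv))

# Option 2: hand the string to read.csv() through its text argument.
viaText <- read.csv(text = myCsv)

# Both routes should produce the same data frame.
identical(viaConnection, viaText)
```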

Using Dirk's advice to explore the method argument resulted in this slightly more concise approach, which does not depend on the RCurl package (it relies on the external curl program instead).

temporaryFile <- tempfile()
download.file("https://gist.github.com/raw/667867/c47ec2d72801cfd84c6320e1fe37055ffe600c87/test.csv", destfile = temporaryFile, method = "curl")
read.csv(temporaryFile)

But it appears that I can't just set options("download.file.method" = "curl") and have read.csv() pick it up.

R core should open up the R connections as a C API. I've proposed this in the past:

https://stat.ethz.ch/pipermail/r-devel/2006-October/043056.html

with no response.

Given that this question comes up a lot, I've been working on a package to seamlessly handle HTTPS/SSL data. The package is called rio. A version of it is on CRAN but the newest version that now supports this is only available on GitHub. Once you've installed the package, you can read in data in one line:

# install and load rio
library("devtools")
install_github("leeper/rio")
library("rio")

# import
import("https://gist.github.com/raw/667867/c47ec2d72801cfd84c6320e1fe37055ffe600c87/test.csv")
##   a b
## 1 1 2
## 2 2 3
## 3 3 4
## 4 4 5

Basically, import() handles the manual download (using curl) and then infers the file format from the file extension, creating a data frame without your needing to know which function to use or how to download the file.
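
To illustrate the idea of inferring the reader from the file extension, here is a rough sketch in base R. It is not rio's actual implementation; the function name import_by_extension and the short list of supported extensions are assumptions made for the example.

```r
# Hypothetical sketch of extension-based dispatch; NOT rio's real code.
import_by_extension <- function(path) {
  # tools::file_ext() returns the extension without the leading dot
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv = read.csv(path),    # comma-separated values
    tsv = read.delim(path),  # tab-separated values
    rds = readRDS(path),     # serialized R object
    stop("unsupported file extension: ", ext)
  )
}

# Usage on a local file written just for the demonstration:
demoFile <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,2", "2,3"), demoFile)
import_by_extension(demoFile)
```

Combined with a download step to a temporary file that keeps the original extension, this gives the one-call behaviour described above.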

I found that since Dropbox changed the way that they present links with https:// none of the above solutions work any more. Fortunately, I wasn't the first to make this discovery, and a solution was posted by Christopher Gandrud on r-bloggers:

http://www.r-bloggers.com/dropbox-r-data/

That approach works for me, after installing the repmis package and its dependencies.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow