Question

I'm trying to scrape a data table from a website using the RCurl package. My code works for the URL you reach by clicking through the site:

http://statsheet.com/mcb/teams/air-force/game_stats/

As soon as I select a previous season (which is what I want), my code no longer works.

Example link: http://statsheet.com/mcb/teams/air-force/game_stats?season=2012-2013

I'm guessing this has something to do with the reserved characters ? and = in the season-specific address. I've tried URLencode as well as encoding the address manually, but neither has worked.

My code:

library(RCurl)
library(XML)

#Define URL
theurl <- URLencode("http://statsheet.com/mcb/teams/air-force/game_stats?season=2012-2013", reserved=TRUE)

webpage <- getURL(theurl)
webpage <- readLines(tc <- textConnection(webpage)); close(tc)

pagetree <- htmlTreeParse(webpage, error=function(...){}, useInternalNodes = TRUE)

# Extract table header and contents
tablehead <- xpathSApply(pagetree, "//*/table[1]/thead[1]/tr[2]/th", xmlValue)
results <- xpathSApply(pagetree,"//*/table[1]/tbody/tr/td", xmlValue)

content <- as.data.frame(matrix(results, ncol = 19, byrow = TRUE))

testtablehead <- c("W/L","Opponent",tablehead[c(2:18)])
names(content) <- testtablehead

The relevant error that R returns:

Error in function (type, msg, asError = TRUE)  : 
  Could not resolve host: http%3a%2f%2fstatsheet.com%2fmcb%2fteams%2fair-force%2fgame_stats%3fseason%3d2012-2013; No data record of requested type
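The host name in that error is exactly what URLencode() produces when reserved = TRUE. The base-R check below (no network access needed) compares encoding the whole URL with the default behaviour. The last part, which percent-encodes only the query value, is an illustrative assumption about where escaping reserved characters makes sense; the variable names base_url, season and fixed are hypothetical.

```r
# Offline check of what URLencode() does to the question's URL (base R only).
u <- "http://statsheet.com/mcb/teams/air-force/game_stats?season=2012-2013"

# reserved = TRUE also escapes ":", "/", "?" and "=", so "http://" and the
# host separator vanish and curl is handed one long, unresolvable "host".
enc_all <- URLencode(u, reserved = TRUE)
print(enc_all)  # hex digits may be upper or lower case depending on R version

# With the default (reserved = FALSE) reserved characters are left alone;
# this URL contains nothing else that needs escaping, so it is unchanged.
enc_default <- URLencode(u)
print(identical(enc_default, u))

# If a query value ever did need escaping, encode only that component
# (hypothetical variable names, for illustration).
base_url <- "http://statsheet.com/mcb/teams/air-force/game_stats"
season   <- "2012-2013"
fixed    <- paste0(base_url, "?season=", URLencode(season, reserved = TRUE))
print(identical(fixed, u))
```

Here the season value contains no reserved characters, so the component-wise result is identical to the original URL.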

Does anyone have an idea what the problem is and how to fix it?


Solution

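Skip the unnecessary encoding and the separate download. URLencode(..., reserved = TRUE) also escapes the ":" and "/" in "http://", which is why curl tries to resolve the entire encoded string as a host name. htmlTreeParse can read the URL directly: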

library(XML)
url <- "http://statsheet.com/mcb/teams/air-force/game_stats?season=2012-2013"

# htmlTreeParse() accepts a URL directly, so no URLencode() or getURL() step is needed
pagetree <- htmlTreeParse(url, useInternalNodes = TRUE)
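Once pagetree exists, the extraction steps from the question apply unchanged. The sketch below runs that workflow on a small, made-up HTML string (parsed with asText = TRUE) so it works without network access. The table contents, the three-column layout and the names html, Team A, Team B and Pts are hypothetical; the XPath expressions are the ones from the question, and ncol is taken from the header length instead of being hard-coded.

```r
library(XML)

# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
# On the live site you would pass the URL string itself to htmlTreeParse().
html <- '<html><body><table>
  <thead><tr><th colspan="3">Game log</th></tr>
         <tr><th>W/L</th><th>Opponent</th><th>Pts</th></tr></thead>
  <tbody><tr><td>W</td><td>Team A</td><td>65</td></tr>
         <tr><td>L</td><td>Team B</td><td>58</td></tr></tbody>
</table></body></html>'

pagetree <- htmlTreeParse(html, asText = TRUE, useInternalNodes = TRUE)

# Same XPath expressions as in the question: second header row, then every body cell.
tablehead <- xpathSApply(pagetree, "//*/table[1]/thead[1]/tr[2]/th", xmlValue)
results   <- xpathSApply(pagetree, "//*/table[1]/tbody/tr/td", xmlValue)

# Cells come back row by row, so byrow = TRUE rebuilds one table row per matrix row.
content <- as.data.frame(matrix(results, ncol = length(tablehead), byrow = TRUE),
                         stringsAsFactors = FALSE)
names(content) <- tablehead
print(content)
```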
Licensed under: CC-BY-SA with attribution