Question
I'm trying to scrape some data out of tables, which come in various shapes depending on the data entered in each table. For some reason, some tables (and hence their data) are scraped incorrectly.
require(data.table)
require(RCurl)
require(XML)
For this type of ID the scraping doesn't work:
ur.l <- data.frame(A=c(1),B=c(36232475,36232475))
For this other type of ID it works:
ur.l <- data.frame(A=c(1),B=c(17053781,17054346))
scrape <- function(u) {
  tryCatch({
    tabs <- readHTMLTable(file.path("http://finstat.sk", u, "suvaha"),
                          encoding='utf-8')
    tab <- tabs[[which.max(sapply(tabs, function(x) nrow(x)))]]
    data.table(tab)
  }, error=function(e) cat())
}
urls <- as.character(ur.l[1:2,2])
res <- sapply(urls, scrape)
filter.null <- res[lapply(res,length)>0]
translit <- function(x) iconv(x, "UTF-8", "ASCII//TRANSLIT", sub = "byte")
invisible(lapply(filter.null,function(x) x[,V1:=translit(V1)]))
Could someone be so kind as to tell me how to adjust this so that a table of any shape is scraped? For some IDs it doesn't work; the error lies in the function scrape(). Your help is very much appreciated.
Solution
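You need to be careful when using sapply, as it may give unexpected output: it tries to simplify its result to a vector or matrix whenever every returned element has the same length, so tables with the same number of columns get collapsed into a matrix instead of being kept as a list of tables. In this case you can use
res <- sapply(urls, scrape, simplify=FALSE)
or use lapply instead.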
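To see the difference without touching the network, here is a minimal sketch in base R. Note that fake_scrape() is a hypothetical stand-in for scrape(), and the extra column it gives one ID is an assumption made purely to produce tables of different widths. It is not a claim about what the real pages contain.

```r
# Hypothetical stand-in for scrape(): builds a small table, no network access.
fake_scrape <- function(u) {
  tab <- data.frame(id = u, value = 1:3, stringsAsFactors = FALSE)
  # Assumption for illustration only: one ID yields a table with an extra column.
  if (u == "17053781") tab$extra <- letters[1:3]
  tab
}

same_shape  <- c("36232475", "36232475")   # both tables have 2 columns
mixed_shape <- c("17053781", "17054346")   # tables have 3 and 2 columns

# Equal widths: sapply simplifies the result into a matrix of list cells.
res_same <- sapply(same_shape, fake_scrape)

# Unequal widths: nothing to simplify, so a plain list comes back.
res_mixed <- sapply(mixed_shape, fake_scrape)

# Both of these always return a list of tables, whatever the shapes.
res_list   <- sapply(same_shape, fake_scrape, simplify = FALSE)
res_lapply <- lapply(same_shape, fake_scrape)

is.matrix(res_same)               # TRUE
is.matrix(res_mixed)              # FALSE
is.data.frame(res_list[[1]])      # TRUE
is.data.frame(res_lapply[[1]])    # TRUE
```

If the real tables behave like this sketch, that may be why some IDs appeared to work while others did not. One small difference between the two fixes: sapply with simplify=FALSE names the list elements after the URLs, whereas lapply leaves them unnamed when the input vector has no names.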