Question

I'm trying to scrape data out of tables whose shape varies with the data entered in them. For some reason, some tables (and hence their data) are scraped incorrectly.

require(data.table)
require(RCurl)
require(XML)

For this type of ID the scraping doesn't work:

ur.l <- data.frame(A=c(1),B=c(36232475,36232475))

For this other type of ID it works:

ur.l <- data.frame(A=c(1),B=c(17053781,17054346))


scrape <- function(u) {
  tryCatch({
    tabs <- readHTMLTable(file.path("http://finstat.sk", u, "suvaha"),
                          encoding = "utf-8")
    tab <- tabs[[which.max(sapply(tabs, function(x) nrow(x)))]]
    data.table(tab)
  }, error = function(e) cat())
}

urls <- as.character(ur.l[1:2,2]) 
res <- sapply(urls, scrape)

filter.null <- res[lapply(res,length)>0]

translit <- function(x) iconv(x, "UTF-8", "ASCII//TRANSLIT", sub = "byte")
invisible(lapply(filter.null,function(x) x[,V1:=translit(V1)]))

Could someone be so kind as to tell me how to adjust this so that a table of any shape is scraped? For some IDs it doesn't work; the error lies in the function scrape(). Your help is very much appreciated.


Solution

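You need to be careful when using sapply, as it may give unexpected output: by default it tries to simplify its result, so when every returned table has the same number of columns (as happens when the same ID is requested twice), the list of tables is collapsed into a matrix. In this case you can pass simplify=FALSE: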

res <- sapply(urls, scrape, simplify=FALSE)

or use lapply instead.
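
The difference can be reproduced without any network access. The following is only a minimal sketch: the two small data frames are made-up stand-ins for what scrape() might return, and identity is used in place of the real scraping function.

```r
# Hypothetical stand-ins for two scraped tables that happen to have the same shape
same <- list(a = data.frame(V1 = c("x", "y"), V2 = 1:2),
             b = data.frame(V1 = c("x", "y"), V2 = 1:2))

# Default sapply: every result has the same length, so the list is
# simplified into a matrix whose cells hold the individual columns
simplified <- sapply(same, identity)

# simplify = FALSE keeps one table per element, as lapply does
kept <- sapply(same, identity, simplify = FALSE)
via_lapply <- lapply(same, identity)

is.matrix(simplified)   # the tables are no longer separate objects
is.data.frame(kept$a)   # each element is still a whole table
```

One thing to keep in mind: with a character vector such as urls, sapply names the result after the input values, while lapply returns an unnamed list.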

Licensed under: CC-BY-SA with attribution