Question

I'm trying to scrape data out of tables whose shape varies with the data entered in them. For some reason, some tables (and hence their data) are scraped incorrectly.

require(data.table)
require(RCurl)
require(XML)

For this type of ID the scraping doesn't work:

ur.l <- data.frame(A=c(1),B=c(36232475,36232475))

For this other type of ID it works:

ur.l <- data.frame(A=c(1),B=c(17053781,17054346))


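scrape <- function(u) {
  tryCatch({
    # read every HTML table on the page for company ID u
    tabs <- readHTMLTable(file.path("http://finstat.sk", u, "suvaha"),
                          encoding = "utf-8")
    # keep the table with the most rows
    tab <- tabs[[which.max(sapply(tabs, function(x) nrow(x)))]]
    data.table(tab)
  }, error = function(e) NULL)  # return NULL when the page cannot be read
}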

urls <- as.character(ur.l[1:2,2]) 
res <- sapply(urls, scrape)

filter.null <- res[lapply(res,length)>0]

translit <- function(x) iconv(x, "UTF-8", "ASCII//TRANSLIT", sub = "byte")
invisible(lapply(filter.null,function(x) x[,V1:=translit(V1)]))

Could someone be so kind as to tell me how to adjust this so that a table of any shape is scraped? It doesn't work for some IDs; the error lies in the function scrape(). Your help is very much appreciated.


Solution

You need to be careful when using sapply, as it may give unexpected output: it tries to simplify the list of results into a vector or matrix. In this case you can use

res <- sapply(urls, scrape, simplify=FALSE)

or use lapply instead.
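
A minimal sketch of why this matters, using a hypothetical fake_scrape() in place of the real scrape() so that no network access is needed. It assumes the failing case arises because both scraped tables have the same number of columns (likely here, since the same ID is requested twice), which lets sapply collapse them into a matrix of columns instead of returning a list of tables:

```r
# fake_scrape() is a hypothetical stand-in for scrape(): it returns a
# two-row table whose number of columns depends on its input.
fake_scrape <- function(n_cols) {
  as.data.frame(matrix(seq_len(2 * n_cols), nrow = 2))
}

same_shape <- c(a = 3, b = 3)  # both results have 3 columns
diff_shape <- c(a = 3, b = 4)  # results differ in column count

# Equal-length results: sapply simplifies them into a list-matrix,
# so the individual tables are no longer available as tables.
simplified <- sapply(same_shape, fake_scrape)
print(is.matrix(simplified))

# simplify = FALSE (or lapply) always returns a plain list of tables.
kept <- sapply(same_shape, fake_scrape, simplify = FALSE)
print(is.data.frame(kept[["a"]]))

# Unequal-length results cannot be simplified, so a list comes back
# even without simplify = FALSE; this is why some IDs appeared to work.
unsimplified <- sapply(diff_shape, fake_scrape)
print(is.data.frame(unsimplified[["b"]]))
```

With a list returned in every case, the later steps in the question (dropping NULL results and applying translit to column V1 of each table) receive one table per URL regardless of the table shapes.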

License: CC-BY-SA with attribution
Not affiliated with StackOverflow