Each of the three additional URLs that you provided refers to a page that contains no tables, so it's not a particularly useful example dataset. However, a simple way to handle errors is with `tryCatch`. Below I've defined a function that reads in the tables from URL `u`, counts the rows in each table at that URL, then returns the table with the most rows as a `data.table`.

You can then use `sapply` to apply this function to each URL (or, in your case, each org ID, e.g. 36245119) in a vector.
library(XML)
library(data.table)

scrape <- function(u) {
  tryCatch({
    # Read all tables on the balance-sheet ("suvaha") page for org ID u
    tabs <- readHTMLTable(file.path("http://finstat.sk", u, "suvaha"),
                          encoding = 'utf-8')
    # Keep the table with the most rows, returned as a data.table
    tab <- tabs[[which.max(sapply(tabs, nrow))]]
    data.table(tab)
  }, error = function(e) e)  # on error, return the condition object itself
}
urls <- c('36245119', '46894853', '46892460', '46888721')
res <- sapply(urls, scrape)
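Because `scrape` returns either a `data.table` or the error condition itself, you may want to separate the successes from the failures afterwards. Here's a sketch with mocked results (the two list entries below stand in for real output of `scrape`; they are not actual scraped data):

```r
# Mocked stand-ins for what sapply(urls, scrape) might return:
res <- list(`36245119` = data.frame(x = 1),   # a successfully scraped table
            `46894853` = simpleError("404"))  # a failed scrape

ok     <- !sapply(res, inherits, what = "error")
tables <- res[ok]            # the usable data
failed <- names(res)[!ok]    # org IDs that errored
failed  # "46894853"
```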
Take a look at `?tryCatch` if you want to improve the error handling; at present, the function simply returns the errors themselves.
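As a sketch of what fancier handling could look like, here is a generic wrapper (the name `safely` and its behavior are my own illustration, not part of the answer's code) that returns `NA` on error and retries under suppressed warnings, demonstrated on `log` rather than on the scraper so it runs without network access:

```r
# Hypothetical helper: wrap any function with tryCatch handlers.
safely <- function(f) {
  function(...) {
    tryCatch(
      f(...),
      error = function(e) {
        message("error: ", conditionMessage(e))
        NA                        # return NA instead of the condition object
      },
      warning = function(w) {
        message("warning: ", conditionMessage(w))
        suppressWarnings(f(...))  # retry, ignoring the warning
      }
    )
  }
}

safe_log <- safely(log)
safe_log(10)    # 2.302585 -- normal path
safe_log(-1)    # warning branch: returns NaN, with a message
safe_log("a")   # error branch: returns NA, with a message
```

The same pattern would apply to `scrape`: replace `e` in the error handler with whatever fallback value (an empty `data.table`, `NA`, a log entry) suits your downstream code.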