Question

G'day Everyone,

I have a very long list of place names (~15,000) that I want to use to look up wiki pages and extract data from them. Unfortunately, not all of the places have wiki pages, and when htmlParse() hits one of them it stops the function and returns an error:

    Error: failed to load HTTP resource

I can't go through and remove every place name that creates a non-existent URL, so I was wondering: is there a way to get the function to skip places that don't have a wiki page?

    # Town names to be used
    towns <- data.frame('recID' = c('G62', 'G63', 'G64', 'G65'), 
                    'state' = c('Queensland', 'South_Australia', 'Victoria', 'Western_Australia'),
                    'name'  = c('Balgal Beach', 'Balhannah', 'Ballan', 'Yunderup'),
                    'feature' = c('POPL', 'POPL', 'POPL', 'POPL'))

    towns$state <- as.character(towns$state)

    towns$name <- sub(' ', '_', as.character(towns$name))

    # Function that extracts data from the wiki pages
    wiki.tables <- function(towns)  {
      require(RJSONIO)
      require(XML)
      # Build the URLs as http://en.wikipedia.org/wiki/<name>,_<state>
      u <- paste('http://en.wikipedia.org/wiki/',
                 sep = '', towns[,1], ',_', towns[,2])
      res <- lapply(u, function(x) htmlParse(x))
      tabs <- lapply(sapply(res, getNodeSet, path = '//*[@class="infobox vcard"]'),
                     readHTMLTable)
      return(tabs)
    }

    # Now to run the function. Yunderup will produce a URL that 
    # doesn't exist. So this will result in the error.
    test <- wiki.tables(towns[,c('name', 'state')])

    # It works if I don't include the place that produces a non-existent URL.
    test <- wiki.tables(towns[1:3,c('name', 'state')])

Is there a way to identify these non-existent URLs and either skip them or remove them?
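
To make it concrete, the behaviour I'm after is something like this rough sketch, where a page that fails to load becomes NULL and gets dropped (assuming `u` is the vector of URLs built inside `wiki.tables()`):

    # Rough idea: wrap htmlParse() in tryCatch() so a page that fails to load
    # gives NULL instead of stopping the whole lapply()
    res <- lapply(u, function(x) tryCatch(htmlParse(x), error = function(e) NULL))
    res <- res[!sapply(res, is.null)]   # drop the towns whose pages failed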

Thanks for your help!

Cheers, Adam


Solution 2

Here's another option that uses the httr package. (BTW, you don't need RJSONIO.) Replace your wiki.tables(...) function with this:

    wiki.tables <- function(towns)  {
      require(httr)
      require(XML)
      # Fetch a URL and parse it only if the request came back with HTTP 200;
      # otherwise return NULL so the page can be dropped afterwards
      get.HTML <- function(url){
        resp <- GET(url)
        if (resp$status_code == 200) return(htmlParse(content(resp, type = "text")))
      }
      u <- paste('http://en.wikipedia.org/wiki/',
                 sep = '', towns[,1], ',_', towns[,2])
      res <- lapply(u, get.HTML)
      res <- res[sapply(res, function(x) !is.null(x))]   # remove NULLs (missing pages)
      tabs <- lapply(sapply(res, getNodeSet, path = '//*[@class="infobox vcard"]'),
                     readHTMLTable)
      return(tabs)
    }

This runs one GET request per URL and tests the status code. The disadvantage of url.exists(...) is that you have to query every URL twice: once to see if it exists, and again to get the data.
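
If it helps, here is the status-code check in isolation, as a minimal sketch; the second article title is invented purely to hit a missing page:

    library(httr)

    # An existing article should come back with status 200
    GET('http://en.wikipedia.org/wiki/Ballan,_Victoria')$status_code

    # An invented title should not, so get.HTML() returns NULL for it and the
    # !is.null() filter then drops that page
    GET('http://en.wikipedia.org/wiki/No_Such_Town,_Nowhere')$status_code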

Incidentally, when I tried your code, the Yunderup URL does in fact exist??

OTHER TIPS

You can use the `url.exists` function from `RCurl`:

    require(RCurl)
    u <- paste('http://en.wikipedia.org/wiki/',
               sep = '', towns[,'name'], ',_', towns[,'state'])
    > sapply(u, url.exists)
       http://en.wikipedia.org/wiki/Balgal_Beach,_Queensland
                                                        TRUE
     http://en.wikipedia.org/wiki/Balhannah,_South_Australia
                                                        TRUE
               http://en.wikipedia.org/wiki/Ballan,_Victoria
                                                        TRUE
    http://en.wikipedia.org/wiki/Yunderup,_Western_Australia
                                                        TRUE
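
The logical vector from `sapply` can then be used to drop the rows without a page before calling `wiki.tables(...)`, along these lines (just a sketch reusing the objects above):

    ok <- sapply(u, url.exists)                           # TRUE where the wiki page exists
    test <- wiki.tables(towns[ok, c('name', 'state')])    # query only those towns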
Licensed under: CC-BY-SA with attribution