Question

I have a data frame on which I would like to use lapply. Here are the first few values of its first column:

link <- c(
    "http://www.r-statistics.com/tag/hadley-wickham/",
    "http://had.co.nz/",
    "http://vita.had.co.nz/articles.html",
    "http://blog.revolutionanalytics.com/2010/09/the-r-files-hadley-wickham.html",
    "http://www.analyticstory.com/hadley-wickham/"
)

The function to apply fetches the content of each link and stores it in a corpus (thanks to agstudy):

create.corpus <- function(url.name){
    doc=htmlParse(link)
    parag=xpathSApply(doc,'//p',xmlValue)
    cc=Corpus(VectorSource(parag))
    meta(cc,type='corpus','link')=link
    return(cc)
}

But I cannot get the function working via lapply:

cc=lapply(link,create.corpus) # does not work
cc=lapply(link,nchar) # works

link=link[1] # try on single element
cc=create.corpus(link) # works

Why does this function not work inside lapply?

Solution

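There's a problem in your function: its body never uses the argument url.name and reads the global variable link instead. Inside lapply, link is still the full vector of five URLs, so each call hands all five to htmlParse instead of one. Your single-element test only works because link had been overwritten with link[1] beforehand. Replace every instance of link inside the function with url.name and it will work.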
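To see why the original version behaves differently inside lapply than on a single URL, here is a minimal offline sketch. The functions count.buggy and count.fixed and the short strings in link are hypothetical stand-ins, not part of the original code; they only report how many elements each call sees.

```r
# Hypothetical stand-in for the URL vector; no network access is needed.
link <- c("a", "bb", "ccc")

# Buggy: the body ignores its argument and reads the global `link`.
count.buggy <- function(url.name) {
  length(link)
}

# Fixed: the body uses the argument, so it sees one element per call.
count.fixed <- function(url.name) {
  length(url.name)
}

buggy <- unlist(lapply(link, count.buggy))  # every call sees the whole vector
fixed <- unlist(lapply(link, count.fixed))  # every call sees a single element

print(buggy)
print(fixed)
```

The buggy version looks correct when link happens to have length one, which is the situation after link=link[1] in the question.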

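library(XML)  # htmlParse, xpathSApply, xmlValue
library(tm)   # Corpus, VectorSource, meta

create.corpus <- function(url.name) {
  doc <- htmlParse(url.name)                     # parse the page at this one URL
  parag <- xpathSApply(doc, '//p', xmlValue)     # text of every <p> element
  cc <- Corpus(VectorSource(parag))              # one document per paragraph
  meta(cc, type = 'corpus', 'link') <- url.name  # record the source URL
  return(cc)
}

cc <- lapply(link, create.corpus)  # one corpus per URL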

The result:

> cc
[[1]]
A corpus with 48 text documents

[[2]]
A corpus with 2 text documents

[[3]]
A corpus with 41 text documents

[[4]]
A corpus with 25 text documents

[[5]]
A corpus with 39 text documents
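
The list above is indexed only by position, because lapply returns an unnamed list when its input vector has no names. If it would help to look up a corpus by its URL, one option is to wrap the call in setNames. The sketch below runs offline by using fake.corpus, a hypothetical stand-in for create.corpus, and only two of the URLs.

```r
# Hypothetical stand-in for create.corpus so the sketch runs without a network.
fake.corpus <- function(url.name) {
  list(link = url.name)
}

link <- c("http://had.co.nz/", "http://vita.had.co.nz/articles.html")

# Name each result after the URL it was built from.
cc <- setNames(lapply(link, fake.corpus), link)

print(names(cc))
print(cc[["http://had.co.nz/"]]$link)
```

With the real function, the same pattern would be setNames(lapply(link, create.corpus), link).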
Licensed under: CC-BY-SA with attribution