Question

I have saved my WebCorpus of 100 text documents into a single file with:

lapply(inspect(gsrc), write, filename, append=TRUE, ncolumns=1000)
meta(gsrc[[1]])
Available meta data pairs are:
Author       : 
DateTimeStamp: 2013-10-23 11:46:47
Description  : BDliveShutdown Will ..........................
Heading      : Shutdown Will Hinder True Gauge of US Economy - New York Times
ID           : 

Because I saved everything into a single file, I read it back with:

cop <- Corpus(DirSource("/home/ashish/tm_web/23", encoding = "UTF-8"),readerControl = list(language = "lat")) 
meta(cop[[1]])
Available meta data pairs are:
Author       : 
DateTimeStamp: 2013-10-23 11:38:20
Description  : 
Heading      : 
ID           : ABC22.txt
Language     : lat
Origin       : 

Is it possible to get back the metadata of the saved corpus? Or do I have to save 100 separate text files so that meta(cop) matches meta(gsrc), or save meta(gsrc[[1]]) separately in order to restore it? Any help is appreciated, thanks.


Solution

You can do something like this. I am using the crude data set from the tm package to show the idea below; you should be able to adapt it to your own corpus easily.

## For each tag, for each document in the corpus, I apply meta
## to get a list of lists (one entry per tag, each holding one meta value per document)
library(tm)
data("crude")
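## tag names as printed by meta() above; tm 0.6 and later uses lowercase
## names such as 'datetimestamp' and 'heading'
tags <- c('DateTimeStamp','Heading')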
res <- lapply(tags,function(tag)
  lapply(crude,meta,tag))
names(res) <- tags
## I save the list
save(res,file = "meta.RData")

Now I load the saved metadata and do the reverse job.

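## load the data
load("meta.RData")
## for each tag, for each document, assign the saved meta value;
## a plain for loop is used because an assignment made inside the
## function passed to lapply would only change a local copy of crude
for (tag in tags) {
  meta.tag <- res[[tag]]
  for (y in seq_along(crude)) {
    meta(crude[[y]], tag) <- meta.tag[[y]]
  }
}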
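
One way to check that the metadata really survives the round trip is to wrap both steps in small helpers and restore onto a corpus that was rebuilt from plain text only. This is just a sketch under some assumptions: save_meta and restore_meta are hypothetical helper names (not tm functions), saveRDS/readRDS is swapped in for save/load, and the lowercase tag names assume tm 0.6 or later.

```r
library(tm)
data("crude")

# Hypothetical helper: collect the chosen tags from every document and save them.
save_meta <- function(corpus, tags, file) {
  res <- lapply(tags, function(tag) lapply(corpus, meta, tag))
  names(res) <- tags
  saveRDS(res, file)  # saveRDS instead of save(), so the reader picks the name
}

# Hypothetical helper: write the saved values back, document by document.
restore_meta <- function(corpus, file) {
  res <- readRDS(file)
  for (tag in names(res)) {
    for (i in seq_along(corpus)) {
      meta(corpus[[i]], tag) <- res[[tag]][[i]]
    }
  }
  corpus  # return the modified copy; the caller's object is not changed in place
}

tags <- c("datetimestamp", "heading")  # assumed tag names (tm >= 0.6)
meta_file <- tempfile(fileext = ".rds")
save_meta(crude, tags, meta_file)

# Simulate re-reading the corpus from plain text: same content, fresh metadata.
texts <- vapply(crude, function(d) paste(content(d), collapse = "\n"), character(1))
bare <- VCorpus(VectorSource(texts))

restored <- restore_meta(bare, meta_file)

# Compare the restored heading of the first document with the original one.
print(identical(meta(restored[[1]], "heading"), meta(crude[[1]], "heading")))
```

The restore step relies on the documents being in the same order in both corpora, so it only works if the text is read back in the order in which the metadata was saved.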
Licensed under: CC-BY-SA with attribution