Question

When I save the inspect() object in R's tm package it prints to screen. It does save the data that I want in the data.frame, but I have thousands of documents to analyze and the printing to screen is eating up my memory.

library(tm)
data("crude")
matrix <- TermDocumentMatrix(crude, control = list(removePunctuation = TRUE,
                                                   stopwords = TRUE))
out <- data.frame(inspect(matrix))

I have tried every trick that I can think of. capture.output() changes the object (not the desired effect), as does sink(). dev.off() does not work. invisible() does nothing. suppressWarnings(), suppressMessages(), and try() unsurprisingly do nothing. There are no silent or quiet options in the inspect command.

The closest that I can get is

out= capture.output(inspect(matrix))
out= data.frame(out)

which notably does not give the same data.frame, although I could probably parse it into one if I have to go down this route. Any other (less hacky) suggestions would be helpful. Thanks.

Windows 7 64-bit, R 3.0.1, tm package version 0.5-9.1 (the most recent).

Solution

Assign inside the capture then:

capture.output(out <- data.frame(inspect(matrix))) -> .null # discarding this
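To see why this works without needing tm at all, here is a minimal base R sketch. The function noisy_inspect() is a hypothetical stand-in for inspect(): it prints its argument and then returns it. The arguments to capture.output() are evaluated in the calling environment, so the assignment still creates out there, while the printed lines end up in a character vector instead of on the console.

```r
# Hypothetical stand-in for tm::inspect(): prints its argument, then returns it.
noisy_inspect <- function(x) {
  print(x)
  x
}

# Small toy matrix with terms as rows and documents as columns.
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("oil", "price"), c("d1", "d2", "d3")))

# The assignment runs inside capture.output(), so `out` receives the returned
# object and the printed text is collected in `printed` rather than shown.
printed <- capture.output(out <- data.frame(noisy_inspect(m)))

# `out` is a regular data.frame; `printed` holds the suppressed console lines.
str(out)
```

If the captured text is never needed, the result of capture.output() can simply be discarded, as in the one-liner above.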

But really, inspect is for visual inspection, so maybe try

as.data.frame(as.matrix(matrix))

instead (btw matrix is a very unfortunate name for a variable, as that's a base function).
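With thousands of documents, note that as.matrix() produces a dense matrix, which can itself be large. As a rough sketch of an alternative, and assuming the term-document matrix is stored as a slam simple_triplet_matrix with fields i, j and v (which is how tm represents it, as far as I know), the non-zero cells can be pulled out directly into a long-format data.frame. The names tdm and long are just illustrative.

```r
library(tm)
data("crude")

tdm <- TermDocumentMatrix(crude, control = list(removePunctuation = TRUE,
                                                stopwords = TRUE))

# Assumption: tdm is a sparse triplet matrix, where i indexes terms (rows),
# j indexes documents (columns) and v holds the non-zero counts.
long <- data.frame(
  term  = Terms(tdm)[tdm$i],
  doc   = Docs(tdm)[tdm$j],
  count = tdm$v,
  stringsAsFactors = FALSE
)

# One row per non-zero (term, document) pair; nothing is printed by tm.
head(long)
```

This keeps only the non-zero entries, so it never allocates the full terms-by-documents grid. Whether the long layout suits the later analysis depends on what is being done with the counts.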

OTHER TIPS

Using this input (variable name changed from your question, as a variable named "matrix" can be confusing):

library(tm)
data("crude")

tdm <- TermDocumentMatrix(crude,control=list(removePunctuation = TRUE,
                                                 stopwords=TRUE))

Then this will avoid printing to screen:

m <- as.matrix(tdm)

and then I would personally do something like

library(data.table)
data.table(m, keep.rownames=TRUE)

#               rn 127 144 191 194 211 236 237 242 246 248 273 349 352 353 368 489 502 543 704 708
#    1:     100000   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0
#    2:        108   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1
#    3:        111   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1
#    4:        115   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1
#    5:      12217   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0
# ---                                                                                           
#  996:  yesterday   0   0   0   0   0   0   0   3   0   0   1   0   0   0   0   0   0   0   0   0
#  997: yesterdays   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0
#  998:       york   0   1   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   1   0
#  999:       zero   0   0   0   0   0   0   1   0   0   0   1   0   0   0   0   0   0   0   0   0
# 1000:       zone   0   0   0   0   0   0   0   0   0   0   2   0   0   0   0   0   0   0   0   0 
Licensed under: CC-BY-SA with attribution