You get an error because, as noted in the comments, you exceed the maximum integer value, which is to be expected with such a huge number of documents. This reproduces the error:
as.integer(.Machine$integer.max+1)
[1] NA
Warning message:
NAs introduced by coercion
The function vector, which takes an integer length as its second parameter, fails because that parameter is NA.
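To make the mechanism concrete, here is a minimal sketch of how the product nr * nc can overflow inside the coercion. The dimensions are made up for illustration; they are not taken from your data.

```r
## Illustrative dimensions only (assumed, not from the question)
nr <- 50000L
nc <- 50000L

## Integer multiplication overflows and yields NA (R also emits a warning)
cells <- suppressWarnings(nr * nc)
is.na(cells)

## vector() cannot allocate a vector of NA length, so it raises an error
res <- tryCatch(vector("double", cells), error = function(e) conditionMessage(e))
res

## Converting to double first avoids the overflow
as.numeric(nr) * nc
```

The same product computed in double precision is well defined, which is why avoiding the integer-length call sidesteps the error.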
One solution is to redefine as.matrix.simple_triplet_matrix without calling vector. For example:
    as.matrix.simple_triplet_matrix <-
    function (x, ...)
    {
        nr <- x$nrow
        nc <- x$ncol
        ## old line: y <- matrix(vector(typeof(x$v), nr * nc), nr, nc)
        y <- matrix(0, nr, nc)  ## no call to vector(), so no integer length needed
        y[cbind(x$i, x$j)] <- x$v
        dimnames(y) <- x$dimnames
        y
    }
But I am not sure it is a good idea to coerce such a sparse matrix (100% sparsity) to a dense matrix.
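As a rough way to judge whether the dense coercion is even feasible, you can estimate its memory footprint first. This is only a sketch: dense_size_gib is a hypothetical helper, it assumes 8 bytes per cell (double storage) and ignores object overhead, and the dimensions in the call are illustrative.

```r
## Hypothetical helper: approximate size in GiB of a dense double matrix
dense_size_gib <- function(nr, nc, bytes_per_cell = 8) {
    ## convert to double first so the product cannot overflow
    as.numeric(nr) * as.numeric(nc) * bytes_per_cell / 2^30
}

## Illustrative dimensions (assumed, not from the question)
dense_size_gib(100000, 50000)
```

If the estimate is larger than the available RAM, the redefined coercion above will still fail, only at allocation time instead.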
EDIT
One idea is to use sparseMatrix from the Matrix package. Here is an example where I compare the objects generated by each coercion. You gain a factor of at least 10 by using sparseMatrix (given how sparse your matrix is, I think you will gain more). Moreover, addition and multiplication are supported by sparse matrices.
    require(tm)
    data("crude")
    dtm <- TermDocumentMatrix(crude,
                              control = list(weighting = weightTfIdf,
                                             stopwords = TRUE))
    library(Matrix)
    ## note: despite its name, Dense holds the sparse representation
    Dense <- sparseMatrix(dtm$i, dtm$j, x = dtm$v,
                          dims = c(dtm$nrow, dtm$ncol))
    dense <- as.matrix(dtm)
    ## check sizes: how many times larger the dense matrix is
    floor(as.numeric(object.size(dense) / object.size(Dense)))
    ## addition and element-wise multiplication are supported
    Dense + Dense
    Dense * Dense
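If you need the conversion in several places, it can be wrapped in a small function. The following is a sketch under assumptions: stm_to_sparse is a hypothetical name, and the toy list only mimics the i, j, v, nrow, ncol and dimnames fields used above, with made-up values, so that the example runs without tm.

```r
library(Matrix)

## Hypothetical helper: build a sparse matrix from any list-like object
## exposing the fields i, j, v, nrow, ncol and dimnames
stm_to_sparse <- function(x) {
    sparseMatrix(i = x$i, j = x$j, x = x$v,
                 dims = c(x$nrow, x$ncol),
                 dimnames = x$dimnames)
}

## Toy stand-in for a term-document matrix (made-up values)
toy <- list(i = c(1L, 3L, 3L), j = c(1L, 1L, 2L), v = c(0.5, 1.5, 2),
            nrow = 3L, ncol = 2L,
            dimnames = list(c("oil", "price", "crude"), c("d1", "d2")))

S <- stm_to_sparse(toy)
rowSums(S)     ## per-term totals, computed without densifying
crossprod(S)   ## t(S) %*% S, i.e. document-by-document products
```

Passing dims explicitly keeps all-zero trailing rows or columns, which sparseMatrix would otherwise drop when it infers the shape from the largest index.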