Question

I've created a term-document matrix (TDM) in R which I want to write to a file. It is a large sparse matrix in simple triplet form, roughly 20,000 x 10,000. When I convert it to a dense matrix in order to add columns with cbind, I get out-of-memory errors and the process does not complete. I don't want to increase my RAM.

Also, I want to:

- bind the tf and tf-idf matrices together
- save the sparse/dense matrix to CSV
- run batch machine learning algorithms, such as Weka's J48 implementation

How do I save/load the dataset and run batch ML algorithms within these memory constraints?

If I can write the sparse matrix to a data store, can I then run ML algorithms in R directly on the sparse matrix, within the same memory constraints?


Solution

There could be several solutions:

1) Convert your matrix from double to integer, if you are dealing with integer values (term counts are). In R an integer takes 4 bytes per value compared with 8 bytes for a double, so this roughly halves the memory footprint.
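A minimal sketch of the conversion, assuming the TDM was built with the tm package and is therefore a slam simple_triplet_matrix whose non-zero entries live in the v component (the object names tdm and m are illustrative):

```r
library(tm)   # TermDocumentMatrix objects are slam simple_triplet_matrices

# The non-zero entries of a simple_triplet_matrix are stored in tdm$v;
# term counts are whole numbers, so store them as integers (4 bytes each)
# instead of doubles (8 bytes each).
tdm$v <- as.integer(tdm$v)

# For an ordinary dense matrix of whole numbers the same idea is:
storage.mode(m) <- "integer"
object.size(m)   # roughly half the size of the double-valued version
```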

2) Try the bigmemory package, which stores matrices in shared memory or in memory-mapped files on disk rather than on the R heap.
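A minimal sketch of a file-backed matrix with bigmemory (the dimensions, file names, and example assignment are illustrative; a big.matrix is still dense, but it lives on disk rather than in RAM):

```r
library(bigmemory)

# Create a file-backed integer matrix; the data are memory-mapped from disk,
# so a 20,000 x 10,000 integer matrix costs ~800 MB of disk space, not RAM.
m <- filebacked.big.matrix(nrow = 20000, ncol = 10000,
                           type = "integer",
                           backingfile = "tdm.bin",
                           descriptorfile = "tdm.desc")

m[1, 1] <- 5L   # read/write with ordinary matrix indexing

# Re-attach the same matrix later (e.g. in a new R session) via its descriptor:
m2 <- attach.big.matrix("tdm.desc")
```

The companion packages biganalytics and bigalgebra provide some analysis routines that operate directly on big.matrix objects, which may cover part of the batch-ML step.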

OTHER TIPS

A third solution, in addition to those mentioned by @djhurio, is to use cloud computing services, such as those provided by Amazon EC2. You don't mention exactly how much RAM you require, but from what I could quickly gather from the current price list, these services will get you up to 244 GB of RAM. I doubt you'll need that much in reality, and if all you need is 16-32 GB, the price will not be prohibitive at all.

If you are an academic user, you may want to look into RevoScaleR in Revolution R, a commercial version of R that is available for free for academic use. This software handles large, out-of-memory objects out of the box.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow