Question

I'm trying to run some analysis on big datasets (e.g. 400k rows × 400 columns) in R (e.g. neural networks and recommendation systems), but it's taking too long to process the data, especially with huge matrices (e.g. 400k rows × 400k columns). What are some free/cheap ways to improve R's performance?

I'm open to package or web-service suggestions (other options are welcome too).

Solution

Your question is not very specific, so I'll give you some generic suggestions. There are a couple of things you can do here:

  • Check sparseMatrix from the Matrix package, as mentioned by @Sidhha (see OTHER TIPS below for a short sketch).
  • Try running your model in parallel using packages such as snowfall or parallel. Check this list of packages on CRAN that can help you run your model in multicore parallel mode; a minimal sketch follows this list.
  • You can also try the data.table package; it is remarkably fast. A short sketch of that also follows below.
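
For the parallel route, here is a minimal sketch using the base parallel package. The chunking of row indices and the fit_model() call are hypothetical placeholders for your own training code.

    library(parallel)

    n_cores <- max(1, detectCores() - 1)   # leave one core free
    cl <- makeCluster(n_cores)

    # Split 400k row indices into one group per core and process them in parallel.
    chunks <- split(seq_len(400000), rep_len(seq_len(n_cores), 400000))
    results <- parLapply(cl, chunks, function(idx) {
        # fit_model(data[idx, ])   # replace with your real training call
        length(idx)                # placeholder so the sketch runs as-is
    })

    stopCluster(cl)

parLapply with makeCluster works on all platforms; on Linux/macOS you could also use mclapply, which forks the current session instead of starting worker processes.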
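
And a minimal data.table sketch; the file name ratings.csv and the columns user_id and rating are assumed names for illustration.

    library(data.table)

    # fread() is a much faster replacement for read.csv() on large files.
    dt <- fread("ratings.csv")

    setkey(dt, user_id)   # index the table for fast subsets and joins

    # Grouped aggregation is done in memory and is typically very fast.
    user_means <- dt[, .(mean_rating = mean(rating)), by = user_id]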

Good reads:

  1. 11 Tips on How to Handle Big Data in R (and 1 Bad Pun)
  2. Why R is slow & how to improve its Performance?

OTHER TIPS

Since you mention you are building a recommendation system, I believe you are working with a sparse matrix. Check sparseMatrix from the Matrix package; it should let you store your large matrix in memory and train your model.
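
A minimal sketch of what that could look like for a user-item ratings matrix; the triplet vectors below are toy values standing in for your real data.

    library(Matrix)

    # Toy (user, item, rating) triplets for illustration.
    users  <- c(1, 1, 2, 3)    # row indices
    items  <- c(2, 5, 3, 1)    # column indices
    scores <- c(4, 5, 3, 2)    # observed ratings

    # Only the non-zero entries are stored, so a 400k x 400k ratings matrix
    # with a few million observed ratings fits comfortably in memory.
    R <- sparseMatrix(i = users, j = items, x = scores,
                      dims = c(400000, 400000))

    print(object.size(R), units = "Kb")   # tiny compared to a dense equivalent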
