Question

I am building machine learning models on my laptop, which has an i3 processor and 16 GB of RAM. Despite using multiple cores (3 out of 4), it takes two days to run all the techniques I am trying, and admittedly the data is large (close to 1.3 million rows and 20 variables).

Is there any way to reduce the running time to a fraction of what it currently takes? Let's say some hours instead of days? I have heard from a friend with a computer science background that Spark takes less time than standalone R, but I am not sure whether Spark can cut the analysis time from multiple days down to a few hours. I am open to suggestions and solutions (preferably open source). Thoughts?

I am sure a solution for this must exist, since R has been around for a long time and someone must have found a way to solve this painful problem.

Solution

It depends on the models you are trying to run. Your data isn't that big, but with some models you will still run into problems; for example, training a support vector machine from the kernlab package scales poorly with the number of rows. Not every model is fast or has a fast implementation.
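
One practical way to see whether the model itself is the bottleneck, before changing hardware or frameworks, is to time a candidate model on a subsample. Below is a minimal sketch of that idea (my addition, not part of the original answer); the data frame df and the outcome column target are hypothetical placeholders for your own data.

    library(kernlab)

    # Time kernlab's SVM on a 10,000-row subsample.
    # `df` and `target` are placeholders for your own data.
    sub <- df[sample(nrow(df), 10000), ]
    system.time(
      fit <- ksvm(target ~ ., data = sub)
    )
    # Kernel SVM training cost grows faster than linearly in the number
    # of rows, so the full 1.3M-row fit will take considerably longer
    # than a naive 130x extrapolation of the subsample time suggests.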

Without more information on what you are doing, it is difficult to say what causes the bottleneck. But if you just want a speed boost when fitting models, have a look at the xgboost package, the h2o package (GLM, GBM, random forest, deep learning), or ranger for a faster implementation of random forests; a short sketch with ranger follows below.
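
As an illustration, here is a minimal sketch of fitting a random forest with ranger, which builds trees in parallel across cores. Again, df and target are hypothetical placeholders, and num.trees and num.threads should be tuned to your data and machine.

    library(ranger)

    # Fit a random forest; ranger parallelizes tree construction.
    # `df` and `target` are placeholders for your own data.
    fit <- ranger(
      target ~ .,        # predict `target` from all other columns
      data        = df,
      num.trees   = 500,
      num.threads = 3    # match the 3 cores mentioned in the question
    )

    print(fit)           # shows out-of-bag prediction error, among other summaries

xgboost similarly exposes an nthread parameter, and h2o runs its own multithreaded Java backend, so all three options make better use of your available cores than many base R modeling functions.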

Licensed under: CC-BY-SA with attribution