Question

I use LIBSVM to train and predict classifications for a semantic analysis problem. But it has performance issues on large-scale data, because semantic analysis is a high-dimensional problem.

Last year, LIBLINEAR was released, and it resolves that performance bottleneck, but it costs too much memory. Is MapReduce the only way to solve a semantic analysis problem on big data? Or are there other methods that can address LIBLINEAR's memory bottleneck?
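
For context, here is a minimal sketch of the kind of pipeline I mean; scikit-learn's LIBLINEAR-backed LinearSVC stands in for direct LIBLINEAR calls, so the exact interface shown is an assumption:

```python
# Minimal sketch, assuming scikit-learn's LIBLINEAR-backed LinearSVC
# as a stand-in for calling LIBLINEAR directly.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Text data yields very high-dimensional, sparse feature vectors.
docs = fetch_20newsgroups(subset="train")
X = TfidfVectorizer().fit_transform(docs.data)  # sparse, ~100k dimensions

clf = LinearSVC(C=1.0)      # linear model: the LIBLINEAR use case
clf.fit(X, docs.target)     # entire matrix must fit in RAM -> memory bottleneck
```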

Solution

Note that there is an early version of LIBLINEAR ported to Apache Spark. See the mailing list comments for some early details, and the project site.
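
The Spark port exposes its own Scala API; as a rough illustration of the same idea (distributed training of a linear classifier, so no single node has to hold the whole dataset in memory), here is a minimal PySpark sketch using Spark MLlib's built-in logistic regression. The HDFS path is a placeholder:

```python
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("linear-text-clf").getOrCreate()

# Spark reads LIBSVM-format files directly; the path is a placeholder.
data = spark.read.format("libsvm").load("hdfs:///data/train.libsvm")

# L2-regularized logistic regression, trained across the cluster:
# each executor works on its partition of the data.
lr = LogisticRegression(regParam=1.0, elasticNetParam=0.0)
model = lr.fit(data)

print(model.summary.accuracy)  # training accuracy (Spark 2.3+)
```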

OTHER TIPS

You can check out Vowpal Wabbit. It is quite popular for large-scale learning and includes provisions for parallel learning.

From their website:

VW is the essence of speed in machine learning, able to learn from terafeature datasets with ease. Via parallel learning, it can exceed the throughput of any single machine network interface when doing linear learning, a first amongst learning algorithms.
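
As a rough sketch of what using it looks like, here is the vw command-line tool driven from Python; the file names are placeholders, and the flags shown (-d, -b, -f, --loss_function) are standard VW options:

```python
import subprocess

# Toy training file in VW's native text format: "<label> | <name>:<value> ...".
# Logistic loss expects labels in {-1, 1}.
with open("train.vw", "w") as f:
    f.write("1 | good:1 great:1\n")
    f.write("-1 | bad:1 awful:1\n")

# Online learning with feature hashing: -b 24 caps the model at 2^24 weight
# slots, so memory stays bounded no matter how many raw features appear.
subprocess.run(
    ["vw", "-d", "train.vw", "--loss_function", "logistic",
     "-b", "24", "-f", "model.vw"],
    check=True,
)
```

The fixed-size hashed weight table is what keeps VW's memory footprint flat regardless of vocabulary size, which speaks directly to the memory bottleneck raised in the question.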
