Problem

I use LIBSVM to train on data and predict classifications for a semantic analysis problem. But it has performance issues on large-scale data, because semantic analysis is an n-dimensional problem.

Last year, LIBLINEAR was released, and it can resolve the performance bottleneck. But it costs too much memory. Is MapReduce the only way to solve semantic analysis problems on big data? Or are there any other methods that can relieve the memory bottleneck in LIBLINEAR?
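As background on why the dimensionality matters here, below is a minimal sketch (not from the original post; the documents and vocabulary are made up for illustration) of why dense vectors over a text vocabulary strain memory, and why sparse `{index: count}` storage, which the LIBSVM/LIBLINEAR input format also uses, keeps only the nonzero coordinates:

```python
# Hypothetical illustration: one dimension per distinct word makes dense
# vectors huge, so sparse storage keeps only the nonzero entries.

docs = ["cats chase mice", "dogs chase cats", "mice eat cheese"]

# Build a vocabulary: one feature dimension per distinct word.
vocab = {}
for doc in docs:
    for word in doc.split():
        vocab.setdefault(word, len(vocab))

def to_sparse(doc):
    """Represent a document as {feature_index: count} instead of a dense list."""
    vec = {}
    for word in doc.split():
        idx = vocab[word]
        vec[idx] = vec.get(idx, 0) + 1
    return vec

sparse_vecs = [to_sparse(d) for d in docs]

# A dense vector would store len(vocab) floats per document; the sparse
# dict stores only the handful of nonzero coordinates.
print(len(vocab))        # total number of dimensions
print(sparse_vecs[0])    # nonzero coordinates of the first document
```

On a real corpus the vocabulary runs into the millions, so the gap between dense and sparse storage is what makes large-scale linear training feasible at all.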


Solution

Note that there is an early version of LIBLINEAR ported to Apache Spark. See the mailing list comments for some early details, and the project site.

Other tips

You can check out Vowpal Wabbit. It is quite popular for large-scale learning and includes parallel provisions.

From their website:

VW is the essence of speed in machine learning, able to learn from terafeature datasets with ease. Via parallel learning, it can exceed the throughput of any single machine network interface when doing linear learning, a first amongst learning algorithms.
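One reason VW's memory footprint stays bounded is the hashing trick: feature names are hashed into a fixed-size weight vector, so the model never grows with the vocabulary, and examples are consumed one at a time online. The sketch below is a simplified illustration of that idea in plain Python, not VW's actual implementation; the bucket count, learning rate, and toy features are all assumptions for the example:

```python
# Simplified sketch of the "hashing trick" plus online SGD, in the spirit of
# Vowpal Wabbit. Illustration only -- not VW's real code or hash function.
import zlib

NUM_BUCKETS = 2 ** 18  # fixed memory budget, regardless of vocabulary size

def hash_feature(name):
    """Map an arbitrary feature name to a bucket index."""
    return zlib.crc32(name.encode()) % NUM_BUCKETS

weights = [0.0] * NUM_BUCKETS  # the whole model fits in one fixed array

def predict(features):
    return sum(weights[hash_feature(f)] for f in features)

def sgd_update(features, label, lr=0.1):
    """One online SGD step for squared loss on a single streamed example."""
    err = predict(features) - label
    for f in features:
        weights[hash_feature(f)] -= lr * err

# Stream examples one at a time: memory stays O(NUM_BUCKETS),
# not O(number of distinct features seen).
for _ in range(50):
    sgd_update(["good", "great"], 1.0)
    sgd_update(["bad", "awful"], -1.0)

print(predict(["good", "great"]))  # positive after training
print(predict(["bad", "awful"]))   # negative after training
```

The trade-off is that distinct features can collide in the same bucket, but with a large enough table this rarely hurts accuracy in practice, and it is what lets the model handle "terafeature" inputs in constant memory.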

License: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange