Question

Does MapR include k-means clustering, the way Mahout does?

Solution

As far as I know, MapR is only a "faster" Hadoop distribution. It does not include any algorithms of its own.

So your existing Hadoop jobs should be compatible with it.

But why not implement your own? K-means is very simple. See my blog post: http://codingwiththomas.blogspot.com/2011/05/k-means-clustering-with-mapreduce.html
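To give an idea of how little code is involved, here is a minimal in-memory sketch of the two steps that a MapReduce k-means typically splits across the mapper and the reducer: assigning each point to its nearest center, then averaging the points of each center. This is not the code from the blog post and it uses no Hadoop API. The class name `KMeansSketch` and all method names are made up for illustration, and it assumes dense `double[]` vectors and squared Euclidean distance.

```java
import java.util.Arrays;

// Hypothetical sketch: names are illustrative, not taken from Mahout, MapR, or the blog post.
final class KMeansSketch {

    // Squared Euclidean distance between two points of equal dimension.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // "Map" step: return the index of the center closest to the point.
    static int nearestCenter(double[] point, double[][] centers) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double dist = squaredDistance(point, centers[c]);
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    // "Reduce" step: average the points assigned to each center.
    // A center with no assigned points keeps its old position (an assumption of this sketch).
    static double[][] updateCenters(double[][] points, double[][] centers) {
        int k = centers.length;
        int dim = centers[0].length;
        double[][] sums = new double[k][dim];
        int[] counts = new int[k];
        for (double[] p : points) {
            int c = nearestCenter(p, centers);
            counts[c]++;
            for (int i = 0; i < dim; i++) {
                sums[c][i] += p[i];
            }
        }
        for (int c = 0; c < k; c++) {
            for (int i = 0; i < dim; i++) {
                sums[c][i] = counts[c] == 0 ? centers[c][i] : sums[c][i] / counts[c];
            }
        }
        return sums;
    }

    // Repeat both steps until the centers stop changing or maxIterations is reached.
    static double[][] cluster(double[][] points, double[][] initialCenters, int maxIterations) {
        double[][] centers = initialCenters;
        for (int iter = 0; iter < maxIterations; iter++) {
            double[][] next = updateCenters(points, centers);
            boolean converged = Arrays.deepEquals(next, centers);
            centers = next;
            if (converged) {
                break;
            }
        }
        return centers;
    }
}
```

Calling `KMeansSketch.cluster(points, initialCenters, 10)` returns the final centers. In a real MapReduce job each iteration would be one job run, with the current centers handed to every mapper; how to distribute them is left out of this sketch.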

I have also implemented k-means clustering with BSP (Bulk Synchronous Parallel) on Apache Hama. It is almost ten times faster than the Mahout benchmark results in this book: http://www.manning.com/ingersoll/ (linked JIRA issue: https://issues.apache.org/jira/browse/MAHOUT-588). Here is the benchmark of k-means with Apache Hama: http://wiki.apache.org/hama/Benchmarks

You can find the BSP implementation here: https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/clustering/KMeansBSP.java

Licensed under: CC-BY-SA with attribution