Question

I am using Carrot2 to cluster a set of 500 emails with its BisectingKMeans algorithm. On the same data set, when I specify k = 9, only 6 clusters are generated, and when I ask for 8 clusters, 7 are generated. However, when I ask for 10 clusters, all 10 are generated. Can anyone help me figure out the reason for this?

Solution

I've had a look at the code, and it looks like this behaviour was caused by a bug in the cluster splitting routine. I've committed a fix to the master branch of Carrot2, which makes the number of generated clusters more predictable. You can download binaries with the fix from the Carrot2 build server.
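
The answer above does not say what the bug was, so the following is only general background, not Carrot2's code. It is a minimal sketch of a generic bisecting k-means on 1-D numbers, written under my own assumptions (the function names `two_means` and `bisecting_kmeans`, the seeding rule, and the "largest cluster first" policy are all hypothetical). It shows one way any bisecting scheme can return fewer clusters than requested: when a split attempt leaves one side empty, there is nothing left to bisect.

```python
def two_means(points, iterations=10):
    """Split a list of numbers into two groups with plain 2-means.

    Assumption for this sketch: seeds are the first and last point.
    Returns (left, right); right is empty when no split is possible.
    """
    c0, c1 = points[0], points[-1]
    for _ in range(iterations):
        left = [p for p in points if abs(p - c0) <= abs(p - c1)]
        right = [p for p in points if abs(p - c0) > abs(p - c1)]
        if not left or not right:
            break  # degenerate split, e.g. all points identical
        c0 = sum(left) / len(left)
        c1 = sum(right) / len(right)
    return left, right


def bisecting_kmeans(points, k):
    """Repeatedly bisect clusters until k clusters exist or none can be split."""
    clusters = [sorted(points)]
    while len(clusters) < k:
        # Try the largest cluster first, then fall back to smaller ones.
        clusters.sort(key=len, reverse=True)
        for i, cluster in enumerate(clusters):
            left, right = two_means(cluster)
            if left and right:
                clusters[i:i + 1] = [left, right]
                break
        else:
            break  # no cluster could be split: stop with fewer than k
    return clusters


if __name__ == "__main__":
    # Only three distinct values, so asking for 5 clusters cannot succeed.
    print(len(bisecting_kmeans([1, 1, 1, 5, 5, 9], 5)))
    # Ten distinct values split cleanly into the 4 requested clusters.
    print(len(bisecting_kmeans(list(range(1, 11)), 4)))
```

An implementation that stops at the first failed split, instead of trying another cluster as the `for`/`else` loop does here, would fall short of k more often. Whether that resembles the actual Carrot2 bug is not something the answer confirms.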

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow