Number of clusters obtained using Carrot2 inconsistent on the same data set
31-05-2022
Question
I am using Carrot2 to cluster a set of 500 emails with the BisectingKMeans algorithm it provides. On the same data set, when I specify k = 9, only 6 clusters are generated, and when I specify k = 8, only 7 are generated. However, when I specify k = 10, all 10 clusters are generated. Can anyone please help me figure out the reason behind this?
Solution
I've had a look at the code, and it looks like this behaviour was caused by a bug in the cluster splitting routine. I've committed a fix to the master branch of Carrot2, which makes the number of generated clusters more predictable. You can download binaries with the fix from the Carrot2 build server.
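For readers unfamiliar with the algorithm, here is a minimal sketch of generic bisecting k-means, written in Python for brevity. It is not Carrot2's code, and the actual bug is not described in the answer; every name here (`bisecting_kmeans`, `_bisect`, the deterministic seeding) is an assumption made for illustration. The point it shows is that each successful split adds exactly one cluster, so getting fewer than k clusters means some split was skipped or failed.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _mean(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def _dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _bisect(points: List[Point], iterations: int = 10) -> Tuple[List[Point], List[Point]]:
    """Split one cluster in two with a short 2-means run.

    Seeds are deterministic (an assumption of this sketch): the first
    point and the point farthest from it.
    """
    c0 = points[0]
    c1 = max(points, key=lambda p: _dist2(p, c0))
    left: List[Point] = []
    right: List[Point] = []
    for _ in range(iterations):
        left = [p for p in points if _dist2(p, c0) <= _dist2(p, c1)]
        right = [p for p in points if _dist2(p, c0) > _dist2(p, c1)]
        if not left or not right:
            break  # degenerate split, e.g. all points identical
        c0, c1 = _mean(left), _mean(right)
    return left, right


def bisecting_kmeans(points: Sequence[Point], k: int) -> List[List[Point]]:
    """Repeatedly split the largest splittable cluster until k clusters exist."""
    clusters: List[List[Point]] = [list(points)]
    unsplittable: List[List[Point]] = []
    while clusters and len(clusters) + len(unsplittable) < k:
        clusters.sort(key=len)
        largest = clusters.pop()
        left, right = _bisect(largest)
        if left and right:
            clusters += [left, right]  # one cluster became two: count grows by 1
        else:
            unsplittable.append(largest)  # set aside and try the next largest
    return clusters + unsplittable


if __name__ == "__main__":
    data = [(float(i),) for i in range(12)]
    for k in (8, 9, 10):
        print(k, len(bisecting_kmeans(data, k)))
```

In this sketch, a cluster that cannot be split is set aside and the next largest one is tried, so the requested count is reached whenever the data contains enough distinct points. A splitting routine that instead stops at the first failed split would return fewer clusters than requested, which is one way the symptom in the question could arise in general.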
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow