ELKI - Clustering Statistics

Question

The automatic evaluation (note that you can also configure the evaluation manually!) relies on labels in your data set. At least in the current version (why are you using 0.5 and not 0.6.0?), it should only evaluate automatically if it finds labels in the data set.

We have not yet published any internal evaluation measures. Some implementations exist, such as evaluation/clustering/internal/EvaluateSilhouette.java, and some of them will be in the next release.

In my experiments, internal evaluation measures were badly misleading. For example, the labeled "solution" would often score a negative Silhouette coefficient (i.e. worse than not clustering at all).

Also, these measures are not scalable. Computing the Silhouette coefficient takes O(n^2) time, because it needs the distance between every pair of points. This usually makes the evaluation more expensive than the actual clustering!
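
To make the cost concrete, here is a minimal sketch of the Silhouette computation in plain Java. It is not the code from EvaluateSilhouette.java and does not use the ELKI API; the class name SilhouetteSketch, its method names, the integer label encoding (0 to k-1) and the choice of Euclidean distance are all assumptions made for illustration. For each point i, a(i) is the mean distance to the other points of its own cluster, b(i) is the smallest mean distance to any other cluster, and s(i) = (b(i) - a(i)) / max(a(i), b(i)). The nested loop over all pairs is where the O(n^2) comes from.

```java
import java.util.Arrays;

// Illustrative sketch only; NOT ELKI's implementation or API.
final class SilhouetteSketch {
    private SilhouetteSketch() {}

    // Euclidean distance (an assumption; any distance function would do).
    static double euclidean(double[] p, double[] q) {
        double sum = 0.0;
        for (int d = 0; d < p.length; d++) {
            double diff = p[d] - q[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Mean silhouette over all points. labels[i] must be in 0..k-1.
    // Points in singleton clusters contribute s(i) = 0.
    static double meanSilhouette(double[][] points, int[] labels, int k) {
        int n = points.length;
        if (n == 0) {
            return 0.0;
        }
        int[] sizes = new int[k];
        for (int label : labels) {
            sizes[label]++;
        }
        double total = 0.0;
        double[] distSum = new double[k];
        for (int i = 0; i < n; i++) {
            // Sum of distances from point i to each cluster: n-1 distances per point.
            Arrays.fill(distSum, 0.0);
            for (int j = 0; j < n; j++) {
                if (j != i) {
                    distSum[labels[j]] += euclidean(points[i], points[j]);
                }
            }
            int own = labels[i];
            if (sizes[own] <= 1) {
                continue; // singleton cluster: s(i) = 0
            }
            double a = distSum[own] / (sizes[own] - 1);
            double b = Double.POSITIVE_INFINITY;
            for (int c = 0; c < k; c++) {
                if (c != own && sizes[c] > 0) {
                    b = Math.min(b, distSum[c] / sizes[c]);
                }
            }
            if (Double.isInfinite(b)) {
                continue; // no other non-empty cluster: treat s(i) as 0
            }
            double max = Math.max(a, b);
            if (max > 0.0) {
                total += (b - a) / max;
            }
        }
        return total / n;
    }
}
```

As a toy example, take the four one-dimensional points 0, 1, 10, 11. With labels {0, 0, 1, 1} the mean silhouette is about 0.90; with the interleaved labels {0, 1, 0, 1} it is -0.45, so a labeling can indeed score below zero. On real data the ground-truth labels may not follow the distance structure either, which is one way a labeled "solution" can end up negative.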

We do appreciate contributions!

You are more than welcome to contribute your favorite evaluation measure to ELKI, to share with others.