This is known as single-link effect.
Kruskal seems to be a semi-clever way of computing single-linkage clustering. The naive approach for "hierarchical clustering" is O(n^3)
, and the Kruskal approach should be O(n^2 log n)
due to having to sort the n^2
edges.
Note that SLINK can do single-linkage clustering in O(n^2)
runtime and O(n)
memory.
Have you tried loading your data set e.g. into ELKI, and compare your result to single-link clustering.
To get bette results, try other linkages (usually in O(n^3)
runtime) or density-based clustering such as DBSCAN (in O(n^2)
without index, and O(n log n)
with index). On this toy data set, epsilon=2
and minPts=5
should work good.