Question

Say I have items i1, ..., iN

I would like to cluster them in such a way that:

  1. If I ran the cluster many many times the probability that items iJ and iK would end up in the same cluster is high.
  2. The number of clusters and cluster memberships are relatively stable regardless of cluster seeds

Are there well known algorithms to achieve this?

Clarification:

say I want 3 clusters and say:

  • in reality-1 I start with i1, i33, i89 as seeds for cluster c1 c2 c3
  • in reality-2 I start with i44, i55, i77 as seeds for cluster c1 c2 c3

I want the resulting clusters in both realities to be largely similar

Était-ce utile?

La solution 2

A often-seen strategy to make an algorithm more robust with respect to initialization, is to bootstrap it. See for instance this paper.

The other option is to sort the data beforehand and use a strictly deterministic algorithm.

Autres conseils

I think that hierarchical clustering algorithms will meet your needs.

  1. Cluster consistency is garanteed for the same set, probability that items iJ and iK would end up in the same cluster is 1.
  2. There is no seed. You choose the right number of cluster by analysing the tree, or using existing cut off algorithms (there are a LOT of them).

[EDIT]

In fact any deterministic clustering algorithm has these features, not just hierarchical clustering.

Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top