Robust Clustering Algorithm [closed]

https://stackoverflow.com/questions/19325601

30-06-2022
|

Question

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.

Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist

Closed 8 years ago.

Improve this question

Say I have items i1, ..., iN

I would like to cluster them in such a way that:

If I ran the cluster many many times the probability that items iJ and iK would end up in the same cluster is high.
The number of clusters and cluster memberships are relatively stable regardless of cluster seeds

Are there well known algorithms to achieve this?

Clarification:

say I want 3 clusters and say:

in reality-1 I start with i1, i33, i89 as seeds for cluster c1 c2 c3
in reality-2 I start with i44, i55, i77 as seeds for cluster c1 c2 c3

I want the resulting clusters in both realities to be largely similar

Solution 2

A often-seen strategy to make an algorithm more robust with respect to initialization, is to bootstrap it. See for instance this paper.

The other option is to sort the data beforehand and use a strictly deterministic algorithm.

OTHER TIPS

I think that hierarchical clustering algorithms will meet your needs.

Cluster consistency is garanteed for the same set, probability that items iJ and iK would end up in the same cluster is 1.
There is no seed. You choose the right number of cluster by analysing the tree, or using existing cut off algorithms (there are a LOT of them).

[EDIT]

In fact any deterministic clustering algorithm has these features, not just hierarchical clustering.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow