Are there any types of clustering algorithms that focus on forming specific sized clusters? This can be thought of us as a grouping algorithm more than a clustering algorithm.

Basically, given n data points, and fixed groups of a certain size k, find the optimal distribution of points to sets based upon certain classifiers, that will hopefully minimize the distance of classifiers for each point in a given group.

This problem seems to be pretty similar to a clustering problem, but the main difference is that we are concerned with a specific cluster size, but not concerned about the number of clusters.

有帮助吗?

解决方案

There is a tutorial on how to implement such an algorithm in ELKI:

http://elki.dbs.ifi.lmu.de/wiki/Tutorial/SameSizeKMeans

Also have a look at constraint clustering algorithms; although usually these algorithms only support "Must link" and "cannot link" constraints, not size constraints.

You should be able to do a similar modification where you first specify the group sizes, then assign points randomly, and swap cluster members as long as your objective function improves; similar to k-means / k-medoids. As you may get stuck in local minima, restart a number of times and only keep the best.

See also earlier questions, e.g. K-means algorithm variation with equal cluster size and Group n points in k clusters of equal size

其他提示

The problem that you are posing is a combinatorial optimization problem. It is very important to know if you need an exact solution, or that can you settle for an approximate one?

If you need exact solutions, there is a body of work that focuses on clustering with different types of constraints. The constraint that you mentioned can be encoded in this framework. However, you should now that this approach scales up to a datasets with a certain size.

许可以下: CC-BY-SA归因
不隶属于 StackOverflow
scroll top