I want to quantize some images with the kmeans2 algorithm. My problem is finding the (near-)best number of clusters.

Does anyone have an idea how to estimate the number of clusters? My idea is to create a cumulative histogram of the hue in the HSV color space, but I don't know how to use this information to estimate the count.

greetings

Solution

I personally use the following approach:

Pseudo code:

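int k = 1;
double oldCompactness = std::numeric_limits<double>::max();
double compactness = kmeans(data, k);   // kmeans() returns the compactness for k clusters
// keep adding clusters while each extra cluster still shrinks the compactness noticeably
while( compactness/oldCompactness < threshold ) {   // threshold: a ratio between 0 and 1, to be tuned
    oldCompactness = compactness;
    k = k + 1;
    compactness = kmeans(data, k);
}
// on exit, k is the first cluster count whose improvement fell short of the threshold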

The compactness decreases as the number of clusters increases (it should become zero once you have as many clusters as data points), so the loop stops as soon as adding another cluster no longer reduces it by much.
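To make the loop above concrete, here is a minimal, self-contained sketch on one-dimensional data. Everything in it is an assumption for illustration: `kmeansCompactness` is a hypothetical stand-in for the `kmeans(data, k)` call (a plain Lloyd's iteration with deterministic starting centers, not the kmeans2 routine), and `estimateClusterCount`, `threshold` and `maxK` are made-up names. It is also a slight variant of the pseudo code: it returns the last k that still gave a significant improvement rather than the k one step past it.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: 1-D k-means (Lloyd's algorithm).
// Returns the compactness, i.e. the sum of squared distances
// from each point to its nearest cluster center.
double kmeansCompactness(std::vector<double> data, int k, int iterations = 50) {
    assert(!data.empty() && k >= 1);
    std::sort(data.begin(), data.end());

    // Deterministic start: centers at evenly spaced positions of the sorted data.
    std::vector<double> centers(k);
    for (int c = 0; c < k; ++c)
        centers[c] = data[(2 * c + 1) * data.size() / (2 * k)];

    // Index of the center closest to x.
    auto nearest = [&](double x) {
        int best = 0;
        for (int c = 1; c < k; ++c)
            if (std::abs(x - centers[c]) < std::abs(x - centers[best])) best = c;
        return best;
    };

    for (int it = 0; it < iterations; ++it) {
        std::vector<double> sum(k, 0.0);
        std::vector<int> count(k, 0);
        for (double x : data) {
            int c = nearest(x);
            sum[c] += x;
            ++count[c];
        }
        // Move each non-empty cluster's center to the mean of its points.
        for (int c = 0; c < k; ++c)
            if (count[c] > 0) centers[c] = sum[c] / count[c];
    }

    double compactness = 0.0;
    for (double x : data) {
        double d = x - centers[nearest(x)];
        compactness += d * d;
    }
    return compactness;
}

// Increase k while going from k to k + 1 clusters still shrinks the
// compactness to less than `threshold` times its previous value.
int estimateClusterCount(const std::vector<double>& data, double threshold, int maxK) {
    int k = 1;
    double oldCompactness = kmeansCompactness(data, k);
    while (k < maxK && oldCompactness > 0.0) {
        double compactness = kmeansCompactness(data, k + 1);
        if (compactness / oldCompactness >= threshold) break;  // improvement too small
        oldCompactness = compactness;
        ++k;
    }
    return k;
}
```

For example, calling `estimateClusterCount` on nine values grouped tightly around 1, 5 and 9 with a threshold of 0.5 should stop at three clusters, because a fourth cluster only splits one of the already tight groups. For images you would feed in pixel colors instead of scalars, and the right threshold would have to be found by experiment.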

I should point out that the optimal number of clusters is highly application-dependent. For example, in your application I don't know whether you prefer high data reduction (low k), a good visual representation (high k), or a compromise somewhere in between.

You can look here for more/better ideas. Or here (week 8) if you prefer video.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow