Question

I want to quantize some images with the kmeans2 algorithm. My problem is finding the best (or near-best) number of clusters.

Does anyone have an idea how to estimate the number of clusters? My idea is to create a cumulative histogram of the hue in the HSV color space, but I don't know how to use this information to estimate the number.

greetings

Solution

I personally use the following approach:

Pseudo code:

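#include <limits>

// kmeans(data, k) is a placeholder: it clusters data into k groups and
// returns the compactness (sum of squared distances to the cluster centers).
// threshold is a ratio in (0, 1) that you choose; for example, 0.9 stops
// once an extra cluster reduces the compactness by less than 10%.
int k = 1;
double oldCompactness = std::numeric_limits<double>::max();
double compactness = kmeans(data, k);
// Keep adding clusters while each one still reduces the compactness noticeably.
while (compactness / oldCompactness < threshold) {
    oldCompactness = compactness;
    k = k + 1;
    compactness = kmeans(data, k);
}
// On exit, k is the first cluster count that no longer helped much.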

The compactness decreases as the number of clusters increases (it should reach zero once you have as many clusters as data points), so the loop stops when adding one more cluster no longer reduces it by a meaningful fraction.
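To make the stopping rule concrete, here is a minimal, self-contained sketch in C++. It is not the kmeans2 API: kmeansCompactness is a toy 1-D k-means written only for illustration (with deterministic seeding so the result is reproducible), and estimateClusterCount, threshold and maxK are hypothetical names. It is also a slight variant of the pseudo code: it returns the last k that still gave a noticeable improvement, caps k at maxK, and stops early if the compactness reaches zero, which avoids a division by zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Toy 1-D k-means (Lloyd's algorithm) with deterministic quantile seeding.
// Returns the compactness: the sum of squared distances to the nearest center.
double kmeansCompactness(const std::vector<double>& data, int k) {
    assert(!data.empty() && k >= 1 && static_cast<std::size_t>(k) <= data.size());
    std::vector<double> sorted(data);
    std::sort(sorted.begin(), sorted.end());
    const std::size_t n = sorted.size();
    const std::size_t kk = static_cast<std::size_t>(k);

    // Seed the centers at evenly spaced positions of the sorted data.
    std::vector<double> centers(kk);
    for (std::size_t i = 0; i < kk; ++i)
        centers[i] = sorted[(2 * i + 1) * n / (2 * kk)];

    std::vector<std::size_t> label(n, kk);  // kk means "not assigned yet"
    for (int iter = 0; iter < 100; ++iter) {
        // Assignment step: attach every point to its nearest center.
        bool changed = false;
        for (std::size_t p = 0; p < n; ++p) {
            std::size_t best = 0;
            for (std::size_t c = 1; c < kk; ++c)
                if (std::abs(sorted[p] - centers[c]) < std::abs(sorted[p] - centers[best]))
                    best = c;
            if (best != label[p]) { label[p] = best; changed = true; }
        }
        if (!changed) break;  // converged

        // Update step: move each center to the mean of its points.
        std::vector<double> sum(kk, 0.0);
        std::vector<int> count(kk, 0);
        for (std::size_t p = 0; p < n; ++p) { sum[label[p]] += sorted[p]; ++count[label[p]]; }
        for (std::size_t c = 0; c < kk; ++c)
            if (count[c] > 0) centers[c] = sum[c] / count[c];
    }

    double compactness = 0.0;
    for (std::size_t p = 0; p < n; ++p) {
        const double d = sorted[p] - centers[label[p]];
        compactness += d * d;
    }
    return compactness;
}

// Increase k while one more cluster shrinks the compactness to less than
// `threshold` times its previous value; return the last k that did so.
int estimateClusterCount(const std::vector<double>& data, double threshold, int maxK) {
    maxK = std::min(maxK, static_cast<int>(data.size()));
    int k = 1;
    double current = kmeansCompactness(data, k);
    while (k < maxK && current > 0.0) {
        const double next = kmeansCompactness(data, k + 1);
        if (next / current >= threshold) break;  // improvement too small
        current = next;
        ++k;
    }
    return k;
}
```

For example, with the values {1, 2, 3, 101, 102, 103, 201, 202, 203} and a threshold of 0.5, estimateClusterCount stops at 3. For image quantization the same loop would run over pixel colors and call the real k-means routine in place of the toy one.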

I should point out that the optimal number of clusters depends heavily on the application. For example, I don't know whether your application calls for high data reduction (low k), a good visual representation (high k), or a compromise somewhere in between.

You can look here for more/better ideas. Or here (week 8) if you prefer video.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow