Finding the spread of each cluster from k-means

https://stackoverflow.com/questions/2320595

22-09-2019

Question

I'm trying to detect how well an input vector fits a given cluster centre. I can find the best match quite easily (the centre with the minimum Euclidean distance to the input vector is the best); however, I now need to work out how good a match that is.

To do this I need to find the spread (standard deviation?) of the vectors that make up each cluster, then check whether the distance from my input vector to the centre is less than the spread. If it's more than the spread, then I should be able to say that no cluster fits it (since even the best match doesn't fit the input vector well).
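A minimal sketch of the test described above, in Python with only the standard library. It assumes each cluster's spread has already been computed and is passed in as the `spreads` list, and it adds a hypothetical tolerance multiplier `k` so you can accept points within, say, two spreads rather than one. `match_or_reject` and `k` are illustrative names, not from the original post.

```python
import math
from typing import Optional, Sequence

Vector = Sequence[float]


def match_or_reject(
    x: Vector,
    centres: Sequence[Vector],
    spreads: Sequence[float],
    k: float = 1.0,  # hypothetical tolerance: how many spreads count as a fit
) -> Optional[int]:
    """Return the index of the nearest centre, or None if x lies further
    than k * spread from that centre (the "no cluster fits" case)."""
    # Best match: the centre with the minimum Euclidean distance to x.
    best = min(range(len(centres)), key=lambda i: math.dist(x, centres[i]))
    # Accept the match only if x is within k spreads of that centre.
    if math.dist(x, centres[best]) <= k * spreads[best]:
        return best
    return None


centres = [(0.0, 0.0), (10.0, 10.0)]
spreads = [1.5, 2.0]
print(match_or_reject((1.0, 1.0), centres, spreads))   # 0
print(match_or_reject((4.0, 4.0), centres, spreads))   # None
print(match_or_reject((9.0, 11.0), centres, spreads))  # 1
```

Raising `k` loosens the test: with `k=4.0`, the point `(4.0, 4.0)` is accepted by cluster 0, because its distance of about 5.66 is within 4 × 1.5.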

I'm not sure how to find the spread per cluster. I have all the centre vectors, and all the training vectors are labelled with their closest cluster; I just can't quite fathom exactly what I need to do to get the spread.

I hope that's clear? If not I'll try to reword it! TIA Ian

Solution

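Use the distance function to calculate the distance from your centre point to each point labelled with that cluster, then take the mean of those distances. That gives you a measure of the cluster's spread, although strictly speaking it is the mean distance, not the standard deviation. If you want a standard-deviation-like figure, take the square root of the mean of the squared distances (the root-mean-square distance) instead.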
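A minimal sketch of this recipe in plain Python (standard library only), assuming the points, their labels and the centres are parallel Python sequences, as the question describes. `cluster_spreads` is a hypothetical helper name. It reports both the mean distance and the root-mean-square (RMS) distance for each cluster so the two can be compared.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cluster_spreads(
    points: Sequence[Vector],
    labels: Sequence[int],
    centres: Sequence[Vector],
) -> Dict[int, Dict[str, float]]:
    """For each cluster, summarise the distances from its centre to the
    training points labelled with that cluster."""
    dists: Dict[int, List[float]] = defaultdict(list)
    for p, lab in zip(points, labels):
        dists[lab].append(math.dist(p, centres[lab]))

    result: Dict[int, Dict[str, float]] = {}
    for lab, ds in dists.items():
        n = len(ds)
        result[lab] = {
            # Mean distance from the centre.
            "mean": sum(ds) / n,
            # Root-mean-square distance: sqrt of the mean squared distance.
            "rms": math.sqrt(sum(d * d for d in ds) / n),
        }
    return result


centres = [(0.0, 0.0), (10.0, 10.0)]
points = [(3.0, 4.0), (-3.0, -4.0), (0.0, 0.0), (11.0, 10.0), (9.0, 10.0)]
labels = [0, 0, 0, 1, 1]
spreads_by_cluster = cluster_spreads(points, labels, centres)
print(spreads_by_cluster)
```

The mean distance is never larger than the RMS distance. When the centre is the mean of its points (as it is for a converged k-means centroid), the RMS distance in one dimension is exactly the population standard deviation; in more dimensions it is the square root of the summed per-dimension variances. Either figure can serve as the per-cluster threshold the question describes.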

Other tips

If you switch to a different algorithm, such as a mixture of Gaussians, you get the spread of each cluster as part of the model (the clustering result) itself: every component has its own variance or covariance, from which its standard deviation follows.

http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/mixture.html

http://en.wikipedia.org/wiki/Mixture_model
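To illustrate how the spread falls out of a mixture model, here is a minimal one-dimensional sketch of expectation-maximisation (EM) for a Gaussian mixture in plain Python. This is an illustrative sketch, not the method from the linked tutorial: `fit_gmm_1d`, its seeding with k-means centres, and the `min_sigma` floor are assumptions made for the example. In practice a library such as scikit-learn (`sklearn.mixture.GaussianMixture`, whose fitted `covariances_` attribute holds each component's spread) would be the usual route for multi-dimensional data.

```python
import math
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a 1-D normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(
    xs: Sequence[float],
    init_means: Sequence[float],
    iters: int = 100,
    min_sigma: float = 1e-6,  # assumed floor so a component never collapses
) -> Tuple[List[float], List[float], List[float]]:
    """Fit a 1-D Gaussian mixture by EM, starting from init_means
    (for example, k-means centres). Returns (weights, means, sigmas)."""
    n, k = len(xs), len(init_means)
    mu = list(init_means)
    # Start every component with equal weight and the overall standard deviation.
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    sigma = [max(sd, min_sigma)] * k
    w = [1.0 / k] * k

    for _ in range(iters):
        # E-step: resp[i][j] = probability that point i came from component j.
        resp = []
        for x in xs:
            p = [w[j] * normal_pdf(x, mu[j], sigma[j]) for j in range(k)]
            total = sum(p) or 1e-300  # guard against every density underflowing
            resp.append([pj / total for pj in p])

        # M-step: re-estimate each component's weight, mean and spread.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            w[j] = nj / n
            mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sigma[j] = max(math.sqrt(var), min_sigma)

    return w, mu, sigma


data = [0.8, 0.9, 1.0, 1.1, 1.2, 4.7, 4.9, 5.0, 5.1, 5.3]
weights, means, sigmas = fit_gmm_1d(data, init_means=[1.0, 5.0])
print(weights, means, sigmas)
```

For well-separated data like this, the fitted sigmas should end up close to the population standard deviations of the two groups (about 0.14 and 0.2), so each component's spread comes directly out of the model rather than needing a separate pass over the labelled points.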

Licensed under: CC-BY-SA with attribution
