Fuzzy k-means - without association, how are the centroids calculated in the next iteration?

https://stackoverflow.com/questions/10069418

30-05-2021

Question

According to Mahout in Action:

Like k-means, fuzzy k-means loops over the data set but instead of assigning vectors to the nearest centroids, it calculates the degree of association of the point to each of the clusters.

Without assigning vectors to the nearest centroids, how are the centroids calculated in the next iteration?

Answer

I just googled fuzzy k-means, and it sounds basically like EM clustering, which is a widely known and useful concept.

The key point is that no hard assignments are made.

When a point is choosing which centroid it should belong to, it comes up with a probability of belonging to each of the centroids. It does this by considering its distance from each centroid (closer centroids get larger values) and normalizing these values by their total, so that they sum to 1.
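As a minimal sketch of this membership step, assume Euclidean distance and the standard fuzzy c-means formula with a fuzziness exponent `m`. The answer names neither, and `distance` and `memberships` are hypothetical helpers, not Mahout's API. Each centroid's raw value is the inverse distance raised to 2/(m-1), and the raw values are then divided by their total. With m = 2 this is simply normalized inverse squared distance.

```python
import math

def distance(p, c):
    """Euclidean distance between two points given as sequences of floats."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, c)))

def memberships(point, centroids, m=2.0):
    """Degree of association of `point` with each centroid.

    Hypothetical helper using the fuzzy c-means formula (assumption):
    closer centroids get larger values, and the results sum to 1.
    `m` > 1 is the fuzziness exponent.
    """
    dists = [distance(point, c) for c in centroids]
    # A point lying exactly on a centroid belongs fully to the first such centroid.
    for j, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if k == j else 0.0 for k in range(len(centroids))]
    # Inverse distance raised to 2/(m-1), normalized by the total.
    raw = [d ** (-2.0 / (m - 1.0)) for d in dists]
    total = sum(raw)
    return [r / total for r in raw]

# The point is 1 unit from the first centroid and 2 units from the second.
print(memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)]))  # prints [0.8, 0.2]
```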

When a centroid is deciding where to relocate to, it does not have a well-defined group of points belonging to it whose average it can simply take as its new location. Instead, it takes a weighted average of all the points, weighting each point by the probability with which it belongs to this centroid. For example, if there are only three points X, Y and Z, where X and Y belong to this cluster with probability 1.0 each and Z belongs to it with probability 0.5, then the new location of the centroid would be

(1.0/2.5) * X + (1.0/2.5) * Y + (0.5/2.5) * Z
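The weighted average can be sketched as follows. The coordinates of X, Y and Z are hypothetical, since the answer gives none, and `weighted_centroid` is an illustrative helper. It uses the weights exactly as stated above. In the standard fuzzy c-means formulation, each weight would first be raised to the fuzziness exponent m.

```python
def weighted_centroid(points, weights):
    """New centroid: the average of `points` weighted by their memberships.

    Dividing the weighted sum by the total weight is the same as
    multiplying each point by weight / total, as in the formula above.
    """
    total = sum(weights)  # 1.0 + 1.0 + 0.5 = 2.5 in the example
    dim = len(points[0])
    return tuple(
        sum(w * p[i] for p, w in zip(points, weights)) / total
        for i in range(dim)
    )

# Hypothetical 2-D coordinates for X, Y and Z.
X, Y, Z = (0.0, 0.0), (2.0, 0.0), (4.0, 4.0)
print(weighted_centroid([X, Y, Z], [1.0, 1.0, 0.5]))  # prints (1.6, 0.8)
```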

The denominator 2.5 is the sum of the weights (1.0 + 1.0 + 0.5), so the coefficients add up to 1. This is how the centroids are recalculated in each iteration.
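Putting the two steps together, one possible sketch of the whole loop follows. It is a hypothetical pure-Python fuzzy c-means implementation, not Mahout's code. It assumes Euclidean distance, a fixed number of iterations instead of a convergence test, and membership ** m as the centroid weight. No point is ever assigned to a single cluster; every point contributes to every centroid in proportion to its weight.

```python
import math

def fuzzy_kmeans(points, centroids, m=2.0, iterations=10):
    """Hypothetical fuzzy c-means loop; returns (centroids, memberships)."""
    k = len(centroids)
    dim = len(points[0])
    for _ in range(iterations):
        # Step 1: degree of association of every point with every centroid.
        u = []
        for p in points:
            d = [math.dist(p, c) for c in centroids]  # Python 3.8+
            if 0.0 in d:
                # A point on a centroid belongs fully to it.
                j = d.index(0.0)
                u.append([1.0 if i == j else 0.0 for i in range(k)])
            else:
                raw = [x ** (-2.0 / (m - 1.0)) for x in d]
                s = sum(raw)
                u.append([r / s for r in raw])
        # Step 2: each centroid moves to the weighted average of ALL points,
        # weighting each point by its membership raised to m.
        new_centroids = []
        for j in range(k):
            w = [row[j] ** m for row in u]
            total = sum(w)
            new_centroids.append(tuple(
                sum(wi * p[i] for wi, p in zip(w, points)) / total
                for i in range(dim)
            ))
        centroids = new_centroids
    return centroids, u

pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
cents, u = fuzzy_kmeans(pts, [(0.0, 0.0), (10.0, 10.0)])
print(cents)
```

With two well-separated groups, as here, the memberships end up close to 0 or 1, and the centroids settle near the group means, (0, 0.5) and (10, 10.5).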

License: CC-BY-SA with attribution
