Same result from K-means and sequential K-means?

https://stackoverflow.com/questions/8351296

27-10-2019

Question

Do we obtain the same result if we apply K-means and sequential K-means methods to the same dataset with the same initial settings? Explain your reasons.

Personally, I think the answer is no. The result obtained by sequential K-means depends on the order in which the data points are presented. The stopping conditions also differ: K-means runs until no mean changes, while sequential K-means runs until it is interrupted.

Here is the pseudocode for the two clustering algorithms.

K-means

Make initial guesses for the means m1, m2, ..., mk
Until there is no change in any mean
    Assign each data point to the cluster whose mean is the nearest.
    Calculate the mean of each cluster.
    For i from 1 to k
        Replace mi with the mean of all examples for cluster i.
    end_for
end_until

Sequential K-means

Make initial guesses for the means m1, m2, ..., mk
Set the counts n1, n2, ..., nk to zero
Until interrupted
    Acquire the next example, x
    If mi is closest to x
        Increment ni
        Replace mi by mi + (1/ni)*(x - mi)
    end_if
end_until
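
The update `mi + (1/ni)*(x - mi)` in the sequential version is an incremental way of keeping mi equal to the average of the examples assigned to it so far. Note that the first assignment (ni = 1) overwrites the initial guess entirely. Below is a minimal Python sketch to check this; `running_mean_update` is a hypothetical helper name, and the sample points are made up for illustration rather than taken from the question.

```python
def running_mean_update(mean, count, x):
    """Apply one sequential K-means update to a single mean.

    Returns the new (mean, count) after assigning example x to this mean.
    """
    count += 1
    new_mean = tuple(m + (xi - m) / count for m, xi in zip(mean, x))
    return new_mean, count


# Illustrative data (an assumption, not from the question): three 2-D
# examples that are all assigned to the same mean.
examples = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]

mean, count = (10.0, 10.0), 0  # arbitrary initial guess
for x in examples:
    mean, count = running_mean_update(mean, count, x)

# Plain arithmetic mean of the same examples, for comparison.
batch_mean = tuple(sum(c) / len(examples) for c in zip(*examples))
print(mean, batch_mean)
```

Both tuples should agree, which suggests the two algorithms differ in when each point gets assigned, not in how a mean is computed from its assigned points.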

Solution

Correct, the results may be different.

Consider the points x1 = (0,0), x2 = (1,1), x3 = (0.75,0), x4 = (0.25,1) with initial means m1 = (0,0.5), m2 = (1,0.5).

K-means assigns x1 and x4 to the m1-cluster and x2 and x3 to the m2-cluster. The new means are m1' = (0.125,0.5) and m2' = (0.875,0.5), and no reassignment takes place, so the algorithm stops.

With sequential K-means, presenting the points in the order x1, x2, x3, x4: x1 is assigned to m1, which moves to (0,0), and x2 moves m2 to (1,1). Then m1 is the closest mean to x3, so m1 moves to (0.375,0). Finally, m2 is closest to x4, so m2 moves to (0.625,1). This is again a stable configuration, but a different one: it groups x1 with x3 and x2 with x4.
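
The numbers in this example can be checked with a short Python sketch of both algorithms. This is an illustration under stated assumptions, not a reference implementation. The function names are hypothetical. The sequential version makes a single pass over the points in the order x1..x4 instead of running "until interrupted". An empty cluster, which does not occur here, simply keeps its old mean.

```python
import math


def nearest(x, means):
    """Index of the mean closest to x (Euclidean distance)."""
    return min(range(len(means)), key=lambda i: math.dist(x, means[i]))


def batch_kmeans(points, means):
    """Batch K-means: repeat assign/recompute until no mean changes."""
    means = list(means)
    while True:
        clusters = [[] for _ in means]
        for x in points:
            clusters[nearest(x, means)].append(x)
        # Assumption: an empty cluster keeps its previous mean.
        new_means = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else m
            for cl, m in zip(clusters, means)
        ]
        if new_means == means:
            return means
        means = new_means


def sequential_kmeans(points, means):
    """Sequential K-means: one pass, updating the nearest mean per example."""
    means = list(means)
    counts = [0] * len(means)
    for x in points:
        i = nearest(x, means)
        counts[i] += 1
        means[i] = tuple(m + (xi - m) / counts[i] for m, xi in zip(means[i], x))
    return means


points = [(0.0, 0.0), (1.0, 1.0), (0.75, 0.0), (0.25, 1.0)]  # x1..x4
initial = [(0.0, 0.5), (1.0, 0.5)]                            # m1, m2

print(batch_kmeans(points, initial))       # [(0.125, 0.5), (0.875, 0.5)]
print(sequential_kmeans(points, initial))  # [(0.375, 0.0), (0.625, 1.0)]
```

Starting from the same initial means, the two runs end at different means and therefore different clusterings.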

Licensed under: CC-BY-SA with attribution
