Question

I've created a codebook using the k-means clustering algorithm, but the algorithm doesn't converge to an optimal codebook: the cluster centroids vary from run to run (because of the random selection of initial seeds). MATLAB has an option to pass an initial matrix to k-means, but how can we select the initial codebook from a large data set? Is there any other way to get a unique codebook using k-means?

No correct solution

OTHER TIPS

It's somewhat standard to run k-means multiple times using different initial states (e.g., initial seeds) and choose the result with the lowest error as the best result.

It's also typical to seed k-means by randomly choosing k elements from your data set as the initial seeds.
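
As a rough sketch of the two points above (seed with k randomly chosen data points, repeat, keep the run with the lowest error), here is a small plain-Python version. It is only an illustration, not MATLAB's implementation: the names kmeans and best_of_restarts, the fixed seed, and the use of the summed squared distance as the "error" are my own assumptions.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, rng, max_iter=100):
    """Lloyd's algorithm, seeded with k data points picked at random."""
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Error: sum of squared distances to the nearest centroid.
    error = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, error

def best_of_restarts(points, k, restarts=10, seed=0):
    """Run k-means several times and keep the lowest-error result."""
    rng = random.Random(seed)  # assumption: fixed seed so the sketch is repeatable
    runs = [kmeans(points, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: run[1])

data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
codebook, err = best_of_restarts(data, k=2)
```

More restarts make it more likely that the best run is a good local optimum, but they do not guarantee the global one.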

By default, MATLAB's k-means uses the k-means++ algorithm for initialization, which relies on random numbers.

Hence sequential calls to k-means will probably produce different results.

You have 3 options to make this deterministic:

  1. Set MATLAB's random number generator to a fixed state before calling k-means.
  2. Use the stream setting in the k-means options to fix the random stream used inside k-means.
  3. Write your own version of k-means that initializes the centroids deterministically.
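
To make the idea behind options 1 and 3 concrete, here is a minimal plain-Python sketch (not MATLAB code, and not what MATLAB does internally). seeded_seeds shows that fixing the generator state makes "random" seeds repeatable. farthest_first_seeds is one possible deterministic rule, a farthest-first (maximin) pick that I chose for illustration; the source does not name a specific method. Both function names are hypothetical. The resulting seeds could then be handed to k-means as the initial matrix mentioned in the question.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def seeded_seeds(points, k, seed=0):
    """Random seeds, but repeatable: the generator always starts from `seed`."""
    return random.Random(seed).sample(points, k)

def farthest_first_seeds(points, k):
    """Deterministic seeds (assumed rule, not from the source):
    start at the data point nearest the overall mean, then repeatedly
    add the point farthest from all seeds chosen so far."""
    dim = len(points[0])
    mean = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
    seeds = [min(points, key=lambda p: sq_dist(p, mean))]
    while len(seeds) < k:
        # Distance of a point to the seed set = distance to its nearest seed.
        seeds.append(max(points,
                         key=lambda p: min(sq_dist(p, s) for s in seeds)))
    return seeds

data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
initial_codebook = farthest_first_seeds(data, k=2)
```

Ties are broken by data order here, so the result is unique only for a fixed ordering of the data set. A deterministic start gives the same codebook every time, but not necessarily the lowest-error one.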
Licensed under: CC-BY-SA with attribution