Question

K-means is a common clustering method. Suppose there are N points to be clustered, i.e., N points should be divided into K groups such that the points within each group are similar to one another.

Before the K-means process can start, the initial centers must be assigned. Here I choose K points at random from the data set, and the program produces different output on each run. Why does this lead to different results, and how do I know which clustering is the best?


Solution

Firstly, the k-means clustering algorithm is not guaranteed to produce an optimal result, which is already a fairly strong hint that different starting points can lead to different results.

It really comes down to the fact that each centre is updated using only the points currently assigned to it: each iteration assigns every point to its nearest centre, then moves each centre to the mean of its assigned points. The algorithm terminates once every centre sits at the centre of its own points, and there is more than one configuration in which this can happen.

Consider this example: (4 points indicated by . and 2 clusters indicated by x)

.         .                      .    x    .
x         x         versus       
.         .                      .    x    .

Both the left and the right side have converged, but they're clearly different (the one on the right is clearly worse).
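The assign-then-update iteration described above (Lloyd's algorithm) can be sketched in NumPy. The function name, the iteration cap, and the seed handling are my own choices, not part of the original answer:

```python
import numpy as np

def kmeans(points, k, n_iters=100, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and centre updates."""
    rng = np.random.default_rng(seed)
    # Initialise the centres with k distinct points drawn from the data.
    centres = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each point to its nearest centre.
        dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centre to the mean of the points assigned to it.
        new_centres = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break  # converged: every centre already sits at its cluster mean
        centres = new_centres
    return centres, labels
```

Depending on which points the random initialisation picks, this sketch can converge to either of the two configurations in the diagram.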

To find the best one, pick the result that minimizes the sum of squared distances from each centre to the points assigned to it (this is, after all, the objective of k-means clustering).
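That score is straightforward to compute. A minimal sketch, where `sse` is my own helper name (scikit-learn calls this quantity *inertia*):

```python
import numpy as np

def sse(points, centres, labels):
    """Sum of squared distances from each point to its assigned centre:
    the quantity k-means tries to minimise."""
    return float(((points - centres[labels]) ** 2).sum())
```

On the four-point example above, the left (good) configuration scores strictly lower than the right (bad) one, so this score correctly ranks the two converged results.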

OTHER TIPS

The idea is to run your clustering algorithm several times, each time initializing the K centroids with K points taken at random from your data set.

Then your best solution is the one that minimizes the sum of squared distances between each point and the centroid of the cluster it belongs to.
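This random-restart strategy can be sketched as a single self-contained function; the name, the number of restarts, and the convergence test are assumptions of mine:

```python
import numpy as np

def best_of_n_kmeans(points, k, n_runs=10, n_iters=100, seed=0):
    """Run Lloyd's algorithm n_runs times from random starting centres and
    keep the run with the smallest sum of squared distances."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_runs):
        # Random restart: k distinct data points as the initial centres.
        centres = points[rng.choice(len(points), size=k, replace=False)]
        for _ in range(n_iters):
            labels = np.linalg.norm(
                points[:, None, :] - centres[None, :, :], axis=2).argmin(axis=1)
            new = np.array([points[labels == j].mean(axis=0)
                            if np.any(labels == j) else centres[j]
                            for j in range(k)])
            if np.allclose(new, centres):
                break
            centres = new
        # Score this run: sum of squared distances to the assigned centres.
        labels = np.linalg.norm(
            points[:, None, :] - centres[None, :, :], axis=2).argmin(axis=1)
        cost = float(((points - centres[labels]) ** 2).sum())
        if best is None or cost < best[0]:
            best = (cost, centres, labels)
    return best
```

Libraries typically build this in; for example, scikit-learn's `KMeans` exposes it as the `n_init` parameter.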

K-means works by improving the answer it is given until it reaches a local optimum, but there is a good argument that there is no single global optimum, and therefore no single local optimum. If there were, then every run of K-means on the same dataset would always converge to the same answer. But that cannot happen: if I take an answer after K-means has converged and renumber the clusters, I get a different answer that scores exactly as well as the one I started with, so there are in fact multiple global optima.
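The renumbering argument is easy to demonstrate: reversing the order of the centres produces a different labelling with exactly the same score. The helper name and the small data set here are my own illustration:

```python
import numpy as np

points = np.array([[0., 0.], [0., 1.], [10., 0.], [10., 1.]])

def assign_and_score(points, centres):
    """Label each point with its nearest centre and return the
    sum of squared distances for that assignment."""
    d = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    return labels, float(((points - centres[labels]) ** 2).sum())

centres = np.array([[0., 0.5], [10., 0.5]])
labels_a, cost_a = assign_and_score(points, centres)
# Renumber the clusters by reversing the centre order.
labels_b, cost_b = assign_and_score(points, centres[::-1])
# Different labellings, identical cost: two distinct global optima.
```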

There are various schemes for providing starting points to K-means, which you might try in place of choosing starting points at random. Some recent ones claim improvements in speed, such as k-means++: http://en.wikipedia.org/wiki/K-means%2B%2B. There are some empirical comparisons of simpler methods at http://jihe.net/downloads/papers/he04ijcnn.pdf.
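As a sketch of the k-means++ idea described on that Wikipedia page: the first centre is picked uniformly at random, and each subsequent centre is picked with probability proportional to its squared distance from the nearest centre chosen so far. The function name is my own:

```python
import numpy as np

def kmeans_pp_init(points, k, seed=0):
    """k-means++ seeding: spread the initial centres out by favouring
    points far from the centres already chosen."""
    rng = np.random.default_rng(seed)
    # First centre: uniform over the data points.
    centres = [points[rng.integers(len(points))]]
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen centre.
        d2 = np.min([((points - c) ** 2).sum(axis=1) for c in centres], axis=0)
        # Next centre: sampled proportionally to that squared distance.
        centres.append(points[rng.choice(len(points), p=d2 / d2.sum())])
    return np.array(centres)
```

Random restarts are still useful with this seeding, but far-apart initial centres make the bad converged configurations much less likely.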

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow