If the kmeans
is the most consuming part, you can apply the k-means algorithm to a random subset of your population. If the size of the random subset is still large compared with the number of centroids you choose, you will get mostly the same results. Alternatively, you can run several kmeans on several subsets and merge the results.
Another option is to try the k-medoid algorithm, which will give centroids which are part of the population so the second part of finding the member of each cluster closest to its centroid will not be needed. It might be slower than the k-means though.