Instead of choosing K random cluster centers to start (I = randperm(N); M = X(I(1:K),:); in your code), simply pass the initial cluster centers as an extra input argument to kmeans3:

function [M,j,e] = kmeans3(X,K,Max_Its,M)

where M is K-by-D (one row per initial center).
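For example, you could then call it with explicit starting centers. A minimal sketch (the initializer shown is just one possible choice, and the K and Max_Its values are placeholders):

```matlab
K = 3;                                  % number of clusters (example value)
Minit = X(randperm(size(X,1), K), :);   % e.g., K distinct random rows of X
[M, j, e] = kmeans3(X, K, 100, Minit);  % 100 = Max_Its (example value)
```

This also lets you compare different initializations deterministically, since the randomness is moved out of kmeans3 itself.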
Also, I strongly suggest vectorizing your implementation with bsxfun. Please see my answer to the question "optimizing manually-coded k-means in MATLAB". Essentially, the inside of your for n=1:Max_Its loop would look something like:
% Calculate all squared distances at once
kdiffs = bsxfun(@minus, X, permute(M,[3 2 1])); % NxDx1 - 1xDxK => NxDxK
Dm = sum(kdiffs.^2, 2);  % NxDxK => Nx1xK; no need for sqrt when only comparing
Dm = squeeze(Dm);        % Nx1xK => NxK
% Find closest cluster center for each point
[~,ik] = min(Dm, [], 2); % Nx1
% Calculate the new cluster centers (mean of the member points)
D = size(X,2);
M_new = zeros(K,D);
clustersizes = zeros(K,1);
for i = 1:K
    indk = ik==i;
    clustersizes(i) = nnz(indk);
    M_new(i,:) = mean(X(indk,:));
end
M = M_new; % update and iterate
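As an aside, if you have the Statistics Toolbox, pdist2 can replace the bsxfun/sum/squeeze distance computation. The assignment step is unchanged because sqrt is monotonic, so the argmin over Euclidean distances equals the argmin over squared distances:

```matlab
% pdist2(X, M) is the NxK matrix of pairwise Euclidean distances
[~, ik] = min(pdist2(X, M), [], 2);  % Nx1 index of nearest center
```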
Note that M_new has a row for each cluster, but if a cluster has no members, that row will be all NaNs (the mean of an empty set).
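One common workaround (not in the original code, just a sketch) is to reseed each empty cluster with a random data point before the next iteration:

```matlab
bad = any(isnan(M_new), 2);                          % rows from empty clusters
M_new(bad,:) = X(randperm(size(X,1), nnz(bad)), :);  % reseed with random points
```

If no clusters are empty, bad is all false and the assignment is a no-op.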