Instead of choosing K random cluster centers to start (I = randperm(N); M = X(I(1:K),:); in your code), simply pass the initial cluster centers as an extra input argument to kmeans3:

function [M,j,e] = kmeans3(X,K,Max_Its,M)

where M is K-by-D (one row per initial center).
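For example, you could then call it with explicit starting centers. A minimal sketch (the initializer shown is just one possible choice, and the K and Max_Its values are placeholders):

```matlab
K = 3;                                  % number of clusters (example value)
Minit = X(randperm(size(X,1), K), :);   % e.g., K distinct random rows of X
[M, j, e] = kmeans3(X, K, 100, Minit);  % 100 = Max_Its (example value)
```

This also lets you compare different initializations deterministically, since the randomness is moved out of kmeans3 itself.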
Also, I strongly suggest vectorizing your implementation with bsxfun. Please see my answer to the question "optimizing manually-coded k-means in MATLAB". Essentially, the inside of your for n=1:Max_Its loop would look something like:
% Calculate all squared distances at once
kdiffs = bsxfun(@minus, X, permute(M,[3 2 1])); % NxDx1 - 1xDxK => NxDxK
Dm = sum(kdiffs.^2, 2);  % NxDxK => Nx1xK; no need for sqrt when only comparing
Dm = squeeze(Dm);        % Nx1xK => NxK
% Find closest cluster center for each point
[~,ik] = min(Dm, [], 2); % Nx1
% Calculate the new cluster centers (mean of the member points)
D = size(X,2);
M_new = zeros(K,D);
clustersizes = zeros(K,1);
for i = 1:K
    indk = ik==i;
    clustersizes(i) = nnz(indk);
    M_new(i,:) = mean(X(indk,:));
end
M = M_new; % update and iterate
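As an aside, if you have the Statistics Toolbox, pdist2 can replace the bsxfun/sum/squeeze distance computation. The assignment step is unchanged because sqrt is monotonic, so the argmin over Euclidean distances equals the argmin over squared distances:

```matlab
% pdist2(X, M) is the NxK matrix of pairwise Euclidean distances
[~, ik] = min(pdist2(X, M), [], 2);  % Nx1 index of nearest center
```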
Note that M_new has a row for each cluster, but if a cluster has no members, that row will be all NaNs (the mean of an empty set).
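One common workaround (not in the original code, just a sketch) is to reseed each empty cluster with a random data point before the next iteration:

```matlab
bad = any(isnan(M_new), 2);                          % rows from empty clusters
M_new(bad,:) = X(randperm(size(X,1), nnz(bad)), :);  % reseed with random points
```

If no clusters are empty, bad is all false and the assignment is a no-op.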