Question

I am doing some clustering using K-means in MATLAB. As you might know the usage is as below:

[IDX,C] = kmeans(X,k)

where IDX gives the cluster number for each data point in X, and C gives the centroids for each cluster.I need to get the index(row number in the actual data set X) of the closest datapoint to the centroid. Does anyone know how I can do that? Thanks

Was it helpful?

Solution

The "brute-force approach", as mentioned by @Dima would go as follows

%# loop through all clusters
for iCluster = 1:max(IDX)
    %# find the points that are part of the current cluster
    currentPointIdx = find(IDX==iCluster);
    %# find the index (among points in the cluster)
    %# of the point that has the smallest Euclidean distance from the centroid
    %# bsxfun subtracts coordinates, then you sum the squares of
    %# the distance vectors, then you take the minimum
    [~,minIdx] = min(sum(bsxfun(@minus,X(currentPointIdx,:),C(iCluster,:)).^2,2));
    %# store the index into X (among all the points)
    closestIdx(iCluster) = currentPointIdx(minIdx);
end

To get the coordinates of the point that is closest to the cluster center k, use

X(closestIdx(k),:)

OTHER TIPS

The brute force approach would be to run k-means, and then compare each data point in the cluster to the centroid, and find the one closest to it. This is easy to do in matlab.

On the other hand, you may want to try the k-medoids clustering algorithm, which gives you a data point as the "center" of each cluster. Here is a matlab implementation.

Actually, kmeans already gives you the answer, if I understand you right:

[IDX,C, ~, D] = kmeans(X,k); % D is the distance of each datapoint to each of  the clusters
[minD, indMinD] = min(D); % indMinD(i) is the index (in X) of closest point to the i-th centroid
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top