Problem

I'm using the Statistics Toolbox function kmeans in MATLAB for the first time. I want to use the total Euclidean distance from each point to its nearest centroid as an indicator of the optimal k. Here is my code:

clear all

N=10;

opts=statset('MaxIter',1000);

X=dlmread(['data.txt']);

crit=zeros(1,N);
for j=1:N
    % Third output: within-cluster sums of point-to-centroid distances
    [~,~,c]=kmeans(X,j,'Start','cluster','EmptyAction','drop','Options',opts);
    crit(j)=sum(c);
end

save(['crit_',VF,'_',num2str(i),'_limswvl1.mat'],'crit')

Everything should work, except that I get this error at j = 6:
X must have more rows than the number of clusters.

I do not understand the problem since X has 54 rows, and no NaNs.
I tried using different EmptyAction options but it still won't work.

Any ideas? :)


Solution

The problem occurs because you use the 'cluster' method to pick the initial centroids. From the MATLAB documentation:

'cluster' - Perform preliminary clustering phase on random 10% subsample of X. This preliminary phase is itself initialized using 'sample'.

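So when j=6, kmeans tries to divide a 10% subsample of the data into 6 clusters. 10% of 54 rows is about 5 rows, which is fewer than the 6 clusters requested. The error message therefore refers to the subsample, not to your full X: X must have more rows than the number of clusters.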
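The arithmetic can be checked in a few lines. This is only a sketch: the use of floor for rounding the subsample size is an assumption about how kmeans sizes the subsample, and the variable names are made up for illustration.

```matlab
n = 54;                     % rows in X, as stated in the question
nSub = floor(0.1 * n);      % assumed rounding of the 10% subsample: 5 rows
k = 6;                      % value of j at which the error appears
tooFewRows = nSub < k;      % true: 5 rows cannot be split into 6 clusters
```

With the same assumption, any k up to 5 still fits into the subsample, which would match the error first showing up at j = 6.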

To get around this problem, change the 'Start' option: either select k observations from X at random ('sample') or select k points uniformly at random from the range of X ('uniform').
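A minimal sketch of the loop from the question with 'Start' set to 'sample' might look like this. The rand(54,2) matrix is a stand-in for the real data, and switching 'EmptyAction' to 'singleton' is my own choice here (with 'drop', a dropped cluster could put NaN into the sums), so treat both as assumptions rather than part of the original answer.

```matlab
% Sketch: same loop as in the question, but with 'Start','sample'.
rng(0);                           % fix the seed so the run is repeatable
X = rand(54, 2);                  % placeholder data with 54 rows (assumption)
N = 10;
opts = statset('MaxIter', 1000);

crit = zeros(1, N);
for j = 1:N
    % sumd: within-cluster sums of point-to-centroid distances
    [~, ~, sumd] = kmeans(X, j, 'Start', 'sample', ...
                          'EmptyAction', 'singleton', 'Options', opts);
    crit(j) = sum(sumd);
end
```

Note that kmeans uses squared Euclidean distance by default, so crit here is a total of squared distances. If plain Euclidean distance is what you need, one option is to take the fourth output of kmeans (point-to-centroid distances) and sum the square root of each row's minimum.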

License: CC-BY-SA with attribution
Not affiliated with StackOverflow