Question

I need to perform dimensionality reduction on a multi-dimensional data set that has been clustered using k-means. The data contains positive and negative real numbers obtained from sensors placed on a haptic glove, captured while performing an action, say drawing the letter "A":

    0.1373   -1.8764
   -1.7020   -0.8322
    0.4862    0.8276
   -0.0078    1.3597
    0.9008    1.8043
    2.9751    0.7125
   -0.3257    0.1754

Now, my questions are:

  1. I do not get any clustering for multi-dimensional data using the following code:

K = 3;
load('b2.txt');
data = b2;
numObservations = size(data, 1);   % number of observations (rows)
%% cluster
opts = statset('MaxIter', 500, 'Display', 'iter');
[clustIDX, clusters, interClustSum, Dist] = kmeans(data, K, 'options',opts, ...
    'distance','sqEuclidean', 'EmptyAction','singleton', 'replicates',3);
%% plot data+clusters
figure, hold on
scatter3(data(:,1),data(:,2),data(:,3), 50, clustIDX, 'filled')
scatter3(clusters(:,1),clusters(:,2),clusters(:,3), 200, (1:K)', 'filled')
hold off, xlabel('x'), ylabel('y'), zlabel('z')

What is wrong, and how can I rectify it?

  2. After obtaining the clusters across all dimensions, I now represent the data by its cluster labels, e.g.

    1 1 3 2

and so on.

  • Does this data incorporate the temporal ordering of the events? At a glance it seems to, but there are papers stating that clustering does not take temporal ordering into account.
  • I need to reduce its length. I am aware of principal component analysis, but that selects dimensions and does not reduce the data length. Is it reasonable to use this reduced format for distance-based classification of an incoming test data set?

Solution

The code you provided works well, with a slight modification, for the 2-D data set (two features) you posted.

Try it as follows:

data=[    0.1373   -1.8764
         -1.7020   -0.8322
          0.4862    0.8276
         -0.0078    1.3597
          0.9008    1.8043
          2.9751    0.7125
         -0.3257    0.1754];

numObservations = size(data, 1);   % number of observations (rows)
K = 3;

%% cluster

%opts = statset('MaxIter', 500, 'Display', 'iter');
[clustIDX, clusters, interClustSum, Dist] = ...
     kmeans(data, K, 'MaxIter', 500, 'Display', 'iter', ...
            'distance','sqEuclidean', 'EmptyAction','singleton', 'replicates',3);

%% plot data+clusters

figure, hold on
scatter(data(:,1),data(:,2), 50, clustIDX, 'filled')
scatter(clusters(:,1),clusters(:,2), 200, (1:K)', 'filled')
hold off, xlabel('x'), ylabel('y')

This is the result:

[Figure: the seven data points and the three centroids in the 2-D feature space, colored by cluster]

Once again, the dataset you provided contains 2 features, so it is essentially 2D.

As far as I understand, kmeans only clusters the data; it does not by itself perform dimensionality reduction (I invite anyone reading this to correct me). For dimensionality reduction what you really want is PCA or something similar. After PCA you can project your data onto the principal-component axes and display the clusters in a lower-dimensional space.
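The code above is MATLAB, but the PCA idea is language-neutral; here is a minimal sketch in Python/NumPy using the seven observations from the question. Note in particular that projecting onto the first principal component shrinks the number of features, not the number of observations, which is exactly the distinction the question raises:

```python
import numpy as np

# The seven two-feature observations from the question.
data = np.array([
    [ 0.1373, -1.8764],
    [-1.7020, -0.8322],
    [ 0.4862,  0.8276],
    [-0.0078,  1.3597],
    [ 0.9008,  1.8043],
    [ 2.9751,  0.7125],
    [-0.3257,  0.1754],
])

# PCA via SVD of the mean-centered data: the rows of Vt are the
# principal-component directions, ordered by explained variance.
centered = data - data.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

# Project onto the first principal component: each observation becomes
# a single number, but the *count* of observations is unchanged --
# PCA reduces dimensions (columns), not data length (rows).
projected = centered @ Vt[0]
print(projected.shape)  # (7,)
```

In MATLAB the equivalent projection is available from `pca` (or `princomp` in older releases), whose score output is exactly this projection.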

I don't actually understand what you mean by temporal ordering, but if there is some correlation between temporal events and the features, you can expect kmeans to classify (indirectly) according to those events.
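To make that point concrete, here is a small Python/NumPy sketch with synthetic data (a hand-rolled Lloyd iteration standing in for MATLAB's kmeans): a hypothetical gesture drifts through three phases over time, so the clusters line up with the temporal phases even though the algorithm never sees the time index.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sensor stream: the gesture passes through three phases
# in time, so early/middle/late samples occupy different regions of
# the 2-D feature space.
phase_means = np.array([[-3.0, 0.0], [0.0, 3.0], [3.0, 0.0]])
X = np.vstack([m + 0.3 * rng.standard_normal((50, 2)) for m in phase_means])

# Minimal Lloyd's k-means; note it never sees the time index.
# (Initializing from one sample per phase keeps the demo simple.)
K = 3
centroids = X[[0, 50, 100]].copy()
for _ in range(50):
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])

# Each temporal block of 50 samples ends up with one constant label:
# the clustering indirectly recovers the temporal phases.
print([len(set(labels[i * 50:(i + 1) * 50])) for i in range(3)])
```

This only works because the features themselves change over time; if the feature distribution were stationary, the cluster labels would carry no temporal information at all.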


Here's another example, again with K = 3 clusters. The centroids of the clusters are returned by kmeans in the clusters output variable above.

[Figure: feature-space points colored by time (left), the same points colored by cluster assignment (middle), and the cluster centroids on the same color scale (right)]

The plot on the left shows the points in the 2-D feature space colored according to time (the colorbar shows how relative time maps to color). The middle panel shows the cluster each point was assigned to, on a new color scale; the right panel uses that same scale to show the positions of the centroids. The point of the figure is to display the temporal regularity with which features show up.

With regard to your question about temporal ordering, it would appear that kmeans can uncover implicit temporal correlations in the features (if that is what you mean), as shown in the following plot of clustIDX versus time:

[Figure: cluster index (clustIDX) plotted against sample index, i.e. time]

But I do not know how this compares to other processing algorithms (or why it would be advantageous); dsp.stackexchange.com may give you a better answer.


The subplots were generated with the following code:

subplot(121);
scatter(data(:,2),data(:,3), 50, clustIDX, 'filled')
axis tight 
box on
xlabel('feature 1'), ylabel('feature 2')
title('labelled points')

subplot(122);
scatter(clusters(:,2),clusters(:,3), 200, (1:K)', 'filled')
axis tight
box on
xlabel('feature 1'),ylabel('feature 2')
title('clusters')

Second plot:

figure
scatter([1:length(clustIDX)],clustIDX, 50, clustIDX, 'filled')
xlabel('time'),ylabel('cluster')
box on
axis tight
title('labelled points in time domain')
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow