Regression lines for clusters of points in MATLAB

You'll first have to separate your points into clusters, which is a non-trivial operation in general. One way to do this is with kmeans from the Statistics Toolbox, for instance:

%// First, I generate some example data in 4 clusters. 

%// intercepts
a = [4 7  0 -5];

%// slopes
b = [0.7 1.0 1.0 0.8];

%// ranges
xmin = [+1  -6  -6  +1];
xmax = [+6  -1  -1  +6];

%// generate clusters 
N = [30 40 25 33];
X = arrayfun(@(ii) (xmax(ii)-xmin(ii))*rand(N(ii),1) + xmin(ii), 1:4, 'UniformOutput', false);
Y = arrayfun(@(ii) a(ii) + b(ii)*X{ii} + randn(size(X{ii})), 1:4, 'UniformOutput', false);


%// Unfortunately, your points are not given in 4 separate clusters, but 
%// in a single array:
X = cat(1,X{:});
Y = cat(1,Y{:});

%// Therefore, you'll have to separate the data again into clusters: 
idx = kmeans([X,Y], 4, 'Replicates', 2);

X = {
    X(idx==1)
    X(idx==2)
    X(idx==3)
    X(idx==4)
};

Y = {
    Y(idx==1)
    Y(idx==2)
    Y(idx==3)
    Y(idx==4)
};


%// Now perform regression on each cluster
ab = arrayfun(@(ii) [ones(size(X{ii})) X{ii}]\Y{ii}, 1:4, 'UniformOutput', false);

%// the original values, and the computed ones
%// note that the order is not the same, because kmeans numbers the
%// clusters arbitrarily!
[a; b]
[ab{:}]

%// Plot everything for good measure
figure(1), clf, hold on

plot(...
    X{1}, Y{1}, 'g.',...
    X{2}, Y{2}, 'b.',...
    X{3}, Y{3}, 'r.',...
    X{4}, Y{4}, 'c.')

line([min(X{1}); max(X{1})], ab{1}(1) + ab{1}(2)*[min(X{1}); max(X{1})], 'color', 'k')
line([min(X{2}); max(X{2})], ab{2}(1) + ab{2}(2)*[min(X{2}); max(X{2})], 'color', 'k')
line([min(X{3}); max(X{3})], ab{3}(1) + ab{3}(2)*[min(X{3}); max(X{3})], 'color', 'k')
line([min(X{4}); max(X{4})], ab{4}(1) + ab{4}(2)*[min(X{4}); max(X{4})], 'color', 'k')

Results:

ans =
    4.0000    7.0000         0   -5.0000
    0.7000    1.0000    1.0000    0.8000
ans =
   -4.6503    6.4531    4.5433   -0.6326
    0.7561    0.8916    0.5914    0.7712

[Figure: the four clusters plotted as green, blue, red and cyan dots, each with its fitted regression line in black]

Taking the different order into account (compare the colors in the plot), these results are indeed what you'd expect, given the large amount of noise I added :)
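
If you'd rather not match the columns by eye, the pairing can be sketched in a few lines. This is only a rough sketch, assuming a, b and ab from the code above are still in the workspace. The names truth, fitted, d and k are my own, and the nearest-match rule is a simple heuristic: the squared distance is dominated by the intercepts, and nothing stops two fits from picking the same original line if the clustering goes wrong.

```matlab
%// original and fitted parameters, one column per line: [intercept; slope]
truth  = [a; b];
fitted = [ab{:}];

for jj = 1:size(fitted,2)
    %// squared distance from fitted line jj to every original line
    d = sum(bsxfun(@minus, truth, fitted(:,jj)).^2, 1);

    %// index of the closest original line
    [~, k] = min(d);

    fprintf('fit %d: a = %7.4f, b = %6.4f  <->  original %d: a = %4.1f, b = %3.1f\n', ...
        jj, fitted(1,jj), fitted(2,jj), k, truth(1,k), truth(2,k));
end
```

With the numbers shown above, this pairs fits 1 to 4 with the original lines 4, 2, 1 and 3, respectively. Another run will generally give a different order, since both the random data and the kmeans labels change.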