Question

I can plot a regression line for a set of x,y points in Matlab. But suppose I have several clusters of points (like in the image below), say four, and I want to draw four regression lines, one per cluster. How would I do that? All the points are saved in x,y. There's no way to separate them and put them into four different sets of variables.

See the image below. Ignore the legends and labels. Any idea how I can do this in Matlab? If there's only one cluster, I can do it, but I want to do it for all four clusters at once.

[image]

Code I am using now for one cluster:

 %----------- Linear regression -----------------
 p= polyfit(x,y,1);
 f= polyval(p,x);
 %----------- Call R-square function ------------
 r2=Rsquare(x,y,p);


 %------------- Plot data -----------------------
 figure()
 plot(x,y,'*k');hold on
 plot(x,f,'-r'); % show linear fit
 xlabel('index');
 ylabel('Intensity a.u.');
 title('Test: Linear regression && R-square');
 %------- Show y-data on current figure ---------
 [row col]=size(y);
 for i=1:col
 str=num2str(y(i)); 
 text(x(i),y(i),str,'Color',[0 0 1]);
 end
 %--Show linear equation on current figure -------
 m1=num2str(p(1));c1=num2str(p(2));Rsquare1=num2str(r2(1));
 text(1.05,80,['y= ',m1,'x+',c1,' , R^2= ',Rsquare1,'.'],'FontSize',10,'FontName','Times New Roman');

Solution

You'll have to separate your values into clusters first, which is a non-trivial operation. One way to do it is with kmeans from the Statistics Toolbox, for instance:

%// First, I generate some example data in 4 clusters. 

%// intercepts
a = [4 7  0 -5];

%// slopes
b = [0.7 1.0 1.0 0.8];

%// ranges
xmin = [+1  -6  -6  +1];
xmax = [+6  -1  -1  +6];

%// generate clusters 
N = [30 40 25 33];
X = arrayfun(@(ii) (xmax(ii)-xmin(ii))*rand(N(ii),1) + xmin(ii), 1:4, 'UniformOutput', false);
Y = arrayfun(@(ii) a(ii) + b(ii)*X{ii} + randn(size(X{ii})), 1:4, 'UniformOutput', false);


%// Unfortunately, your points are not given in 4 separate clusters, but 
%// in a single array:
X = cat(1,X{:});
Y = cat(1,Y{:});

%// Therefore, you'll have to separate the data again into clusters: 
idx = kmeans([X,Y], 4, 'Replicates', 2);

X = {
    X(idx==1)
    X(idx==2)
    X(idx==3)
    X(idx==4)
};

Y = {
    Y(idx==1)
    Y(idx==2)
    Y(idx==3)
    Y(idx==4)
};


%// Now perform regression on each cluster
ab = arrayfun(@(ii) [ones(size(X{ii})) X{ii}]\Y{ii}, 1:4, 'UniformOutput', false);

%// the original values, and the computed ones
%// note that the order is not the same!
[a; b]
[ab{:}]

%// Plot everything for good measure
figure(1), clf, hold on

plot(...
    X{1}, Y{1}, 'g.',...
    X{2}, Y{2}, 'b.',...
    X{3}, Y{3}, 'r.',...
    X{4}, Y{4}, 'c.')

line([min(X{1}); max(X{1})], ab{1}(1) + ab{1}(2)*[min(X{1}); max(X{1})], 'color', 'k')
line([min(X{2}); max(X{2})], ab{2}(1) + ab{2}(2)*[min(X{2}); max(X{2})], 'color', 'k')
line([min(X{3}); max(X{3})], ab{3}(1) + ab{3}(2)*[min(X{3}); max(X{3})], 'color', 'k')
line([min(X{4}); max(X{4})], ab{4}(1) + ab{4}(2)*[min(X{4}); max(X{4})], 'color', 'k')

Results:

ans =
    4.0000    7.0000         0   -5.0000
    0.7000    1.0000    1.0000    0.8000
ans =
   -4.6503    6.4531    4.5433   -0.6326
    0.7561    0.8916    0.5914    0.7712

[image: plot of the four clusters with their fitted lines]

Taking into account the different order (looking at the colors in the plot), these results are indeed what you'd expect, given the large degree of noise I put on :)
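
If you would rather keep the polyfit/polyval style from the question, the same idea can be written as a loop over the cluster labels. The following is only a minimal sketch under a few assumptions: the data are synthetic (two made-up groups, not the data from the question), k is chosen by hand, kmeans from the Statistics Toolbox is available, and R^2 is computed with the usual 1 - SSres/SStot formula rather than the Rsquare function from the question, which is not shown.

```matlab
% Sketch: per-cluster fit with polyfit, looping over cluster labels.
% Assumes x and y are column vectors holding ALL points, and that
% kmeans (Statistics Toolbox) is available.
rng(0);                                   % reproducible example data (assumption)
x = [6*rand(30,1); -6*rand(30,1)];        % two synthetic groups, illustration only
y = [4 + 0.7*x(1:30); 7 + 1.0*x(31:60)] + 0.3*randn(60,1);

k   = 2;                                  % number of clusters, chosen by hand
idx = kmeans([x y], k, 'Replicates', 5);  % one cluster label per point

colors = lines(k);
figure, hold on
for c = 1:k
    xc = x(idx == c);                     % points assigned to cluster c
    yc = y(idx == c);

    p  = polyfit(xc, yc, 1);              % p(1) = slope, p(2) = intercept
    fc = polyval(p, xc);

    % coefficient of determination for this cluster (1 - SSres/SStot)
    r2 = 1 - sum((yc - fc).^2) / sum((yc - mean(yc)).^2);

    plot(xc, yc, '.', 'Color', colors(c,:));
    xl = [min(xc); max(xc)];
    plot(xl, polyval(p, xl), '-k');       % fitted line over the cluster's x-range
    text(mean(xc), max(yc), ...
        sprintf('y = %.2fx + %.2f, R^2 = %.2f', p(1), p(2), r2));
end
```

Because the loop only depends on idx, any other way of obtaining the cluster labels could be dropped in without changing the fitting or plotting part.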

Licensed under: CC-BY-SA with attribution