Question

I have two columns as follows.

ABC =

4.1103   25.5932
5.0852   31.2679
6.0021   15.9020
5.8495   21.4804
4.3245   19.9674
5.9378   38.3452
6.9460    8.8233
7.4568   44.7429
5.7358   32.7608
5.3510   35.2645
5.1657   54.6566
5.1381   44.1870
4.1566  101.8947
5.7310   -3.0565
5.5496   28.3637
4.5672   -1.7736
4.5805   11.8384
4.7948   33.7640
3.9901    6.0607
4.4203   17.7308
4.2712   -1.5834
4.8808   -2.3123
5.9004   -0.4623
5.3929    1.1477
5.6594    6.9741
5.5114   11.3982
5.4715    5.9189
5.0021    6.2561
4.1576   10.3207
6.1025    3.4654
3.9960    6.6892
5.6938    3.8429
5.2416    7.7513
7.0922    2.6871
5.3277   14.0617
6.1350    4.0316
6.0211  -20.3587
6.7399   14.0224
5.0818  102.6360
5.6444   24.3167
6.2542   19.8522
6.2862   24.3430
5.6452   -6.4020
5.4561   14.7813
4.7934    9.4639
3.8523   32.0766
3.9878    8.5313
4.5232   42.0309
4.2489  -12.0325
6.0413   -5.5464
4.9334   -3.2520
4.1349   20.9038
4.2329   20.6303
4.2009   31.8840
4.0624   48.5402
4.7674   28.6595
4.0767    4.7767
4.0971   34.8460
3.8442   24.0209
5.2471   38.8815
6.0241   59.3785
6.9743    6.5027
7.8732    4.5422
4.3094   68.4340
4.5601   -4.2946
4.6140  109.4510
4.5862   71.8387
5.2210   66.1310
4.3835   32.7592
6.1432   36.3832
5.4624   13.7891
5.2129   40.1301
3.8987   67.2705
6.6328   15.0286
8.0786   -7.3078
4.8968   -6.7754
4.1200    4.5333
4.1098   -3.3204
4.0373   26.4890
3.8467   48.8121
7.7795   -2.3606
6.9553   21.3609
6.2635   24.4985
6.1518   -1.4200
4.9115   11.5784
5.5908   13.1351
7.0117   -2.8297
5.2193   38.6937
6.0786   16.9453
6.8229   14.0907
8.0385   13.6228
8.6596   -1.4478
6.3257    8.0361
6.9223  -14.2179
3.8337   15.5773
4.0039  -24.1494
4.6332   17.9308
6.3684   11.3398
5.8592    4.0367
6.9040   12.1495
7.8524   -0.0432
8.3545   10.8865
9.3946   20.4614
4.3015   25.9674
4.4782   21.9045
4.1994   39.2286
4.3499   22.1004
4.3652   33.6220
4.2026   -5.8153
5.1330    6.4996
5.3118   33.7835
4.2002   -3.1917
3.8285   32.1016
3.9485   21.6358
3.8688   21.7830
4.0494   24.7914
4.0869   10.6577
4.6699    8.4756
5.1199   11.1885
5.1831    8.6163
4.5560    8.2806
4.4886    4.8017
4.5618    5.9434
4.1135   12.8942
4.1377   22.1423

I split 'x' into equal-width bins and computed the corresponding mean value 'yy' for each bin, as shown below.

x=ABC(:,1);
y=ABC(:,2);
counter=1
    for i=min(x):0.3:max(x)     
         bin= x>i &  x<= i+0.3;       
         xbin(counter,1)  = mean(x(bin)); 
         yy(counter,1)    = mean(y(bin));
         counter          = counter+1
    end

plot(x,y,'ro'); hold on
plot(xbin,yy,'bo-'); 

Here a 'bin' is defined for a certain range of 'x' (see the for loop). The output contains 'xbin', the mean of 'x' in each bin, and 'yy', the mean of 'y' for the corresponding bin. My concern is that each mean value 'yy' should be obtained from approximately the same number of data points. If a bin does not contain enough data points of 'y', the mean value 'yy' should be NaN. Can someone please help with this? Thanks.


Solution

Check for the number of 1s in bin for each iteration of your for-loop. If that number is below a certain threshold, assign NaN to yy:

x=ABC(:,1);
y=ABC(:,2);
counter=1;

nbinmin = 5; % this is the threshold

for i=min(x):0.3:max(x)
    bin= x>i &  x<= i+0.3;
    xbin(counter,1)  = mean(x(bin));

    % check if the number of 1s in bin is less than the threshold
    if nnz(bin) < nbinmin
        yy(counter,1)    = NaN;
    else
        yy(counter,1)    = mean(y(bin));
    end
    counter = counter+1;
end

OTHER TIPS

The question isn't totally clear, but have you tried the histogram function, hist? It can do a lot of the work for you:

% choose the bin locations
xcenters = min(x):0.3:max(x);

% compute counts of x in each bin (the bins are defined on x)
[counts, ctrs] = hist(x, xcenters);

% set any with too few samples to NaN
count_min = 3;
counts(counts < count_min) = NaN;

% plot -- either as a histogram, 
figure(1)
bar(ctrs, counts)
% or as a line plot (note that the line won't join up across NaN segments)
figure(2)
plot(ctrs, counts)

Here you specify the bin centres; to define the bin edges instead, look at histc.
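As a rough sketch of the edge-based route, histc can return the bin index of every sample, and accumarray can then sum y per bin, which gives the per-bin mean with a NaN mask for sparse bins. The data, binWidth, and count_min below are made-up values for illustration only; with the question's data you would use x = ABC(:,1) and y = ABC(:,2).

```matlab
% Synthetic stand-in data (an assumption, not the data from the question)
x = [1.0; 1.1; 1.2; 1.25; 2.0; 2.1; 3.5];
y = [10;  20;  30;  40;   50;  70;  90];

binWidth  = 1.0;   % hypothetical bin width
count_min = 2;     % hypothetical minimum number of samples per bin

% Left-closed bins [edge(k), edge(k+1)); the last edge lies beyond max(x)
nBins = floor((max(x) - min(x)) / binWidth) + 1;
edges = min(x) + (0:nBins) * binWidth;

% histc returns the count per bin and the bin index of every sample
[counts, idx] = histc(x, edges);
counts = counts(:);
counts = counts(1:nBins);      % drop the trailing "x == edges(end)" slot

% Sum y within each bin, then divide by the counts to get the means
ySum = accumarray(idx(:), y(:), [nBins 1]);
yy   = ySum ./ counts;         % empty bins give 0/0 = NaN

% Mask bins that hold too few samples
yy(counts < count_min) = NaN;
```

Note that these bins are closed on the left, unlike the `x>i & x<=i+0.3` test in the question, so min(x) falls into the first bin.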

You are basically looking for a histogram with non-uniform bins or a histogram with equal counts.

The simplest case for a non-uniform histogram is to sort the N values in x and split the sorted vector into k bins, so that each bin holds N/k of the samples (equivalently, you can fix the number of samples per bin, c, so that N = ck).

Instead of linearly spacing the range of x, you linearly split the sorted vector (which gives a non-linear, non-uniform partition of the original range).

In your case it would look like this:

[sortedX, indeX] = sort(x);
nVals   = length(x); % N
nPerBin = 10;        % c, samples per bin (so k = N/c bins)

% linear split of the sorted vector; the last bin takes any remainder
stepX = [1:nPerBin:nVals, nVals+1];

% counting and binning on the indexed vector
for i = 1 : length(stepX)-1
    bin = indeX(stepX(i):stepX(i+1)-1);
    xbin(i,1) = mean(x(bin));
    yy(i,1) = mean(y(bin));
end

To calculate the actual range (i.e. the edges of the histogram) you can use the midpoint between the max in bin i and the min in bin i+1. You can add something like the following in your loop:

% calculate the range
maxX(i) = max(x(bin));
minX(i) = min(x(bin));

The desired (non-linear) range is then:

rangeX = [min(x) maxX(1:end-1) + (minX(2:end) - maxX(1:end-1))/2 max(x)];

while your original (linear) range is:

rangeX_OP = min(x):0.3:max(x);

You can use histc to verify the equal counts (for rangeX) and the unequal counts (for rangeX_OP). The figure shows how the counts look for random x in a range similar to yours, with c = 10 counts per bin (top: linear spacing of the range, bottom: non-linear spacing).

[Figure: bin counts for linear (top) and non-linear (bottom) spacing]
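A minimal sketch of that check, using a small synthetic x rather than the data from the question. The names nPerBin, rangeX_lin, cEq and cLin are introduced here for illustration, and the linear edges come from linspace instead of a fixed 0.3 step so that both edge sets have the same number of bins.

```matlab
% Synthetic stand-in for x (an assumption): 20 distinct, unevenly spaced values
x = ((1:20).^2)' / 40;
nPerBin = 5;                     % hypothetical number of samples per bin

xs    = sort(x);
nVals = numel(xs);

% First and last sample of each equal-count bin in the sorted vector
firstIdx = 1:nPerBin:nVals;
lastIdx  = [firstIdx(2:end)-1, nVals];
minX = xs(firstIdx)';
maxX = xs(lastIdx)';

% Edges at the midpoints between neighbouring bins
rangeX = [min(x), maxX(1:end-1) + (minX(2:end) - maxX(1:end-1))/2, max(x)];

% Linear edges over the same span, for comparison
rangeX_lin = linspace(min(x), max(x), numel(rangeX));

cEq  = histc(x, rangeX);      cEq  = cEq(:);
cLin = histc(x, rangeX_lin);  cLin = cLin(:);

% histc keeps a separate last slot for x == edges(end); fold it into the last bin
cEq  = [cEq(1:end-2);  cEq(end-1)  + cEq(end)];
cLin = [cLin(1:end-2); cLin(end-1) + cLin(end)];

disp(cEq');    % counts with the non-linear edges
disp(cLin');   % counts with the linear edges
```

For this synthetic x every entry of cEq equals nPerBin, while cLin is uneven because the values cluster near the low end of the range.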

Licensed under: CC-BY-SA with attribution