Running mean of data, or binning of data

Question 1

Check for the number of 1s in bin for each iteration of your for-loop. If that number is below a certain threshold, assign NaN to yy:

x=ABC(:,1);
y=ABC(:,2);
counter=1;

nbinmin = 5; % this is the threshold

for i=min(x):0.3:max(x)
    bin= x>i &  x<= i+0.3;
    xbin(counter,1)  = mean(x(bin));

    % check if the number of 1s in bin is less than the threshold
    if nnz(bin) < nbinmin
        yy(counter,1)    = NaN;
    else
        yy(counter,1)    = mean(y(bin));
    end
    counter = counter+1;
end
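If the loop gets slow, the same idea can be sketched without a loop using accumarray. This is only a sketch under assumptions: the data below is made up to stand in for ABC, and the bins here are closed on the left, [edge, edge+0.3), whereas the loop above uses (i, i+0.3] and so leaves out the sample equal to min(x). Samples lying exactly on an edge may land on either side because of floating-point rounding.

```matlab
% Made-up data standing in for ABC(:,1) and ABC(:,2) (illustration only)
x = [0.00 0.05 0.10 0.15 0.20 0.40 0.50 1.00 1.05 1.10 1.15 1.19]';
y = 2*x;

w       = 0.3;   % bin width, as in the loop
nbinmin = 5;     % this is the threshold

% Bin index of each sample: bin k covers [min(x)+(k-1)*w, min(x)+k*w)
idx   = floor((x - min(x))/w) + 1;
nBins = max(idx);

% Number of samples per bin, then the per-bin means of x and y
n    = accumarray(idx, 1, [nBins 1]);
xbin = accumarray(idx, x, [nBins 1]) ./ n;   % empty bins give 0/0 = NaN
yy   = accumarray(idx, y, [nBins 1]) ./ n;

% Bins with fewer samples than the threshold become NaN
yy(n < nbinmin) = NaN;
```

With this data, the second bin has only 2 samples and the third has none, so yy is NaN in both.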

Question 2

The question isn't entirely clear, but have you tried the histogram function, hist? It can do a lot of the work for you:

% choose the bin locations
xcenters = min(x):0.3:max(x);

% compute counts in each bin (the bin centres come from x, so x is what gets binned)
[counts, ctrs] = hist(x, xcenters);

% set any with too few samples to NaN
count_min = 3;
counts(counts < count_min) = NaN;

% plot, either as a histogram
figure(1)
bar(ctrs, counts)
% or as a line plot (note that the line is broken wherever a bin is NaN)
figure(2)
plot(ctrs, counts)

Here hist takes the bin centres as input; to define the bin edges instead, look at histc.
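For the edge-based variant, something along these lines should work. The data is made up, and the last edge is pushed past max(x) on purpose, because histc only counts values exactly equal to the final edge in its last bin. (Newer MATLAB releases recommend histcounts over histc, so treat this as a sketch.)

```matlab
% Made-up x values (illustration only)
x = [0.00 0.05 0.10 0.15 0.20 0.40 0.50 1.00 1.05 1.10 1.15 1.19]';

% Bin edges; bin k covers edges(k) <= x < edges(k+1)
edges = min(x):0.3:(max(x) + 0.3);

% counts(k) is the number of samples in bin k; whichBin(j) is the bin of x(j).
% The final entry of counts only holds values equal to edges(end).
[counts, whichBin] = histc(x, edges);

% set any bin with too few samples to NaN
count_min = 3;
countsMasked = counts;
countsMasked(counts < count_min) = NaN;
```

The second output, whichBin, is handy if you later want per-bin statistics of y rather than just the counts.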

Question 3

You are basically looking for a histogram with non-uniform bins or a histogram with equal counts.

The simplest case for a non-uniform histogram is to sort the N values in x and split the sorted vector into k bins, so that each bin holds c = N/k of the samples (equivalently, you can fix the number of samples per bin, c, and get k from N = ck).

Instead of spacing the range of x linearly, you split the sorted vector linearly, which gives a non-linear, non-uniform partition of the original range.

In your case it would look like this:

[sortedX, indeX] = sort(x);
nVals = length(x);      % N
c     = 10;             % samples per bin
nBins = ceil(nVals/c);  % k = N/c, rounded up to count a partial last bin

% linear split of the sorted vector; the extra edge nVals+1 always closes the
% last bin, so the final sample is kept even when nVals-1 is a multiple of c
stepX = [1:c:nVals, nVals+1];

% counting and binning on the indexed vector
for i = 1 : length(stepX)-1
    bin = indeX(stepX(i):stepX(i+1)-1);
    xbin(i,1) = mean(x(bin));
    yy(i,1) = mean(y(bin));
end
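To see the split on something small, here is a sketch with 7 made-up values and c = 3 samples per bin. The names order and n are my own, and the sketch always appends nVals+1 as the closing edge, so the leftover sample ends up in a short last bin.

```matlab
% Made-up data (illustration only)
x = [0.9 0.1 0.5 0.3 0.7 0.2 0.8]';
y = 10*x;
c = 3;                            % samples per bin

[~, order] = sort(x);             % order(j) = index of the j-th smallest x
nVals = numel(x);
stepX = [1:c:nVals, nVals+1];     % [1 4 7 8]: bins of 3, 3 and 1 samples
nBins = numel(stepX) - 1;

n    = zeros(nBins, 1);
xbin = zeros(nBins, 1);
yy   = zeros(nBins, 1);
for i = 1:nBins
    bin     = order(stepX(i):stepX(i+1)-1);
    n(i)    = numel(bin);         % samples in this bin
    xbin(i) = mean(x(bin));
    yy(i)   = mean(y(bin));
end
```

The bins hold the sorted values {0.1, 0.2, 0.3}, {0.5, 0.7, 0.8} and {0.9}. If a one-sample last bin is not acceptable, you could merge it into the previous bin or apply a minimum-count threshold to n.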

To calculate the actual range (i.e. the edges of the histogram) you can use the midpoint between the max in bin i and the min in bin i+1. You can add something like the following in your loop:

% calculate the range
maxX(i) = max(x(bin));
minX(i) = min(x(bin));

The desired (non-linear) range is then:

rangeX = [min(x) maxX(1:end-1) + (minX(2:end) - maxX(1:end-1))/2 max(x)];

while your original (linear) range is:

rangeX_OP = min(x):0.3:max(x);

You can use histc to verify the equal counts (for rangeX) and the non-equal counts (for rangeX_OP). This is how the counts would look (for random x in a similar range to yours and c = 10 counts per bin). Top is the linear spacing of the range, bottom is the non-linear one.

[Figure: counts per bin for the linear spacing (top) and the non-linear spacing (bottom)]
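The histc check can be sketched as follows. The data is a deterministic stand-in of my own (30 values, denser near 0) rather than your data, and the names nEq and nLin are made up. One thing to watch: histc puts values equal to the last edge into an extra final bin, so that entry is folded into the previous one here.

```matlab
% Deterministic stand-in data: 30 values in [0, 3], denser near 0
x = 3*(linspace(0, 1, 30)').^2;
c = 10;                                  % samples per bin

[~, order] = sort(x);
nVals = numel(x);
stepX = [1:c:nVals, nVals+1];
nBins = numel(stepX) - 1;

minX = zeros(1, nBins);
maxX = zeros(1, nBins);
for i = 1:nBins
    bin     = order(stepX(i):stepX(i+1)-1);
    minX(i) = min(x(bin));
    maxX(i) = max(x(bin));
end

% Non-linear edges (midpoints between neighbouring bins) and linear edges
rangeX    = [min(x), maxX(1:end-1) + (minX(2:end) - maxX(1:end-1))/2, max(x)];
rangeX_OP = min(x):0.3:max(x);

nEq  = histc(x, rangeX);                 % counts for the equal-count edges
nLin = histc(x, rangeX_OP);              % counts for the linear edges

% Fold the "x == last edge" entry into the last real bin
nEq(end-1) = nEq(end-1) + nEq(end);
nEq(end)   = [];
```

Here nEq comes out as 10 in every bin, while nLin is heavily skewed towards the first bins.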