Question

I am writing a function to clone the Histogram functionality of the Data Analysis Add-In in Excell. Basically, an input of sample data is provided, and then the bin ranges as well. The bin ranges must be monotonically increasing, and in my case, need to be specifically [0 20 40 60 80 100]. Excell calculates if a sample falls into the bin range if it is greater than the lower bound (left edge) and less than or equal to the upper bound (right edge).

I have written bin sorting algorithm below, and it gives improper output for data0 (very close), but proper output for data1 and data2. Proper in this case means that the output from this algorithm matches exactly the output in the table Excell generates where the number of samples is tallied next to the bin. Any help is appreciated!

#include <iostream>

int main(int argc, char **agv)
{
    const int SAMPLE_COUNT      = 21;
    const int BIN_COUNT         = 6;
    int binranges[BIN_COUNT]    = {0, 20, 40, 60, 80, 100};
    int bins[BIN_COUNT]         = {0, 0, 0, 0, 0, 0};

    int data0[SAMPLE_COUNT] =  {4,82,49,17,89,73,93,86,74,36,74,55,81,61,88,94,72,65,35,25,79};
    // for data0 excell's bins read:
    // 0    0
    // 20   2
    // 40   3
    // 60   2
    // 80   7
    // 100  7
    //
    // instead output of bins is: 203277

    int data1[SAMPLE_COUNT] = {88,83,0,0,95,86,0,94,92,77,94,73,93,90,50,95,93,83,0,95,91};
    //for data1 excell and this algorithm both yield:
    // 0    4
    // 20   0
    // 40   0
    // 60   1
    // 80   2
    // 100  14  (correct)

    int data2[SAMPLE_COUNT] = {58,48,75,68,85,78,74,83,83,75,67,58,75,58,84,68,57,88,55,79,72};
    //for data2 excell and this algorithm both yield:
    // 0    0
    // 20   0
    // 40   0
    // 60   6
    // 80   10
    // 100  5   (correct)

    for (unsigned int binNum = 1; binNum < BIN_COUNT; ++binNum)
    {
        const int leftEdge = binranges[binNum - 1];
        const int rightEdge = binranges[binNum];

        for (unsigned int sampleNum = 0; sampleNum < SAMPLE_COUNT; ++sampleNum)
        {
            const int sample = data0[sampleNum];

            if (binNum == 1)
            {
                if (sample >= leftEdge && sample <= rightEdge)
                    bins[binNum - 1]++;
            }
            else if (sample > leftEdge && sample <= rightEdge)
            {
                bins[binNum]++;
            }
        }
    }

    for (int i = 0; i < BIN_COUNT; ++i)
        std::cout << bins[i] << " " << std::flush;

    std::cout << std::endl << std::endl;

    return 0;
}
Was it helpful?

Solution

Assuming the edges are always in increasing order, all you need is:

     unsigned int bin;
    for (unsigned int sampleNum = 0; sampleNum < SAMPLE_COUNT; ++sampleNum)
    {
           const int sample = data0[sampleNum];
           bin = BIN_COUNT;
           for (unsigned int binNum = 0; binNum < BIN_COUNT; ++binNum)  {
                 const int rightEdge = binranges[binNum];
                 if (sample <= rightEdge) {
                    bin = binNum;
                    break;
                }
           }
           bins[bin]++;
      }

Although, for this code to work, you will need to add one more bin for the values which are equal or below the first edge (0).

The rational is that if you have n separators, then you have n+1 intervals.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top