Question

I am using the following code to digitize an array into 16 bins:

numpy.digitize(array, bins=numpy.histogram(array, bins=16)[1])

I expect that the output is in the range [1, 16], since there are 16 bins. However, one of the values in the returned array is 17. How can this be explained?

Solution

This is actually documented behaviour of numpy.digitize():

Each index i returned is such that bins[i-1] <= x < bins[i] if bins is monotonically increasing, or bins[i-1] > x >= bins[i] if bins is monotonically decreasing. If values in x are beyond the bounds of bins, 0 or len(bins) is returned as appropriate.

So in your case, 0 and 17 are also valid return values (note that the bin array returned by numpy.histogram() contains 17 edges, not 16). Those edges span array.min() to array.max(). By the condition quoted above, array.min() equals bins[0] and therefore falls in the first bin, while array.max() equals bins[16], which is excluded from the last bin because each bin's upper edge is exclusive. That is why 17 appears in the output while 0 does not.
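To see this concretely, here is a minimal sketch. The sample array and the variable names are made up for illustration and are not from the question. Clipping the indices is one possible workaround, not the only one; it reproduces the bin assignment numpy.histogram() itself uses, where the last bin includes its upper edge.

```python
import numpy as np

# Hypothetical sample data; min is 0.0 and max is 16.0, so the edges are 0, 1, ..., 16.
data = np.array([0.0, 1.0, 2.5, 4.0, 7.5, 16.0])

counts, edges = np.histogram(data, bins=16)
print(len(edges))            # 17 edges delimit 16 bins

idx = np.digitize(data, edges)
print(idx)                   # the maximum (16.0) gets index 17

# Possible workaround: fold the out-of-range index back into the last bin.
fixed = np.clip(idx, 1, 16)
print(fixed)

# The clipped indices agree with the counts numpy.histogram() reports.
print(np.array_equal(np.bincount(fixed, minlength=17)[1:], counts))
```

With this data, only the element equal to data.max() is affected; every other index is already in the range 1 to 16.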

OTHER TIPS

numpy.histogram() produces an array of the bin edges, of which there are (number of bins)+1.

In numpy version 1.8, you can choose whether numpy.digitize treats each interval as closed on the left or on the right, using the right argument. The following example is copied from http://docs.scipy.org/doc/numpy/reference/generated/numpy.digitize.html:

>>> import numpy as np
>>> x = np.array([1.2, 10.0, 12.4, 15.5, 20.])
>>> bins = np.array([0, 5, 10, 15, 20])
>>> np.digitize(x, bins, right=True)
array([1, 2, 3, 4, 4])
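Applied to edges that come from numpy.histogram(), right=True does not remove the out-of-range index, it moves it to the other end: the maximum now falls in bin 16, but the minimum gets index 0. A small sketch with made-up sample data (the array and variable names are illustrative assumptions):

```python
import numpy as np

# Hypothetical sample data; the histogram edges come out as 0, 1, ..., 16.
data = np.array([0.0, 1.0, 2.5, 4.0, 7.5, 16.0])
edges = np.histogram(data, bins=16)[1]

# Default: bins[i-1] <= x < bins[i], so the maximum is pushed to index 17.
left_closed = np.digitize(data, edges)
print(left_closed)

# right=True: bins[i-1] < x <= bins[i], so the minimum drops to index 0 instead.
right_closed = np.digitize(data, edges, right=True)
print(right_closed)
```

Values that sit exactly on an inner edge (1.0 and 4.0 here) also move down by one bin when right=True is used.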

Licensed under: CC-BY-SA with attribution