Question

Say I have a list:

a = [3, 5, 1, 1, 3, 2, 4, 1, 6, 4, 8]

and a sub list of a:

b = [5, 2, 6, 8]

I'd like to obtain bins with pd.qcut(a, 2) and count the number of values of list b that fall in each bin. That is:

In[84]: pd.qcut(a,2)
Out[84]: 
Categorical: 
[[1, 3], (3, 8], [1, 3], [1, 3], [1, 3], [1, 3], (3, 8], [1, 3], (3, 8], (3, 8], (3, 8]]
Levels (2): Index(['[1, 3]', '(3, 8]'], dtype=object)

Now I know the bins are [1, 3] and (3, 8], and I'd like to know how many values of list "b" fall in each bin. I can do this by hand when the number of bins is small, but what's the best approach when the number of bins is large?

No correct solution

OTHER TIPS

You can use the retbins parameter to get the bin edges back from qcut:

>>> q, bins = pd.qcut(a, 2, retbins=True)

Then use pd.cut to get b indices with respect to bins:

>>> b = np.array(b)
>>> hist = pd.cut(b, bins, right=True, labels=False, include_lowest=True)
>>> hist
array([1, 0, 1, 1])

Note that the corner case, bins[0], needs special treatment: by default cut does not include it in the leftmost bin, which is why include_lowest=True is passed here.
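Putting these steps together for the lists in the question, a minimal sketch might look like the following. It assumes a reasonably recent pandas in which pd.cut accepts labels=False and include_lowest=True (used here to cover the bins[0] corner case); the names codes and counts are only illustrative. np.bincount with minlength is used so that empty bins still show up with a count of zero.

```python
import numpy as np
import pandas as pd

a = [3, 5, 1, 1, 3, 2, 4, 1, 6, 4, 8]
b = [5, 2, 6, 8]

# Bin edges come from the quantiles of a (here: 1, 3 and 8).
_, bins = pd.qcut(a, 2, retbins=True)

# Integer bin index for every value of b; include_lowest=True makes
# the leftmost bin closed on the left so that bins[0] itself is counted.
codes = pd.cut(b, bins, labels=False, include_lowest=True)

# Number of values of b per bin; minlength keeps empty bins in the result.
counts = np.bincount(codes, minlength=len(bins) - 1)

for left, right, n in zip(bins[:-1], bins[1:], counts):
    print(f"{left} .. {right}: {n}")
```

This works the same way for any number of bins, since counts always has one entry per bin. It does assume that every value of b lies inside the range of a; otherwise cut returns NaN for that value and np.bincount will fail.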

As shown in the earlier answer, you can get the bin boundaries from qcut using the retbins parameter, as in the following:

q, bins = pd.qcut(a, 2, retbins=True)

Then you can use cut to put values from another list into those "bins." For example:

import numpy as np
import pandas as pd

myList = np.random.random(100)
# Define bin bounds that cover the range returned by random()
bins = [0, .1, .9, 1]
# Now we can get the "bin number" of each value in myList:
binNum = pd.cut(myList, bins, labels=False, include_lowest=True)
# And then we can count the number of values in each bin number:
np.bincount(binNum)

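Be sure that your bin bounds cover the entire range of values that appear in your second list; cut does not assign values outside the bounds to any bin (they come back as NaN). To ensure this, you might pad your bin boundaries with negative and positive infinity. E.g., with bins being the array returned by qcut: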

cutBins = [float('-inf')] + bins.tolist() + [float('inf')]
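
To illustrate what the padding does, here is a small sketch. The list c is a made-up second list with values outside the range of a, and the names unpadded, padded and counts are only illustrative. Note that padding this way adds two extra outer bins, so the bin numbers shift by one, and a value equal to the old lowest edge lands in the first padded bin.

```python
import numpy as np
import pandas as pd

a = [3, 5, 1, 1, 3, 2, 4, 1, 6, 4, 8]
c = [0, 2, 5, 9, 12]  # hypothetical list; 0, 9 and 12 lie outside the range of a

_, bins = pd.qcut(a, 2, retbins=True)  # edges 1, 3, 8

# Without padding, out-of-range values get no bin and come back as NaN.
unpadded = pd.cut(c, bins, labels=False, include_lowest=True)
print(unpadded)

# With padding, the edges become -inf, 1, 3, 8, inf (four bins).
cutBins = [float('-inf')] + bins.tolist() + [float('inf')]
padded = pd.cut(c, cutBins, labels=False)

# One count per padded bin, including the two outer catch-all bins.
counts = np.bincount(padded, minlength=len(cutBins) - 1)
print(counts)
```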
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow