digitize array to several lists of bins

https://stackoverflow.com/questions/21741378

10-10-2022

Question

I need to apply numpy.digitize to an array. Say the code is

import numpy as np

my_bin_list = [3, 6, 9]
my_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
digitized = np.digitize(my_array, my_bin_list)

This works fine. The problem is that I do not have a single list of bins as in the example, but exactly one bin list per element of my_array (each element belongs to a different dataset with its own bins), so that len(my_array) == len(list_of_my_bin_lists), where list_of_my_bin_lists = [my_bin_list1, my_bin_list2, ...]. I need to tell digitize to check which bin of list_of_my_bin_lists[0] the first array element falls into, which bin of list_of_my_bin_lists[1] the second element falls into, and so on. Is that possible? I would imagine something like

list_of_my_bin_lists = [[2, 6, 9], [4, 6, 8], [3, 5, 9]]
my_array = np.array([1, 3, 7])
digitized = np.digitize(my_array[i], list_of_my_bin_lists[i] for i in range(len(my_array)))  # pseudocode, not valid Python

which should return [0, 0, 2] for digitized.

Solution

You can do this with a list comprehension close to what you were imagining:

import numpy as np

list_of_my_bin_lists = [[2, 6, 9], [4, 6, 8], [3, 5, 9]]
my_array = np.array([1, 3, 7])
digitized = [np.digitize(np.array([item]), bin_list)[0]
             for item, bin_list
             in zip(my_array, list_of_my_bin_lists)]

Result: digitized == [0, 0, 2]

This gives digitized as a plain Python list whose elements are NumPy integer scalars. If you need an np.array instead, wrap the list: np.array(digitized).
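
If the Python-level loop becomes a bottleneck, the per-element binning can also be sketched without a loop. This is only a sketch under two assumptions that the question does not state: every bin list has the same length, and every bin list is sorted in increasing order. Under those assumptions, np.digitize(x, bins) with its default right=False equals the number of bin edges that are less than or equal to x, so a broadcast comparison gives the same indices. The helper name digitize_per_row is made up for illustration.

```python
import numpy as np

def digitize_per_row(values, bins_per_value):
    """Hypothetical helper: bin values[i] against bins_per_value[i].

    Assumes all bin lists have equal length and are sorted ascending.
    """
    values = np.asarray(values)
    bins = np.asarray(bins_per_value)  # shape (n, k)
    # For ascending bins, np.digitize(x, b) (right=False) equals the
    # count of edges b[j] with b[j] <= x, computed here row by row.
    return (bins <= values[:, None]).sum(axis=1)

list_of_my_bin_lists = [[2, 6, 9], [4, 6, 8], [3, 5, 9]]
my_array = np.array([1, 3, 7])

digitized = digitize_per_row(my_array, list_of_my_bin_lists)
print(digitized)  # [0 0 2]
```

This returns an np.array directly. If the bin lists have different lengths, the list comprehension above remains the simpler choice.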

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow