Python: splitting dict by value of one of the keys (IndexError: index 141 is out of bounds for axis 0 with size 1)

https://stackoverflow.com/questions/23233617

07-07-2023
Question

This question is an addendum to one that has already been asked: Splitting dict by value of one of the keys

I have a dictionary with 19 keys, and each key holds an array of 51000 values (observations). One of the keys is a grouping/classification key whose values are either 1 or 2. I would like to split the dictionary into two new dictionaries: one for the observations where the classification key is 1 and another for those where it is 2.

data = {'variable 1': array([ 90, 91, 89, ...
        .
        .
        .
        'variable 18': array([0.1, 0.02, 0.4, ...
        'classifier': array([1, 1, 2, ...
        }

I tried the solution posted by georgesl for the question mentioned above:

data1 = [ { key : data[key][idx] for key in data.keys() }  for idx, x in enumerate(data["id"]) if x == 1 ]

However, when I run this I get the following error:

 IndexError: index 141 is out of bounds for axis 0 with size 1

I also tried to convert the arrays to a list using:

data2 = {}
for key in data.keys():
     data[key] = data[key].tolist()

But this yields the following error when I run it through the posted solution:

IndexError: list index out of range

I am probably missing something really obvious, but I can't for the life of me figure out what. I am open to any suggestions.

Solution

I used something different, hope you don't mind. I believe it works:

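from itertools import compress
# i - 1 is 0 (falsy) for label 1, so this mask keeps the entries labelled 2
data2 = {key: list(compress(data[key], [i - 1 for i in data['classifier']])) for key in data.keys()}
# i - 2 is 0 (falsy) for label 2, so this mask keeps the entries labelled 1
data1 = {key: list(compress(data[key], [i - 2 for i in data['classifier']])) for key in data.keys()}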

This is my first time using itertools.compress, so I am no expert, but it works like a mask: it keeps only the items whose selector is truthy. For example:

>>> list(compress(['no','yes'],[False, True]))

gives:

['yes']

Also, if

data['classifier'] = [1, 1, 2]

then

[i-1 for i in data['classifier']]

gives:

[0, 0, 1]  # evaluates to [False, False, True]

and

[i-2 for i in data['classifier']]

gives:

[-1, -1, 0]  # evaluates to [True, True, False]
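
To see the pieces working together, here is a small self-contained sketch. The toy dictionary is made up for illustration (plain lists instead of the 51000-element arrays from the question), and mask1 and mask2 are names I introduced just for this example:

```python
from itertools import compress

# Toy stand-in for the real dictionary; the values are invented for illustration.
data = {
    "variable 1": [90, 91, 89],
    "variable 18": [0.1, 0.02, 0.4],
    "classifier": [1, 1, 2],
}

# i - 2 is nonzero (truthy) only where the label is 1.
mask1 = [i - 2 for i in data["classifier"]]
# i - 1 is nonzero (truthy) only where the label is 2.
mask2 = [i - 1 for i in data["classifier"]]

# Apply the same mask to every key, including the classifier itself.
data1 = {key: list(compress(values, mask1)) for key, values in data.items()}
data2 = {key: list(compress(values, mask2)) for key, values in data.items()}

print(data1)
print(data2)
```

Note that the subtraction trick only works because there are exactly two labels; with a third label, both masks would let it through.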

Now, suppose the classifier held 0 and 1 instead, with data1 collecting the entries labelled 0. Then the code would be:

data2 = {key: list(compress(data[key], [i for i in data['classifier']])) for key in data.keys()}  # or just data['classifier']
data1 = {key: list(compress(data[key], [i - 1 for i in data['classifier']])) for key in data.keys()}  # i - 1 is nonzero only for label 0
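
If the arithmetic on the labels feels fragile, a possible variation (not part of what I tested above, just a sketch) is to build the mask with an equality test, which should work for 0/1, 1/2, or any other set of labels. The helper name split_by_label and its parameters are hypothetical, chosen only for this example:

```python
from itertools import compress


def split_by_label(data, label_key, label):
    """Keep, for every key, only the entries whose label equals `label`.

    `split_by_label` is a hypothetical helper written for this sketch.
    """
    # True where the classifier matches the requested label, False elsewhere.
    mask = [value == label for value in data[label_key]]
    return {key: list(compress(values, mask)) for key, values in data.items()}


# Toy data with 0/1 labels; the values are invented for illustration.
data = {"variable 1": [90, 91, 89], "classifier": [0, 0, 1]}

data1 = split_by_label(data, "classifier", 0)
data2 = split_by_label(data, "classifier", 1)

print(data1)
print(data2)
```

One thing to keep in mind: compress stops at the shorter of its two inputs, so if one of the values is shorter than the classifier it gets truncated silently instead of raising an IndexError.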

Licensed under: CC-BY-SA with attribution
