Create a new array from numpy array based on the conditions from a list

https://stackoverflow.com/questions/3607001

25-09-2019

Question

Suppose that I have an array defined by:

data = np.array([('a1v1', 'a2v1', 'a3v1', 'a4v1', 'a5v1'),
       ('a1v1', 'a2v1', 'a3v1', 'a4v2', 'a5v1'),
       ('a1v3', 'a2v1', 'a3v1', 'a4v1', 'a5v2'),
       ('a1v2', 'a2v2', 'a3v1', 'a4v1', 'a5v2'),
       ('a1v2', 'a2v3', 'a3v2', 'a4v1', 'a5v2'),
       ('a1v2', 'a2v3', 'a3v2', 'a4v2', 'a5v1'),
       ('a1v3', 'a2v3', 'a3v2', 'a4v2', 'a5v2'),
       ('a1v1', 'a2v2', 'a3v1', 'a4v1', 'a5v1'),
       ('a1v1', 'a2v3', 'a3v2', 'a4v1', 'a5v2'),
       ('a1v2', 'a2v2', 'a3v2', 'a4v1', 'a5v2'),
       ('a1v1', 'a2v2', 'a3v2', 'a4v2', 'a5v2'),
       ('a1v3', 'a2v2', 'a3v1', 'a4v2', 'a5v2'),
       ('a1v3', 'a2v1', 'a3v2', 'a4v1', 'a5v2'),
       ('a1v2', 'a2v2', 'a3v1', 'a4v2', 'a5v1')],
      dtype=[('a1', '|S4'), ('a2', '|S4'), ('a3', '|S4'),
             ('a4', '|S4'), ('a5', '|S4')])

How can I write a function that lists the rows of data matching the conditions given in a list of tuples, r?

r = [('a1', 'a1v1'), ('a4', 'a4v1')]

I know that it can be done manually like this:

data[(data['a1']=='a1v1') & (data['a4']=='a4v1')]

What about removing the rows of data that match r?

data[(data['a1']!='a1v1') | (data['a4']!='a4v1')]

Thanks.

Solution

If I'm understanding you correctly, you want to list every full row in which a given set of columns equals a given set of values. In that case, this should be what you want, though it's a bit verbose and obscure:

test_cols = data[['a1', 'a4']]
test_vals = np.array(('a1v1', 'a4v1'), test_cols.dtype)
data[test_cols == test_vals]

Note the "nested list" style indexing... That's the easiest way to select multiple columns of a structured array. E.g.

data[['a1', 'a4']]

will yield

array([('a1v1', 'a4v1'), ('a1v1', 'a4v2'), ('a1v3', 'a4v1'),
       ('a1v2', 'a4v1'), ('a1v2', 'a4v1'), ('a1v2', 'a4v2'),
       ('a1v3', 'a4v2'), ('a1v1', 'a4v1'), ('a1v1', 'a4v1'),
       ('a1v2', 'a4v1'), ('a1v1', 'a4v2'), ('a1v3', 'a4v2'),
       ('a1v3', 'a4v1'), ('a1v2', 'a4v2')], 
      dtype=[('a1', '|S4'), ('a4', '|S4')])

You can then test this against a tuple of the values that you're checking for and get a one-dimensional boolean array that is True where those columns are equal to those values.

However, with structured arrays, the dtype has to be an exact match. E.g. data[['a1', 'a4']] == ('a1v1', 'a4v1') just yields False, so we have to make an array of the values we want to test using the same dtype as the columns we're testing against. Thus, we have to do something like:

test_cols = data[['a1', 'a4']]
test_vals = np.array(('a1v1', 'a4v1'), test_cols.dtype)

before we can do this:

data[test_cols == test_vals]

Which yields what we were originally after:

array([('a1v1', 'a2v1', 'a3v1', 'a4v1', 'a5v1'),
       ('a1v1', 'a2v2', 'a3v1', 'a4v1', 'a5v1'),
       ('a1v1', 'a2v3', 'a3v2', 'a4v1', 'a5v2')], 
      dtype=[('a1', '|S4'), ('a2', '|S4'), ('a3', '|S4'), ('a4', '|S4'), ('a5', '|S4')])
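
The question also asked for a function driven by the list r, and for the reverse operation of dropping the matching rows. One possible sketch is below. It does not use the structured comparison shown above; instead it builds the mask one field at a time, which sidesteps the exact-dtype requirement. The name match_mask is made up for illustration, and the fields are stored as unicode ('U4') rather than '|S4' on the assumption that this runs under Python 3, where plain string literals would not compare equal to bytes.

```python
import numpy as np

# Same rows as in the question. Assumption of this sketch: fields are stored
# as unicode ('U4') instead of bytes ('|S4') so str literals compare directly.
fields = ['a1', 'a2', 'a3', 'a4', 'a5']
rows = [
    ('a1v1', 'a2v1', 'a3v1', 'a4v1', 'a5v1'),
    ('a1v1', 'a2v1', 'a3v1', 'a4v2', 'a5v1'),
    ('a1v3', 'a2v1', 'a3v1', 'a4v1', 'a5v2'),
    ('a1v2', 'a2v2', 'a3v1', 'a4v1', 'a5v2'),
    ('a1v2', 'a2v3', 'a3v2', 'a4v1', 'a5v2'),
    ('a1v2', 'a2v3', 'a3v2', 'a4v2', 'a5v1'),
    ('a1v3', 'a2v3', 'a3v2', 'a4v2', 'a5v2'),
    ('a1v1', 'a2v2', 'a3v1', 'a4v1', 'a5v1'),
    ('a1v1', 'a2v3', 'a3v2', 'a4v1', 'a5v2'),
    ('a1v2', 'a2v2', 'a3v2', 'a4v1', 'a5v2'),
    ('a1v1', 'a2v2', 'a3v2', 'a4v2', 'a5v2'),
    ('a1v3', 'a2v2', 'a3v1', 'a4v2', 'a5v2'),
    ('a1v3', 'a2v1', 'a3v2', 'a4v1', 'a5v2'),
    ('a1v2', 'a2v2', 'a3v1', 'a4v2', 'a5v1'),
]
data = np.array(rows, dtype=[(name, 'U4') for name in fields])


def match_mask(data, conditions):
    """Return a boolean mask, True where every (field, value) pair holds.

    Hypothetical helper, not part of NumPy.
    """
    mask = np.ones(data.shape, dtype=bool)
    for field, value in conditions:
        # AND in the elementwise test for this field
        mask &= (data[field] == value)
    return mask


r = [('a1', 'a1v1'), ('a4', 'a4v1')]
mask = match_mask(data, r)

selected = data[mask]    # rows that satisfy every condition in r
remaining = data[~mask]  # rows left after removing those matches
```

With the data from the question, selected should hold the same three rows as the output above, and remaining the other eleven. Inverting the mask with ~ is equivalent to the != / | expression in the question, by De Morgan's law.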

Hope that makes some sense, anyway...

Licensed under: CC-BY-SA with attribution
