Yep, iterating over numpy arrays as you're currently doing is relatively slow. Normally, you'd use slicing instead (which creates a view, rather than copying the data into a list).
It looks like you have an object array. This will make things even slower. Do you really need an object array? It looks like all of the values are ints. (Is this a "vlen" HDF5 dataset?)
The use case where an object array would make sense is if you have a different number of items in each element of events. If you don't, then there's no reason to use one.
If you were using a 2D array of ints instead of an object array of tuples, you'd just do:
field1 = events[:,0]
However, in that case you could skip the intermediate variable and search the column directly (searchsorted uses bisection, so the column needs to be sorted already):
index = np.searchsorted(events[:,0], val)
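As a minimal sketch of the 2D case, assuming a small made-up events array whose first column is already sorted (the data and val below are purely illustrative):

```python
import numpy as np

# Hypothetical data: a plain 2D integer array, sorted by its first column.
events = np.array([[10, 3],
                   [20, 5],
                   [35, 2],
                   [50, 7]])
val = 35

# Slicing a column gives a view onto the same memory, not a copy.
starts = events[:, 0]

# searchsorted bisects the sorted column and returns the leftmost
# position at which val could be inserted while keeping it sorted.
index = np.searchsorted(starts, val)
```

If val is present in the column, index points at its first occurrence; if not, it points at where val would go, so it's worth checking events[index, 0] == val before relying on it.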
Edit
Ah! Okay, you have a structured array. In other words, it's an array (1D, in this case) where each item is a C-like struct. From:
>>> events.dtype
[('start', '<u8'),
('length', '<u4'),
('mean', '<f8'),
('variance', '<f8')]
...we can see that the first field is named "start".
Therefore, you just want:
index = np.searchsorted(events["start"], val)
In more general terms, if we didn't know the name of the field, but knew that it was a structured array of some sort, you'd do (paring things down to just the slicing step):
events[events.dtype.names[0]]
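To make that concrete, here's a rough sketch that builds a tiny structured array with the same dtype as yours. The three records and val are invented for illustration; only the field names come from your dtype.

```python
import numpy as np

# Same field layout as the dtype shown above; the records are made up.
dtype = [('start', '<u8'), ('length', '<u4'),
         ('mean', '<f8'), ('variance', '<f8')]
events = np.array([(10, 3, 0.5, 0.1),
                   (20, 5, 1.5, 0.2),
                   (35, 2, 2.5, 0.3)], dtype=dtype)
val = 20

# Indexing by field name returns a 1D view of that field.
index = np.searchsorted(events["start"], val)

# Same idea without hard-coding the field name.
first_field = events[events.dtype.names[0]]

# The matching record is still a full struct with all four fields.
record = events[index]
```

Here events["start"] is a view into the structured array, so nothing is copied before the bisection runs.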
As far as whether or not it's a good idea to convert everything to a "normal" 2D array, that depends on your use case. For basic slicing and calling searchsorted, there's no reason to. There shouldn't (untested) be any significant speed increase.
Based on what you're doing at the moment, I'd just leave it as is.
However, structured arrays are often cumbersome to deal with.
There are plenty of cases where structured arrays are very useful (e.g. reading in certain binary formats from disk), but if you want to think of it as a "table-like" array, you'll quickly hit pain points. You're often better off storing the columns as separate arrays. (Or better yet, use a pandas.DataFrame for "tabular" data.)
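Both alternatives can be sketched roughly like this, assuming pandas is installed and using made-up records with your dtype (the variable names columns and df are just for illustration):

```python
import numpy as np
import pandas as pd

# Made-up records with the same field layout as the dtype shown above.
dtype = [('start', '<u8'), ('length', '<u4'),
         ('mean', '<f8'), ('variance', '<f8')]
events = np.array([(10, 3, 0.5, 0.1),
                   (20, 5, 1.5, 0.2),
                   (35, 2, 2.5, 0.3)], dtype=dtype)

# Option 1: keep each column as its own plain 1D array.
columns = {name: events[name] for name in events.dtype.names}

# Option 2: let pandas turn the fields into named columns.
df = pd.DataFrame(events)

# The lookup is unchanged: bisect the sorted "start" column.
index = np.searchsorted(df["start"].to_numpy(), 20)
```

Either way each column keeps its own dtype, so nothing is forced into a single common type.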
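If you did want to convert it to a regular 2D array, do:
events = np.column_stack([events[name] for name in events.dtype.names])
This will automatically find a compatible datatype (float64, in this case, because the "mean" and "variance" fields are floats) for the new array and "stack" the fields of the structured array into columns in a 2D array. (np.hstack won't do this: it concatenates the 1D fields end to end into a single long 1D array.)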
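Here's a small sketch of the conversion using np.column_stack, which puts each 1D field into its own column (np.hstack on 1D fields would join them end to end into one flat array instead). The records are invented, and table is just an illustrative name. With the dtype from your question the promoted type comes out as float64, since two of the fields are floats.

```python
import numpy as np

# Made-up records with the same field layout as the dtype shown above.
dtype = [('start', '<u8'), ('length', '<u4'),
         ('mean', '<f8'), ('variance', '<f8')]
events = np.array([(10, 3, 0.5, 0.1),
                   (20, 5, 1.5, 0.2),
                   (35, 2, 2.5, 0.3)], dtype=dtype)

# One column per field; NumPy promotes everything to a common dtype.
table = np.column_stack([events[name] for name in events.dtype.names])

# For comparison: hstack flattens the 1D fields into one long 1D array.
flat = np.hstack([events[name] for name in events.dtype.names])
```

One caveat: promoting the uint64 "start" field to float64 can lose precision for very large values (above 2**53), so if exact integer starts matter, keeping that column separate is the safer choice.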
Calling events = events.astype(int) will effectively just yield the first column. (This is because each item of events is a C-like struct, and astype operates element-wise, so each struct is converted to a single int.)