Sorting is a good option for vectorization with numpy:
>>> s = np.argsort(b)
>>> s[np.searchsorted(b, a, sorter=s)]
array([0, 1, 2, 1, 1, 2, 1], dtype=int64)
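For a self-contained run, here is the same idea with hypothetical sample arrays (the question's actual `a` and `b` aren't shown here; these are chosen so every element of `a` occurs in `b`, and `b` is deliberately unsorted so the `sorter` argument matters):

```python
import numpy as np

# Hypothetical sample data: b need not be sorted,
# and every element of a must occur somewhere in b.
b = np.array([5, 3, 7])
a = np.array([5, 3, 7, 3, 3, 7, 3])

s = np.argsort(b)                          # permutation that sorts b
idx = s[np.searchsorted(b, a, sorter=s)]   # index in b of each element of a
print(idx)  # [0 1 2 1 1 2 1]
```

`searchsorted` finds each element's position in the *sorted* view of `b`; indexing that result back through `s` translates it to a position in the original, unsorted `b`.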
If your array a has m elements and b has n, the sorting costs O(n log n) and the searching O(m log n), which is not bad. Dictionary-based solutions should be amortized linear, but if the arrays are not huge, the Python-level looping may make them slower than this. Broadcasting-based solutions have quadratic complexity, so they will only be faster for very small arrays.
Some timings with your sample:
In [3]: %%timeit
...: s = np.argsort(b)
...: np.take(s, np.searchsorted(b, a, sorter=s))
...:
100000 loops, best of 3: 4.16 µs per loop
In [5]: %%timeit
...: my_dict = dict(zip(b, range(len(b))))
...: np.vectorize(my_dict.get)(a)
...:
10000 loops, best of 3: 29.9 µs per loop
In [7]: %timeit (np.arange(b.size)*(a==b[:,np.newaxis]).T).sum(axis=-1)
100000 loops, best of 3: 18.5 µs per loop