Numpy Indexing of 2 Arrays

Question 1

Here is one option:

import numpy as np

a = np.array(['john', 'bill', 'greg', 'bill', 'bill', 'greg', 'bill'])
b = np.array(['john', 'bill', 'greg'])

my_dict = dict(zip(b, range(len(b))))

result = np.vectorize(my_dict.get)(a)

Result:

>>> result
array([0, 1, 2, 1, 1, 2, 1])

Question 2

Sorting is a good option for vectorization with numpy:

>>> s = np.argsort(b)
>>> s[np.searchsorted(b, a, sorter=s)]
array([0, 1, 2, 1, 1, 2, 1], dtype=int64)

If a has m elements and b has n, the sorting costs O(n log n) and the searching O(m log n), which is not bad. Dictionary-based solutions should be amortized linear, but unless the arrays are huge, the Python-level looping may make them slower than this. Broadcasting-based solutions have quadratic complexity, O(m n), so they will only be faster for very small arrays.
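
The sort-and-search idea above can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: the name lookup_indices and the missing=-1 sentinel are assumptions for illustration, and b is assumed to be non-empty with unique elements. The extra equality check is there because np.searchsorted returns an insertion point even for values that do not occur in b.

```python
import numpy as np

def lookup_indices(a, b, missing=-1):
    """For each element of a, return its index in b, or `missing` if absent.

    Hypothetical helper: assumes b is non-empty and has unique elements.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    sorter = np.argsort(b)                      # O(n log n)
    pos = np.searchsorted(b, a, sorter=sorter)  # O(m log n)
    # Values larger than everything in b get pos == b.size; clip so the
    # lookup below stays in range.
    pos = np.clip(pos, 0, b.size - 1)
    idx = sorter[pos]
    # Keep idx only where the candidate really equals the query value.
    return np.where(b[idx] == a, idx, missing)

a = np.array(['john', 'bill', 'greg', 'bill', 'bill', 'greg', 'bill'])
b = np.array(['john', 'bill', 'greg'])
print(lookup_indices(a, b).tolist())  # [0, 1, 2, 1, 1, 2, 1]
```

If every element of a is known to be in b, the clip and the final np.where can be dropped, which leaves exactly the two-line version shown earlier.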

Some timings with your sample:

In [3]: %%timeit
   ...: s = np.argsort(b)
   ...: np.take(s, np.searchsorted(b, a, sorter=s))
   ...: 
100000 loops, best of 3: 4.16 µs per loop

In [5]: %%timeit
   ...: my_dict = dict(zip(b, range(len(b))))
   ...: np.vectorize(my_dict.get)(a)
   ...: 
10000 loops, best of 3: 29.9 µs per loop

In [7]: %timeit (np.arange(b.size)*(a==b[:,np.newaxis]).T).sum(axis=-1)
100000 loops, best of 3: 18.5 µs per loop

Question 3

Create a dictionary that maps each string to its index in b, then use numpy.vectorize to build the output array:

>>> import numpy as np
>>> a = np.array(['john', 'bill', 'greg', 'bill', 'bill', 'greg', 'bill'])
>>> b = np.array(['john', 'bill', 'greg'])
>>> d = {k:v for v, k in enumerate(b)}
>>> c = np.vectorize(d.get)(a)
>>> c
array([0, 1, 2, 1, 1, 2, 1])

This is more efficient than looping over b and calling np.where(a == b[i]) each time, because every element of a is visited only once instead of once per element of b.
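
One caveat: np.vectorize(d.get) works cleanly only when every element of a is a key of d, since d.get returns None otherwise. As a hedged variation, the sketch below passes a default to dict.get and builds the array with np.fromiter. The -1 sentinel, the extra name 'zed', and the variable names are illustrative choices, not from the original answer.

```python
import numpy as np

a = np.array(['john', 'bill', 'zed', 'greg', 'bill'])  # 'zed' is not in b
b = np.array(['john', 'bill', 'greg'])

# Map each name in b to its position; tolist() yields plain Python strings.
d = {name: i for i, name in enumerate(b.tolist())}

# Look up every element of a exactly once; absent names get the sentinel -1.
c = np.fromiter((d.get(name, -1) for name in a.tolist()),
                dtype=np.intp, count=a.size)
print(c.tolist())  # [0, 1, -1, 2, 1]
```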

Question 4

Fully numpy solution:

import numpy as np
(np.arange(b.size) * (a == b[:, np.newaxis]).T).sum(axis=-1)

Question 5

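Another solution uses np.unique. Note that c holds, for each element of a, the position of its first occurrence in a; this equals the index into b only when a begins with the elements of b in the same order, as it does in the example:

# first_idx[k]: position in a where the k-th sorted unique value first appears
# inv[i]: position of a[i] among the sorted unique values
uniq, first_idx, inv = np.unique(a, return_index=True, return_inverse=True)
c = first_idx[inv]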

If you only want the unique elements of a and do not care about their order in b (so b, and therefore c, will look different), this simplifies to:

b, c = np.unique(a, return_inverse=True)
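
To make the difference concrete, here is what the simplified call returns for the sample a used in the other answers. This is a small sketch that relies only on documented np.unique behaviour: the unique values come back sorted, and the inverse array indexes into them.

```python
import numpy as np

a = np.array(['john', 'bill', 'greg', 'bill', 'bill', 'greg', 'bill'])

# b: sorted unique values of a; c: index of each element of a within b.
b, c = np.unique(a, return_inverse=True)

print(b.tolist())  # ['bill', 'greg', 'john']
print(c.tolist())  # [2, 0, 1, 0, 0, 1, 0]

# Indexing b with c reconstructs a.
print(bool((b[c] == a).all()))  # True
```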

Question 6

Since the array b contains unique elements, each element of a can match at most one element of b. If all elements of a are definitely in b, then

import numpy as np
indices = np.where(a[:, np.newaxis] == b)[1]

will do the trick. If you are not sure whether all elements of a are in b, then

in_b, indices = np.where(a[:, np.newaxis] == b)

will store in in_b the positions in a of the elements that are contained in b, and in indices the corresponding positions in b.
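
As an illustration of the second case, the sketch below scatters the matches into a full-length result. The -1 sentinel, the extra name 'zed', and the variable names matches and result are assumptions added for the example.

```python
import numpy as np

a = np.array(['john', 'bill', 'zed', 'greg'])  # 'zed' is not in b
b = np.array(['john', 'bill', 'greg'])

# Boolean matrix of shape (a.size, b.size): matches[i, j] is a[i] == b[j].
matches = a[:, np.newaxis] == b

# Row indices (positions in a) and column indices (positions in b) of hits.
in_b, indices = np.where(matches)
print(in_b.tolist())     # [0, 1, 3]
print(indices.tolist())  # [0, 1, 2]

# Start from the sentinel and fill in the positions that matched.
result = np.full(a.size, -1, dtype=np.intp)
result[in_b] = indices
print(result.tolist())   # [0, 1, -1, 2]
```

The comparison matrix has a.size * b.size entries, so this approach is best kept for small arrays.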