Question

Given a length-n array A of indices in 0 ... k-1 (e.g. A = [0, 0, 1, 2, 1, ...]), what is the most efficient way to form a new array B of shape (n, k) such that B[i,j] = 1 if A[i] == j and B[i,j] = 0 otherwise?

For example, for A = [0, 0, 1, 2, 1, ...] (k=3), we would get

B = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0], ...]

Is there a way to do this without an explicit for loop?

Solution

Given the sparsity of the array that you build, you might want to use SciPy's sparse matrices, which have a small memory footprint:

import numpy
from scipy import sparse

A = numpy.array([0, 0, 1, 2, 1])
k = 3
n = len(A)
# COO format takes (data, (row_indices, column_indices)): one 1 per row i, in column A[i]
B = sparse.coo_matrix(
    (numpy.full(n, 1, dtype=int), (numpy.arange(n), A)),
    shape=(n, k),
)

(coo_matrix() is described in Scipy's documentation). This gives the intended result:

>>> B.todense()
matrix([[1, 0, 0],
        [1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
        [0, 1, 0]])

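but with a smaller memory footprint than a dense array, provided k is larger than a few units: the sparse format stores only the n nonzero entries and their coordinates, whereas a dense array stores all n*k entries. To save even more memory, the dtype above could be made smaller (depending on your exact needs), with dtype=numpy.int8 or even dtype=bool.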
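To make the memory argument concrete, here is a rough sketch that compares the bytes held by the COO arrays with the bytes of the equivalent dense array. The sizes n and k and the random indices are made up for illustration, and the exact numbers depend on the index dtype SciPy picks on your platform.

```python
import numpy as np
from scipy import sparse

# Hypothetical sizes, chosen only to illustrate the comparison
n, k = 1000, 500
rng = np.random.default_rng(0)
A = rng.integers(0, k, size=n)

# Same construction as above, but with a 1-byte dtype for the stored ones
B_sparse = sparse.coo_matrix(
    (np.ones(n, dtype=np.int8), (np.arange(n), A)), shape=(n, k)
)
B_dense = B_sparse.toarray()

# COO stores three arrays of length n: values, row indices, column indices
sparse_bytes = B_sparse.data.nbytes + B_sparse.row.nbytes + B_sparse.col.nbytes
dense_bytes = B_dense.nbytes  # n * k entries of 1 byte each

print("sparse:", sparse_bytes, "bytes; dense:", dense_bytes, "bytes")
```

The sparse cost grows with n only, while the dense cost grows with n*k, which is why the advantage appears once k exceeds a few units.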

OTHER TIPS

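import numpy as np

A = np.array([0, 0, 1, 2, 1])
k = 3

# Start from all zeros, then set column A[i] of each row i to 1
# (np.int was removed in recent NumPy releases; the builtin int works everywhere)
B = np.zeros((len(A), k), dtype=int)
B[np.arange(len(A)), A] = 1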

Result:

>>> B
array([[1, 0, 0],
       [1, 0, 0],
       [0, 1, 0],
       [0, 0, 1],
       [0, 1, 0]])
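
The indexing trick above can be wrapped in a small reusable function. This is only a sketch: the name one_hot, the optional k argument, and the range check are assumptions added for illustration, not part of the original answer, and it assumes a non-empty 1-D input.

```python
import numpy as np

def one_hot(indices, k=None, dtype=int):
    """Hypothetical helper: return an (n, k) array with a single 1 per row."""
    indices = np.asarray(indices)
    if k is None:
        # Assumption: when k is not given, infer it from the largest index
        k = int(indices.max()) + 1
    if indices.min() < 0 or indices.max() >= k:
        # Negative indices would silently wrap around in NumPy, so reject them
        raise ValueError("indices must lie in 0 ... k-1")
    out = np.zeros((len(indices), k), dtype=dtype)
    # Pair row i with column indices[i] and set those entries to 1
    out[np.arange(len(indices)), indices] = 1
    return out

B = one_hot([0, 0, 1, 2, 1], k=3)
print(B)
```

Passing dtype=np.int8 or dtype=bool reduces the memory used by the dense result in the same way as for the sparse version.
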
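Another option is to set the ones in a flat array of length n*k and then reshape it:

import numpy as np

A = np.array([0, 0, 1, 2, 1])
n = len(A)
k = 3
B = np.zeros(n * k, dtype=int)
# In row-major order, entry (i, j) of an (n, k) array sits at flat position i*k + j
B[np.arange(n) * k + A] = 1
B.reshape((n, k))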

result:

array([[ 1,  0,  0],
       [ 1,  0,  0],
       [ 0,  1,  0],
       [ 0,  0,  1],
       [ 0,  1,  0]])
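
The flat-index arithmetic above (row i, column A[i] maps to position i*k + A[i]) can also be delegated to NumPy. The sketch below uses np.ravel_multi_index, which is a different way of computing the same positions rather than the answerer's own code, and checks that both give identical results.

```python
import numpy as np

A = np.array([0, 0, 1, 2, 1])
n, k = len(A), 3

# Flat positions computed by hand, as in the answer above
manual = np.arange(n) * k + A

# The same positions computed by NumPy for an (n, k) row-major array
flat = np.ravel_multi_index((np.arange(n), A), (n, k))

B = np.zeros(n * k, dtype=int)
B[flat] = 1
B = B.reshape((n, k))  # reshape returns a new view, so keep the result
print(B)
```
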
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow