Question

Given a length-n array A of indices in 0 ... k-1 (e.g. A = [0, 0, 1, 2, 1, ...]), what is the most efficient way to form a new array B of shape (n, k), such that B[i,j] = 1 if A[i] == j and B[i,j] = 0 otherwise?

For example, with A = [0, 0, 1, 2, 1, ...] (k=3), we would get

B = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0], ...]

Is there a way to do this without an explicit for loop?


Solution

Given how sparse the resulting array is (a single nonzero entry per row), you might want to use SciPy's sparse matrices, which have the advantage of a small memory footprint:

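import numpy
from scipy import sparse

A = numpy.array([0, 0, 1, 2, 1])
k = 3
n = len(A)
# COO format takes (values, (row_indices, column_indices)):
# one value of 1 per row i, placed in column A[i].
B = sparse.coo_matrix((numpy.ones(n, dtype=int), (numpy.arange(n), A)),
                      shape=(n, k))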

(coo_matrix() is described in Scipy's documentation). This gives the intended result:

>>> B.todense()
matrix([[1, 0, 0],
        [1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
        [0, 1, 0]])

but with a much smaller memory footprint than a dense array, provided k is larger than a handful of columns. To save even more memory, the dtype above can be made smaller, depending on your exact needs: dtype=numpy.int8 or even dtype=bool.
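
To make the memory claim concrete, here is a minimal sketch that compares the bytes held by the COO matrix with those of the equivalent dense array. The sizes n and k and the random input are hypothetical, chosen only for illustration. A dense row costs k items, while a COO row costs one value plus one row index and one column index, so the exact break-even point depends on the value dtype and on the index dtype SciPy picks.

```python
import numpy as np
from scipy import sparse

n, k = 1000, 500  # hypothetical sizes, for illustration only
rng = np.random.default_rng(0)
A = rng.integers(0, k, size=n)

# Sparse one-hot matrix: exactly one stored value per row.
B_sparse = sparse.coo_matrix(
    (np.ones(n, dtype=np.int8), (np.arange(n), A)), shape=(n, k)
)

# Dense equivalent with the same dtype.
B_dense = np.zeros((n, k), dtype=np.int8)
B_dense[np.arange(n), A] = 1

# COO keeps three arrays: the values, the row indices and the column indices.
sparse_bytes = B_sparse.data.nbytes + B_sparse.row.nbytes + B_sparse.col.nbytes
dense_bytes = B_dense.nbytes

print("sparse:", sparse_bytes, "bytes; dense:", dense_bytes, "bytes")
```

For very small k the index arrays can outweigh the savings, which is why the sparse form only pays off once k is large enough.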

Other tips

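import numpy as np

A = np.array([0, 0, 1, 2, 1])
k = 3

# np.int was removed in NumPy 1.24; the built-in int works as a dtype.
B = np.zeros((len(A), k), dtype=int)

# For each row i, set column A[i] to 1 (integer array indexing).
B[np.arange(len(A)), A] = 1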

Result:

>>> B
array([[1, 0, 0],
       [1, 0, 0],
       [0, 1, 0],
       [0, 0, 1],
       [0, 1, 0]])
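
The indexing trick above can be wrapped in a small reusable helper. This is only a sketch: the name one_hot and its parameters are hypothetical, and inferring k from the largest index when it is not given is an assumption, not something the original answer specifies.

```python
import numpy as np


def one_hot(indices, k=None, dtype=int):
    """Return an (n, k) array with a 1 in column indices[i] of each row i."""
    indices = np.asarray(indices)
    if k is None:
        # Assumption: with no k given, use the smallest width that fits.
        k = int(indices.max()) + 1
    out = np.zeros((indices.size, k), dtype=dtype)
    # Integer array indexing: pair row i with column indices[i].
    out[np.arange(indices.size), indices] = 1
    return out


B = one_hot([0, 0, 1, 2, 1], k=3)
print(B)
```

An index that is k or larger raises an IndexError, while negative indices wrap around as usual in NumPy, so validate the input first if that matters for your data.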
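
import numpy as np

A = np.array([0, 0, 1, 2, 1])
n = len(A)
k = 3
# Work on a flat array: in row-major order, element (i, j) is at index i*k + j.
B = np.zeros(n * k, dtype=int)
B[np.arange(n) * k + A] = 1
# reshape returns a new array object, so keep the result.
B = B.reshape((n, k))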

Result:

array([[1, 0, 0],
       [1, 0, 0],
       [0, 1, 0],
       [0, 0, 1],
       [0, 1, 0]])
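
The flat-index version works because NumPy arrays are stored in row-major (C) order by default, so element (i, j) of an (n, k) array sits at flat position i*k + j. As a small sketch, the hand-written formula can be checked against numpy.ravel_multi_index, which computes the same flat positions. The variable names here are illustrative.

```python
import numpy as np

A = np.array([0, 0, 1, 2, 1])
n, k = len(A), 3

# Hand-written flat positions: row i, column A[i] -> i*k + A[i].
flat_manual = np.arange(n) * k + A

# The same positions computed by NumPy for an (n, k) row-major array.
flat_numpy = np.ravel_multi_index((np.arange(n), A), (n, k))

B = np.zeros(n * k, dtype=int)
B[flat_numpy] = 1
B = B.reshape(n, k)

print(flat_manual)
print(B)
```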
Licensed under: CC-BY-SA with attribution