Select cells randomly from NumPy array - without replacement

https://stackoverflow.com/questions/3891180

28-09-2019

Question

I'm writing some modelling routines in NumPy that need to select cells randomly from a NumPy array and do some processing on them. All cells must be selected without replacement (as in, once a cell has been selected it can't be selected again, but all cells must be selected by the end).

I'm transitioning from IDL where I can find a nice way to do this, but I assume that NumPy has a nice way to do this too. What would you suggest?

Update: I should have stated that I'm trying to do this on 2D arrays, and therefore get a set of 2D indices back.

Solution

How about using numpy.random.shuffle or numpy.random.permutation if you still need the original array?

If you need to change the array in-place, then you can create an index array like this:

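import numpy

# Stand-in for <some numpy array>; any shape works
your_array = numpy.arange(20).reshape(4, 5)
index_array = numpy.arange(your_array.size)
numpy.random.shuffle(index_array)

# ravel() lets the flat indices address a multi-dimensional array
print(your_array.ravel()[index_array[:10]])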
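To illustrate the difference between the two functions mentioned above, here is a minimal sketch. The array `original` is made-up example data, not something from the question.

```python
import numpy

original = numpy.arange(10)  # hypothetical example data

# permutation returns a shuffled copy; `original` keeps its order
shuffled_copy = numpy.random.permutation(original)

# shuffle reorders its argument in place and returns None
in_place = original.copy()
numpy.random.shuffle(in_place)

print(original)
print(shuffled_copy)
print(in_place)
```

So numpy.random.permutation is the one to reach for when the original ordering still matters afterwards.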

OTHER TIPS

All of these answers seemed a little convoluted to me.

I'm assuming that you have a multi-dimensional array from which you want to generate an exhaustive list of indices. You'd like these indices shuffled so you can then access each of the array elements in a random order.

The following code will do this in a simple and straight-forward manner:

#!/usr/bin/python
import numpy as np

#Define a two-dimensional array
#Use any number of dimensions, and dimensions of any size
d=np.zeros(30).reshape((5,6))

#Get a list of indices for an array of this shape
indices=list(np.ndindex(d.shape))

#Shuffle the indices in-place
np.random.shuffle(indices)

#Access array elements using the indices to do cool stuff
for i in indices:
  d[i]=5

print(d)

Printing d verifies that every element has been accessed.

Note that the array can have any number of dimensions and that the dimensions can be of any size.

The only downside to this approach is that if d is large, then indices may become pretty sizable. Therefore, it would be nice to have a generator. Sadly, I can't think offhand of how to build a shuffled iterator.
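One possible partial workaround, sketched here as an assumption rather than a true lazy shuffle: permute the flat positions and convert each one to an index tuple only when it is needed. The helper name `shuffled_indices` is hypothetical. It still holds one integer per element in memory, but it avoids building a list of tuples.

```python
import numpy as np

def shuffled_indices(shape):
    """Yield every index tuple for an array of the given shape, in random order."""
    size = int(np.prod(shape))
    # One flat permutation of integers; no list of tuples is built
    for flat in np.random.permutation(size):
        # Convert each flat position to an N-dimensional index tuple on demand
        yield tuple(int(i) for i in np.unravel_index(flat, shape))

d = np.zeros((5, 6))
for idx in shuffled_indices(d.shape):
    d[idx] = 5

print(d)
```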

Extending the nice answer from @WoLpH

For a 2D array I think it will depend on what you want or need to know about the indices.

You could do something like this:

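data = np.arange(25).reshape((5,5))

# np.where on an all-True mask returns the row and column index of every cell
x, y = np.where(np.ones(data.shape, dtype=bool))
idx = list(zip(x, y))
np.random.shuffle(idx)

Or, equivalently:

data = np.arange(25).reshape((5,5))

grid = np.indices(data.shape)
idx = list(zip(grid[0].ravel(), grid[1].ravel()))
np.random.shuffle(idx)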

You can then use the list idx to iterate over randomly ordered 2D array indices as you wish, and to get the values at that index out of the data which remains unchanged.

Note: You could also generate the randomly ordered indices via itertools.product, in case you are more comfortable with that set of tools.
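A minimal sketch of the itertools.product variant, assuming the same 5x5 `data` array as above and using the standard library's random.shuffle for the list of pairs:

```python
import itertools
import random

import numpy as np

data = np.arange(25).reshape((5, 5))

# Cartesian product of the row and column ranges gives every (row, col) pair
idx = list(itertools.product(range(data.shape[0]), range(data.shape[1])))

# Shuffle the list of pairs in place
random.shuffle(idx)

for row, col in idx:
    value = data[row, col]  # read the value; data itself is not modified
```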

Use random.sample to generate ints in 0 .. A.size with no duplicates, then split them into index pairs:

import random
import numpy as np

def randint2_nodup( nsample, A ):
    """ uniform int pairs, no dups:
        r = randint2_nodup( nsample, A )
        A[r]
        for jk in zip(*r):
            ... A[jk]
    """
    assert A.ndim == 2
    sample = np.array( random.sample( range( A.size ), nsample ))  # nodup ints
    return sample // A.shape[1], sample % A.shape[1]  # pairs


if __name__ == "__main__":
    import sys

    nsample = 8
    ncol = 5
    exec( "\n".join( sys.argv[1:] ))  # run this.py N= ...
    A = np.arange( 0, 2*ncol ).reshape((2,ncol))

    r = randint2_nodup( nsample, A )
    print( "r:", r )
    print( "A[r]:", A[r] )
    for jk in zip(*r):
        print( jk, A[jk] )

Let's say you have an array of data points of size 8x3:

data = np.arange(50,74).reshape(8,-1)

If you truly want to sample, as you say, all the indices as 2D pairs, the most compact way I can think of is:

#generate a permutation of data's size, coerced to data's shape
idxs = divmod(np.random.permutation(data.size),data.shape[1])

#iterate over it
for x,y in zip(*idxs): 
    #do something to data[x,y] here
    pass
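The divmod trick is specific to two dimensions. For arrays with more axes, one possible generalisation is numpy.unravel_index, which performs the same flat-to-tuple conversion for any shape. A sketch, with a made-up 3D array named `volume`:

```python
import numpy as np

# Hypothetical 3D array; the shape is arbitrary
volume = np.arange(24).reshape(2, 3, 4)

# One index array per axis, covering every cell once in random order
idxs = np.unravel_index(np.random.permutation(volume.size), volume.shape)

for ijk in zip(*idxs):
    # do something to volume[ijk] here
    pass
```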

More generally, though, one often does not need to access 2D arrays as 2D arrays simply to shuffle them, in which case one can be more compact still. Just make a 1D view onto the array and save yourself some index-wrangling.

flat_data = data.ravel()
flat_idxs = np.random.permutation(flat_data.size)
for i in flat_idxs:
    #do something to flat_data[i] here
    pass

Changes made through the flat view still reach the 2D "original" array, as you'd like. To see this, try:

flat_data[12] = 1000000
print(data[4,0])
#prints 1000000
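One caveat worth checking: ravel() only returns a view when the array's memory layout allows it. The sketch below uses numpy.shares_memory (available in NumPy 1.11 and later) to compare a contiguous array with its transpose; the variable names are illustrative.

```python
import numpy as np

data = np.arange(50, 74).reshape(8, -1)

# C-contiguous array: ravel() returns a view, so writes reach `data`
flat_view = data.ravel()
view_shares = np.shares_memory(flat_view, data)

# The transpose is not C-contiguous, so ravel() has to make a copy
flat_copy = data.T.ravel()
copy_shares = np.shares_memory(flat_copy, data)

print(view_shares, copy_shares)
```

If the second case applies, writes through the flattened array will not show up in the original, and indexing with 2D pairs is the safer route.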

People using NumPy version 1.7 or later can also use the built-in function numpy.random.choice.
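A minimal sketch of how that might look for the 8x3 `data` array used earlier. The sample size `nsample` is an arbitrary value chosen for illustration; passing replace=False is what prevents duplicates.

```python
import numpy as np

data = np.arange(50, 74).reshape(8, -1)
nsample = 5  # hypothetical sample size

# Draw nsample distinct flat indices; replace=False rules out duplicates
flat_idx = np.random.choice(data.size, size=nsample, replace=False)

# Split the flat indices into (row, column) index arrays
rows, cols = divmod(flat_idx, data.shape[1])

print(data[rows, cols])
```

Setting nsample to data.size visits every cell exactly once, which matches the requirement in the question.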

Licensed under: CC-BY-SA with attribution
