Question

I have an array of element probabilities, let's say [0.1, 0.2, 0.5, 0.2]. The array sums up to 1.0.

Using plain Python or numpy, I want to draw elements proportional to their probability: the first element about 10% of the time, second 20%, third 50% etc. The "draw" should return index of the element drawn.

I came up with this:

def draw(probs):
    cumsum = numpy.cumsum(probs / sum(probs)) # sum up to 1.0, just in case
    return len(numpy.where(numpy.random.rand() >= cumsum)[0])

It works, but it's too convoluted, there must be a better way. Thanks.

Was it helpful?

Solution

import numpy as np
def random_pick(choices, probs):
    '''
    >>> a = ['Hit', 'Out']
    >>> b = [.3, .7]
    >>> random_pick(a,b)
    '''
    cutoffs = np.cumsum(probs)
    idx = cutoffs.searchsorted(np.random.uniform(0, cutoffs[-1]))
    return choices[idx]

How it works:

In [22]: import numpy as np
In [23]: probs = [0.1, 0.2, 0.5, 0.2]

Compute the cumulative sum:

In [24]: cutoffs = np.cumsum(probs)
In [25]: cutoffs
Out[25]: array([ 0.1,  0.3,  0.8,  1. ])

Compute a uniformly distributed random number in the half-open interval [0, cutoffs[-1]):

In [26]: np.random.uniform(0, cutoffs[-1])
Out[26]: 0.9723114393023948

Use searchsorted to find the index where the random number would be inserted into cutoffs:

In [27]: cutoffs.searchsorted(0.9723114393023948)
Out[27]: 3

Return choices[idx], where idx is that index.

OTHER TIPS

You want to sample from the categorical distribution, which is not implemented in numpy. However, the multinomial distribution is a generalization of the categorical distribution and can be used for that purpose.

>>> import numpy as np
>>> 
>>> def sampleCategory(p):
...     return np.flatnonzero( np.random.multinomial(1,p,1) )[0]
... 
>>> sampleCategory( [0.1,0.5,0.4] )
1

use numpy.random.multinomial - most efficient

I've never used numpy, but I assume my code below (python only) does the same thing as what you accomplished in one line. I'm putting it here just in case you want it.

Looks very c-ish so apologies for not being very pythonic.

weight_total would be 1 for you.

def draw(probs)
    r = random.randrange(weight_total)
    running_total = 0
    for i, p in enumerate(probs)
        running_total += p
        if running_total > r:
            return i

use bisect

import bisect
import random
import numpy 
def draw(probs):
    cumsum=numpy.cumsum(probs/sum(probs))
    return bisect.bisect_left(cumsum, numpy.random.rand())

should do the trick.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top