Pergunta

Suppose we have a set x of N values {x_i; i=1,...,N} and a set of some associated probabilities {w_i; i=1,...,N}.

We want to get from the set x, a new set x^ of N values {x^_i; i=1,...,N} by choosing each value x_i from the setx according to the probability w_i. How do we code that (i.e. a pseudo code algorithm, that can be translated to any language).

EDIT: python code:

def resample(self,x,w):
    N = len(w)
    new_x = empty(N)
    c = cumsum(w)

    for i in range(N):
        r = random()
        for j in range(N):
            if( j == N-1 ):
                new_x[i] = x[j]
                break
            else:
                if( (c[j] <= r) and (r < c[j+1]) ):
                    new_x[i] = x[j+1]
                    break

    new_w = ones(N,dtype=float)/N
    return new_x, new_w
Foi útil?

Solução

You can call a function which gives you a random number between 0 and 1.
If the probabilities are w_1 = 0.2, w_2 = 0.5, w_3 = 0.3, you can:
Choose x_1 if you got a number between 0 and 0.2
Choose x_2 if you got a number between 0.2 and 0.7
Choose x_3 otherwise.

More generally, choose x_n if w_1 + ... + w_(n-1) <= random number < w_1 + ... + w_(n-1) + w_n

That's not the whole pseudocode, just an explanation of its most problematic part, but I think it should be enough if you have a basic understanding of your problem.

Outras dicas

I think the best option is preprocessing the probability set and then getting a random value.

Let me explain what I mean:

First you create a new set, for example h_i in which you place the accumulated probability of each object.

x_i:{A,B,C,D}
w_i:{0.2,0.3,0.4,0.1}
h_i:{0.2,0.5,0.9,1}

The last element is of course 1. (but if it is not (you have missing cases) it still works.

Now you generate a random number 0≤r≤1 and lookup the first element whose h is greater than r.

For example if you get 0.56 you choose C because 0.9(h_C) > 0.56 and 0.5(h_B) ≤ 0.56

This operation can be expensive on arrays but if you choose a binary search tree for the storage of the set h_i you can get very good results.

That is if you want to choose lots of random values over the same probability set. If the set is constantly changing I would use another approach.

# import the random library
import random

# define the input data
X = ["A","B","C","D"]
w = [0.2,0.3,0.4,0.1]

# size of the new sample
n = 10

# empy list to store the result
Xp = []

# the actual code
while len(Xp) < n:
    random_choice = random.choice(w)
    if random_choice >= random.random():
        Xp.append(X[w.index(random_choice)])

# have a look
Xp

Out[39]:

['C', 'C', 'C', 'B', 'D', 'B', 'A', 'D', 'A', 'B']

Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top