Random numbers with non-uniform discrete densities

https://stackoverflow.com/questions/7955917

17-02-2021
|

문제

Just wondering what type of algorithm this is,
or if there is an easier/more efficient way to go about this:

Say we are given a certain probability density, say

prob[] = {.1, .15, .25, .05, .45}

Group 1 - 10%
Group 2 - 15%
Group 3 - 25%
Group 4 - 5%
Group 5 - 45%

and a random number, (0,1),
ran = .853234

Insert into one of the 5 groups

if (ran <=prob[0]) selection = 1;  
else if (ran <= prob[0]+prob[1]) selection = 2;  
...
else if (ran <= prob[0]+prob[1]+...+prob[4]) selection = 5;

I am not very well-versed on random number generation

해결책

What you are essentially doing here is inverting the cumulative distribution function. Let F be the CDF of a random variable X with a given distribution, then it is defined as F(x) == P[X <= x].

The very useful thing here, is that if you generate a uniform random variable U between 0 and 1, then

P[F^-1(U) <= x] == P[U <= F(x)] == F(x) == P[X <= x]

which means that F^-1(U) will have the same distribution as X!

Of course this is only possible if you can invert the CDF, but in your case Fis a piecewise function (like a staircase), and your algorithm determines, for a given uniform value, at which step this value is met. Your algorithm is therefore perfectly correct.

However, you could improve it if you have a lot of random numbers to generate: first generate the CDF table, which in your case would be

CDF[] = {.1, .25, .5, .55, 1.}

then for each generated uniform number between 0 and 1, simply perform a dichotomy on the CDF table to retriver the corresponding index.

다른 팁

Your algorithm is correct. In your example though, the probabilities don't add up to 1.

That code will work except that you probabilities don't add up to 100% (so it is possible for none of the if-statements to match).

The approach can be simplified a bit by using a cumulative probability distribution:

cumprob[5] = {.1, .2, .45, .50, 1.0};

That also lets you substitute lsearch for the if-elif chain.

Your algorithm uses random floating point numbers for a discrete distribution which is not the best way to implement this. Your implementation may provide a distribution hardly distinguishable from the given distribution, but it is not scientifically correct.

Instead, find the lowest common denominator of your given probabilities (in your example 5%) and use a random integer in [0,19] to pick your group. Example:

switch(random(19)) {
case 0:
case 1:
  selection = 1;
  break;
case 2:
case 3:
case 4:
  selection = 2;
  break;
case 5:
case 6:
case 7:
case 8:
case 9:
  selection = 3;
  break;
case 10:
  selection = 4;
  break;
case 11:
case 12:
case 13:
case 14:
case 15:
case 16:
case 17:
case 18:
case 19:
  selection = 4;
  break;
}

라이센스 : CC-BY-SA ~와 함께 속성

제휴하지 않습니다 StackOverflow