Random numbers with non-uniform discrete densities

https://stackoverflow.com/questions/7955917

17-02-2021
|

Frage

Just wondering what type of algorithm this is,
or if there is an easier/more efficient way to go about this:

Say we are given a certain probability density, say

prob[] = {.1, .15, .25, .05, .45}

Group 1 - 10%
Group 2 - 15%
Group 3 - 25%
Group 4 - 5%
Group 5 - 45%

and a random number, (0,1),
ran = .853234

Insert into one of the 5 groups

if (ran <=prob[0]) selection = 1;  
else if (ran <= prob[0]+prob[1]) selection = 2;  
...
else if (ran <= prob[0]+prob[1]+...+prob[4]) selection = 5;

I am not very well-versed on random number generation

Lösung

What you are essentially doing here is inverting the cumulative distribution function. Let F be the CDF of a random variable X with a given distribution, then it is defined as F(x) == P[X <= x].

The very useful thing here, is that if you generate a uniform random variable U between 0 and 1, then

P[F^-1(U) <= x] == P[U <= F(x)] == F(x) == P[X <= x]

which means that F^-1(U) will have the same distribution as X!

Of course this is only possible if you can invert the CDF, but in your case Fis a piecewise function (like a staircase), and your algorithm determines, for a given uniform value, at which step this value is met. Your algorithm is therefore perfectly correct.

However, you could improve it if you have a lot of random numbers to generate: first generate the CDF table, which in your case would be

CDF[] = {.1, .25, .5, .55, 1.}

then for each generated uniform number between 0 and 1, simply perform a dichotomy on the CDF table to retriver the corresponding index.

Andere Tipps

Your algorithm is correct. In your example though, the probabilities don't add up to 1.

That code will work except that you probabilities don't add up to 100% (so it is possible for none of the if-statements to match).

The approach can be simplified a bit by using a cumulative probability distribution:

cumprob[5] = {.1, .2, .45, .50, 1.0};

That also lets you substitute lsearch for the if-elif chain.

Your algorithm uses random floating point numbers for a discrete distribution which is not the best way to implement this. Your implementation may provide a distribution hardly distinguishable from the given distribution, but it is not scientifically correct.

Instead, find the lowest common denominator of your given probabilities (in your example 5%) and use a random integer in [0,19] to pick your group. Example:

switch(random(19)) {
case 0:
case 1:
  selection = 1;
  break;
case 2:
case 3:
case 4:
  selection = 2;
  break;
case 5:
case 6:
case 7:
case 8:
case 9:
  selection = 3;
  break;
case 10:
  selection = 4;
  break;
case 11:
case 12:
case 13:
case 14:
case 15:
case 16:
case 17:
case 18:
case 19:
  selection = 4;
  break;
}

Lizenziert unter: CC-BY-SA mit Zuschreibung

Nicht verbunden mit StackOverflow