Question

One common way of choosing a random number in [0, n) is to take the result of rand() modulo n: rand() % n. However, even if the results returned by the available rand() implementation are fully uniform, shouldn't there be a problem with the uniformity of the resulting [0, n) numbers when RAND_MAX + 1 is not evenly divisible by n? E.g. suppose RAND_MAX is 2 and n is 2. Then out of the 3 possible rand() outputs, 0, 1 and 2, we get 0, 1 and 0 respectively after taking them modulo n, so 0 comes up twice as often as 1. The output is therefore not uniform at all.

Is this a real problem in practice? What is a better way of choosing random numbers in [0, n) uniformly deriving from rand() output, preferably without any floating point arithmetic?

Solution

You are correct: rand() % N is not exactly uniformly distributed. How much that matters depends on the range of numbers you want and on the degree of randomness you need, but if you need enough randomness to even care about the bias, you don't want to use rand() anyway. Get a real random number generator.

That said, to get a truly uniform distribution, reduce modulo the next power of 2 and resample until you get a value in the range you want (e.g. for 0-9, use do { n = rand() % 0x10; } while (n >= 10);).
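
A minimal sketch of that idea, assuming RAND_MAX + 1 is itself a power of two (so that reducing modulo 0x10 stays uniform); the helper name rand_below_10 is made up for illustration:

#include <stdlib.h>

/* Uniform value in [0, 10): reduce to [0, 16), a power of two,
   then reject 10..15 and draw again. */
int rand_below_10(void)
{
    int n;
    do {
        n = rand() % 0x10;  /* keep only the range 0..15 */
    } while (n >= 10);      /* values 10..15 would bias the result, so retry */
    return n;
}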

OTHER TIPS

That depends on:

  • The value of RAND_MAX
  • Your value of N

Let us assume RAND_MAX + 1 is 2^32 and that N does not divide it evenly. If N is rather small (let's say around 2), then the bias is about 1 / 2^31 -- far too small to notice.

But if N is quite a bit larger, say around 2^20, then the bias is about 1 / 2^12, or roughly 1 in 4096. A lot bigger, but still pretty small.
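
To make the arithmetic concrete, here is a small self-contained check (the concrete values of M and N are picked only for illustration): out of M possible rand() outputs mapped onto N buckets, M % N buckets receive one extra output, so the relative bias between buckets is roughly N / M.

#include <stdio.h>

int main(void)
{
    /* M = number of distinct rand() outputs (RAND_MAX + 1), N = target range.
       Example values only; here N does not divide M evenly. */
    unsigned long long M = 1ULL << 32;   /* 4294967296 */
    unsigned long long N = 1000000ULL;

    unsigned long long per_bucket = M / N;   /* most buckets get this many outputs */
    unsigned long long extra      = M % N;   /* this many buckets get one output more */

    printf("%llu buckets receive %llu outputs\n", N - extra, per_bucket);
    printf("%llu buckets receive %llu outputs\n", extra, per_bucket + 1);
    printf("relative bias: about 1 in %llu\n", per_bucket);
    return 0;
}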

One approach you can take is the following:

Knowing the value of N, you compute R_MAX = ((RAND_MAX + 1) / N) * N, the largest multiple of N that the rand() outputs can cover; results of R_MAX or above are discarded, and the remaining values map uniformly onto [0, N).

So you can write your custom rand() function:

#include <stdlib.h>

int custom_rand(int mod) {
    /* Largest multiple of mod within the RAND_MAX + 1 possible outputs.
       The unsigned cast avoids signed overflow when RAND_MAX == INT_MAX. */
    const unsigned int R_MAX = ((unsigned int)RAND_MAX + 1u) / mod * mod;
    unsigned int x = rand();

    while (x >= R_MAX) { // discard results in [R_MAX, RAND_MAX]: they would bias the small values
        x = rand();
    }

    return (int)(x % mod);
}
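
For instance, the function above could then be used like this (the dice example is just an illustration):

int die = custom_rand(6) + 1;  /* uniform value in [1, 6] */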

There are two problems with using the remainder (% is not a "modulo" operator in C) to reduce a uniform random number to a smaller range. The first is the slight bias toward smaller numbers mentioned above; the second is that typical PRNGs tend to be less random in their low-order bits. I seem to recall that from Knuth (The Art of Computer Programming, Vol. II, Seminumerical Algorithms), along with the claim that (after translating from MIX to C) rand() % 2 is a poor source of random single bits. It's better to pick (rand() > RAND_MAX / 2) (or test a high-order bit, if RAND_MAX is nearly a power of 2).
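
As a sketch of that suggestion (using nothing beyond the standard rand()), a single random bit can be taken from the upper half of the range instead of from the lowest-order bit:

#include <stdlib.h>

/* One random bit. Comparing against the midpoint uses the high-order part of
   rand()'s output instead of its (often weaker) lowest bit; the split is exact
   when RAND_MAX is odd, which it typically is (e.g. 2^31 - 1). */
int random_bit(void)
{
    return rand() > RAND_MAX / 2;
}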

The remainder should be good enough for casual use on small intervals. Avoid it for simulations. In fact, avoid rand() altogether for large simulations or "Monte Carlo" computations: implementations tend to have a period on the order of 2^32 or less, and it's not hard to exceed 4 billion trials on a 2+ GHz processor.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow