How to generate a random integer in the range [0,n] from a stream of random bits without wasting bits?

StackOverflow https://stackoverflow.com/questions/6046918

  •  15-11-2019

Question

I have a stream of uniform random bits from which I'd like to generate random integers uniformly in the range [0,n] without wasting bits. (I count as wasted any bits beyond floor(log_2(n))+1, on the assumption that it's always possible to use no more than that.) E.g., if n = 5, the algorithm I'm looking for should use no more than three bits. How can this be done?

Solution

This is equivalent to finding a two-way function (a bijection) between two finite sets of different cardinality, which is impossible. A fixed number of bits can only work when n+1 is a power of two.
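
To see why, note that k bits give 2^k equally likely patterns, and these can be shared out evenly among the n+1 values only if n+1 divides 2^k. A minimal Python sketch of that check (the function name is made up for this illustration):

```python
def fixed_bits_can_be_uniform(n: int, k: int) -> bool:
    """True if k random bits can be mapped uniformly onto the integers 0..n."""
    outcomes = 1 << k                # 2**k equally likely bit patterns
    return outcomes % (n + 1) == 0   # each value needs the same share of them


# n = 5 has 6 values; no power of two is a multiple of 6.
print(any(fixed_bits_can_be_uniform(5, k) for k in range(1, 64)))
# n = 7 has 8 values, and 3 bits map onto them exactly.
print(fixed_bits_can_be_uniform(7, 3))
```

The first check fails for every k because 6 has the odd factor 3, which is why a variable number of bits is needed for n = 5.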

OTHER TIPS

Although your question body specifies a fixed number of bits per random number, your title does not. So I will add that, if the bit count is allowed to vary, you can get close to the figure you state on average. The algorithm below consumes a variable number of bits whenever n+1 is not a power of 2. I estimated the average at floor(log_2(n)) + 1.5 bits, but the exact expectation depends on n: it is 11/3 ≈ 3.67 bits for n = 5 and 4.4 bits for n = 4.

Standard implementations of a function that generates an integer in a range apply % (modulo) to a large random number. This wastes bits and does not produce an exactly uniform distribution unless the draw is repeated for some values of the large random number. The following algorithm produces an exactly uniform distribution and discards bits only when a partial result is already too large. (I do not see an obvious way to reduce the number of bits it consumes. Maybe some entropy could be recovered from the 'number too large' occurrences.)

# Generate a number from 0 to n inclusive without wasting bits.
function RandomInteger(n)
    if n <= 0
        error
    else
        i = Floor(Log2(n))
        x = i
        r = 0
        while x >= 0
            r = r + (2 ^ x) * NextRandomBit()
            if r > n 
                # Selected number too large so begin again.
                x = i 
                r = 0
            else
                # Still in range. Calculate the next bit.
                x = x - 1
        return r

The algorithm above is written for clarity, not speed. It would be much faster if rewritten to process multiple bits at once.
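
For anyone who wants to experiment, here is one possible Python rendering of the pseudocode. It is a sketch rather than a reference implementation: the names `random_integer` and `average_bits` are invented here, and the bit source is assumed to be any callable that returns 0 or 1 (the demo uses `random.Random.getrandbits`).

```python
import random
from typing import Callable


def random_integer(n: int, next_bit: Callable[[], int]) -> int:
    """Return an integer in [0, n], built most significant bit first."""
    if n <= 0:
        raise ValueError("n must be positive")
    top = n.bit_length() - 1        # same as floor(log2(n))
    x, r = top, 0
    while x >= 0:
        r += next_bit() << x
        if r > n:
            # Prefix already exceeds n: discard it and begin again.
            x, r = top, 0
        else:
            # Still in range: move on to the next lower bit.
            x -= 1
    return r


def average_bits(n: int, trials: int = 10_000, seed: int = 0) -> float:
    """Estimate how many bits random_integer(n) consumes per call."""
    rng = random.Random(seed)       # assumed bit source, for the demo only
    used = 0

    def next_bit() -> int:
        nonlocal used
        used += 1
        return rng.getrandbits(1)

    for _ in range(trials):
        random_integer(n, next_bit)
    return used / trials


print(average_bits(5))
```

Averaged over many calls, the bit count for n = 5 should settle near 11/3 ≈ 3.67.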

It seems like you could just take x = ceil(log_2(n+1)) bits at a time and use them as your random numbers. The problem is that if the number you receive is greater than your limit (e.g. 5), you need to map it into range somehow while staying uniform. The logical move would be to take another x bits, but since you've specified that we can't waste bits, we have to be more creative. I would recommend a right or left rotate, but this won't always get you out of the situation (consider the string 111 when n = 5). We could do up to x rotates to see whether one of them lands in the correct range, or we could just flip all of the bits and add 1 (two's complement). I believe this comes close to uniform, though it cannot be exactly uniform: with n = 5 there are 8 three-bit patterns and 6 outputs, and 8 is not a multiple of 6.

So, for example, if you had the following string (rightmost bit is the first one you receive):

101001111010010101

And you're using n=5, then ceil(log2(n+1)) = 3, so you'll use three bits at a time, and the following will be your results (at each time step):

t=0: 101 = 5
t=1: 010 = 2
t=2: 010 = 2
t=3: 111 = 7 -> too large, rotates won't work, so we use 2's complement: 001 = 1
t=4: 001 = 1
t=5: 101 = 5
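
As a quick sanity check on the two's complement variant, the Python sketch below enumerates every 3-bit pattern for n = 5 and counts where each one lands. The function names are invented for this illustration, and it assumes every out-of-range value is mapped by flipping the bits and adding 1 within 3 bits (rotations are not modelled).

```python
from collections import Counter


def fold_into_range(value: int, n: int, width: int) -> int:
    """Keep in-range values; map the rest with a width-bit two's complement."""
    if value <= n:
        return value
    # Flipping all bits and adding 1 equals negation modulo 2**width.
    return (-value) % (1 << width)


def output_counts(n: int, width: int) -> Counter:
    """Count how many width-bit patterns land on each output value."""
    return Counter(fold_into_range(v, n, width) for v in range(1 << width))


print(sorted(output_counts(5, 3).items()))
```

For n = 5 the values 6 and 7 land on 2 and 1, so those two outputs come up twice as often as the others.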

First find out the number of possible values you want to generate. For integers in the range 0..5, that's 6 values, which can be represented in ceil(log(6)/log(2)) = 3 bits.

// in C++
#include <bitset>

std::bitset<3> bits;
// fill the bitset from the random bit stream

// interpret as a number
long value = bits.to_ulong();

Then find the transformation from the N-bit value to the final representation: it needs to be scaled from the range [0, 2^N - 1] to the range [from, to]:

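#include <cmath>

double out_from = 0, out_to = 5;
double in_from = 0, in_to = std::bitset<3>().flip().to_ulong();  // 7

double factor   = (out_to - out_from) / (in_to - in_from);
double constant = out_from - in_from * factor;

// Note: 8 inputs are spread over 6 outputs, so some outputs occur twice as often.
double rescaled = value * factor + constant;
long out = static_cast<long>(std::floor(rescaled));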
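
To get a feel for how such a rescaling distributes its outputs, here is a small Python sketch that runs all eight 3-bit values through it, assuming the target range is 0..5. The `rescale` helper is invented for this illustration and uses integer arithmetic in place of the floating-point floor.

```python
from collections import Counter


def rescale(value: int, in_max: int, out_max: int) -> int:
    """Scale value from 0..in_max onto 0..out_max, rounding down."""
    # Integer form of floor(value * out_max / in_max).
    return value * out_max // in_max


counts = Counter(rescale(v, in_max=7, out_max=5) for v in range(8))
print(sorted(counts.items()))
```

Under these assumptions the outputs 0 and 2 each receive two of the eight inputs, so the result covers 0..5 but is not exactly uniform.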
Licensed under: CC-BY-SA with attribution