How to generate an un-biased random number within an arbitrary range using the fewest bits

Question 1

The common approach to eliminating bias is to throw away numbers that are outside of the desired range. As noted, this is wasteful. It's possible to minimize the waste by starting with a larger number of bits and generating multiple random numbers at the same time; you can achieve a better match between the range of inputs to outputs.

For example take a roll of a die. The output has 6 possibilities. The naive approach would take 3 random bits for each random number produced. The first example demonstrates the pigeon-hole problem.

def pigeon_die(total_bit_count):
    for i in xrange(total_bit_count // 3):
        bits = random.getrandbits(3)
        yield 1 + bits * 6 // 8

1 : 832855
2 : 417835
3 : 416012
4 : 833888
5 : 416189
6 : 416554
total 3333333
max/min 2.00448063998

The second example is the wasteful approach commonly used. You can see that it generates fewer random number from the same number of random bits, but the bias is eliminated.

def wasteful_die(total_bit_count):
    for i in xrange(total_bit_count // 3):
        bits = random.getrandbits(3)
        if bits < 6:
            yield 1 + bits

1 : 417043
2 : 415812
3 : 417835
4 : 416012
5 : 416645
6 : 417243
total 2500590
max/min 1.00486517946

The final example takes 13 bits at a time and generates 5 random numbers from it. This generates even more numbers than the naive approach!

def optimized_die(total_bit_count):
    for i in xrange(total_bit_count // 13):
        bits = random.getrandbits(13)
        if bits < 6**5:
            for j in range(5):
                yield 1 + bits % 6
                bits //= 6

1 : 608776
2 : 608849
3 : 608387
4 : 608119
5 : 607855
6 : 608559
total 3650545
max/min 1.00163525841

The choice of 13 bits was made by taking the logarithm base 6 of powers of 2 and choosing the one that was closest to an integer.

def waste_list(n):
    for bit in range(1, 31):
        potential = math.log(2**bit, n)
        count = int(potential)
        if count > 0:
            waste = potential - count
            yield waste, bit, count

for waste, bit, count in sorted(waste_list(6)):
    print bit, count, waste
    if bit == 3:
        break

13 5 0.029086494049
26 10 0.0581729880981
8 3 0.0948224578763
21 8 0.123908951925
3 1 0.160558421704

As you can see, there are 4 choices better than the simple 3 bits.

Question 2

I am afraid that your suggested approach is biased.

Suppose the random number generator made numbers from 0 to 255, but you wanted a random number in the range 0 to 254.

If the random number generator produced 255 (1111_1111 in binary), your approach would throw away a single bit and add one more until eventually you would end up with the number 254 (1111_1110 in binary). (If I have understood your approach correctly.)

Therefore your generated numbers would have a probability of 1/256 for every number, except 254 which would have a probability of 1/128.