Question

Because of the pigeon hole principle, you can't just use

output = min + (rand() % (int)(max - min + 1))

to generate an unbiased, uniform, result. This answer to a similar question provides one solution but it is highly wasteful in terms of random bits consumed.

For example, if the random range of the source is low, then the chances of having to generate a second value from the source can be quite high. Alternative, using a larger source range is inherently wasteful too.

While I'm sure an optimal source range size could be derived, that doesn't address the question that there may be a better algorithm, rather than optimizing this one.

[EDIT] The my idea has been shown in the answers to produce biased results.

An approach that occurs to me is

  1. Consume the minimum number of bits necessary to cover the desired range.
  2. If that value is outside the desired range only throw away one bit and consume one more.
  3. Repeat as necessary.
Was it helpful?

Solution

The common approach to eliminating bias is to throw away numbers that are outside of the desired range. As noted, this is wasteful. It's possible to minimize the waste by starting with a larger number of bits and generating multiple random numbers at the same time; you can achieve a better match between the range of inputs to outputs.

For example take a roll of a die. The output has 6 possibilities. The naive approach would take 3 random bits for each random number produced. The first example demonstrates the pigeon-hole problem.

def pigeon_die(total_bit_count):
    for i in xrange(total_bit_count // 3):
        bits = random.getrandbits(3)
        yield 1 + bits * 6 // 8

1 : 832855
2 : 417835
3 : 416012
4 : 833888
5 : 416189
6 : 416554
total 3333333
max/min 2.00448063998

The second example is the wasteful approach commonly used. You can see that it generates fewer random number from the same number of random bits, but the bias is eliminated.

def wasteful_die(total_bit_count):
    for i in xrange(total_bit_count // 3):
        bits = random.getrandbits(3)
        if bits < 6:
            yield 1 + bits

1 : 417043
2 : 415812
3 : 417835
4 : 416012
5 : 416645
6 : 417243
total 2500590
max/min 1.00486517946

The final example takes 13 bits at a time and generates 5 random numbers from it. This generates even more numbers than the naive approach!

def optimized_die(total_bit_count):
    for i in xrange(total_bit_count // 13):
        bits = random.getrandbits(13)
        if bits < 6**5:
            for j in range(5):
                yield 1 + bits % 6
                bits //= 6

1 : 608776
2 : 608849
3 : 608387
4 : 608119
5 : 607855
6 : 608559
total 3650545
max/min 1.00163525841

The choice of 13 bits was made by taking the logarithm base 6 of powers of 2 and choosing the one that was closest to an integer.

def waste_list(n):
    for bit in range(1, 31):
        potential = math.log(2**bit, n)
        count = int(potential)
        if count > 0:
            waste = potential - count
            yield waste, bit, count

for waste, bit, count in sorted(waste_list(6)):
    print bit, count, waste
    if bit == 3:
        break

13 5 0.029086494049
26 10 0.0581729880981
8 3 0.0948224578763
21 8 0.123908951925
3 1 0.160558421704

As you can see, there are 4 choices better than the simple 3 bits.

OTHER TIPS

I am afraid that your suggested approach is biased.

Suppose the random number generator made numbers from 0 to 255, but you wanted a random number in the range 0 to 254.

If the random number generator produced 255 (1111_1111 in binary), your approach would throw away a single bit and add one more until eventually you would end up with the number 254 (1111_1110 in binary). (If I have understood your approach correctly.)

Therefore your generated numbers would have a probability of 1/256 for every number, except 254 which would have a probability of 1/128.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top