Most efficient algorithm to print 1-100 using a given random number generator

https://cs.stackexchange.com/questions/2576

16-10-2019

Question

We are given a random number generator RandNum50 which generates a random integer uniformly in the range 1–50. We may use only this random number generator to generate and print all integers from 1 to 100 in a random order. Every number must come exactly once, and the probability of any number occurring at any place must be equal.

What is the most efficient algorithm for this?

Solution

I thought (so it may be wrong :-) of this $O(N^2)$ solution, which uses the Fisher-Yates shuffle. To keep the distribution approximately uniform at every iteration (see the EDIT section below), you can use this trick to produce a value krand between $0$ and $k-1$:

 // return a random number in [0..k-1], approximately uniformly distributed,
 // using a uniform random generator in [1..50]
 function krand(k) {
   sum = 0
   for i = 1 to k do sum = sum + RandNum50() - 1
   return sum mod k
 }
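How far krand is from uniform can be checked exactly rather than by simulation. The following is a minimal Python sketch, assuming the generator has 50 equally likely values; the function name `krand_distribution` and the `sides` parameter are illustrative and not part of the original answer.

```python
from fractions import Fraction

def krand_distribution(k, sides=50):
    """Exact distribution of (sum of k draws from 0..sides-1) mod k."""
    # Distribution of a single draw (RandNum50() - 1), reduced mod k.
    single = [Fraction(0)] * k
    for v in range(sides):
        single[v % k] += Fraction(1, sides)
    # Add k independent draws, keeping only the residue mod k.
    dist = [Fraction(1)] + [Fraction(0)] * (k - 1)
    for _ in range(k):
        nxt = [Fraction(0)] * k
        for a, pa in enumerate(dist):
            for b, pb in enumerate(single):
                nxt[(a + b) % k] += pa * pb
        dist = nxt
    return dist

if __name__ == "__main__":
    for k in (2, 3, 5):
        print(k, krand_distribution(k))
```

For k = 2 and k = 5 the result is exactly uniform, because 2 and 5 divide 50. For k = 3 the three residues get 41666, 41667 and 41667 out of 125000, so the bias is small but not zero.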

The Fisher-Yates algorithm becomes:

arr : array[0..99]
for i = 0  to 99 do arr[i] = i+1; // store 1..100 in the array
for i = 99 downto 1 {
  r = krand(i+1)  // random value in [0..i]
  exchange the values of arr[i] and arr[r]
}
for i = 0 to 99 do print arr[i]
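The pseudocode above could be written in Python roughly as follows. This is only a sketch: `rand_num_50` is a stand-in for RandNum50 built on Python's `random` module, and the `index_sampler` parameter is an addition that lets a different sampler be plugged in.

```python
import random

def rand_num_50():
    """Stand-in for RandNum50: uniform integer in 1..50 (assumed)."""
    return random.randint(1, 50)

def krand(k):
    """Approximately uniform value in [0, k-1], as in the pseudocode."""
    return sum(rand_num_50() - 1 for _ in range(k)) % k

def shuffled_1_to_100(index_sampler=krand):
    """Fisher-Yates shuffle of 1..100 driven by index_sampler(k) in [0, k-1]."""
    arr = list(range(1, 101))
    for i in range(99, 0, -1):
        r = index_sampler(i + 1)          # index in [0, i]
        arr[i], arr[r] = arr[r], arr[i]   # exchange arr[i] and arr[r]
    return arr

if __name__ == "__main__":
    print(*shuffled_1_to_100())
```

With krand as the sampler this makes 2 + 3 + ... + 100 = 5049 calls to the generator, which is where the quadratic cost comes from.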

EDIT:

As pointed out by Erick, the krand function above doesn't return a truly uniform distribution. Other methods give a better (arbitrarily better) and faster approximation, but to my knowledge the only way to get a truly uniform distribution is rejection sampling: pick $m = \lceil \log_2(k) \rceil$ random bits, and if the number $r$ obtained is less than $k$, return it; otherwise generate another random number. A possible implementation:

function trulyrand(k) {
    if (k <= 1) return 0
    while (true) { // ... if you're really unlucky ...
      m = ceil(log_2 (k) ) // calculate m such that k <= 2^m
      r = 0  // will hold the random value
      while (m > 0) {  // ... will add exactly m bits
        if ( RandNum50() > 25 ) then b = 1 else b = 0   // random bit
        r = r * 2 + b  // shift and add the random bit
        m = m - 1
      }
      if (r < k) then return r  // we have 0<=r<2^m ; accept it, if r < k
    }
}
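A minimal Python sketch of the same rejection idea, assuming `rand_num_50` as a stand-in for RandNum50 (the helper names are illustrative). It draws exactly $m$ bits per attempt, where $m$ is the smallest integer with $k \le 2^m$.

```python
import random

def rand_num_50():
    """Stand-in for RandNum50: uniform integer in 1..50 (assumed)."""
    return random.randint(1, 50)

def random_bit():
    """One fair bit: values 26..50 give 1, values 1..25 give 0."""
    return 1 if rand_num_50() > 25 else 0

def truly_rand(k):
    """Uniform integer in [0, k-1] by rejection sampling on m random bits."""
    if k <= 1:
        return 0
    m = (k - 1).bit_length()      # smallest m with k <= 2**m
    while True:                   # repeat until a draw is accepted
        r = 0
        for _ in range(m):
            r = 2 * r + random_bit()
        if r < k:                 # accept; otherwise throw the bits away
            return r

if __name__ == "__main__":
    print([truly_rand(100) for _ in range(10)])
```

Since $2^m < 2k$, each attempt is accepted with probability greater than 1/2, so the expected number of attempts is below 2, although no fixed bound on the number of calls exists.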

Other tips

Since other folks have given approximate solutions and solutions involving taking indeterminate numbers of deviates, how about a proof that there is no such algorithm that's guaranteed to only require a finite number of RandNum50() calls?

As others have noted, printing the numbers from 1-100 in random order is equivalent to printing a random permutation of these numbers; there are 100! of these permutations, and so any particular permutation must be output with probability $\frac{1}{100!}$.

But if we knew that our algorithm used at most $k$ calls to RandNum50 for some $k$, then we could argue as follows: firstly, pad out those computation paths that make fewer than $k$ calls to RandNum50 with additional dummy calls (that is, calls where the returned value is irrelevant), so that all computation paths make precisely $k$ calls. Any given sequence of $k$ results from our calls to RandNum50 must result in some output permutation, and so we can build an 'outcomes table' that maps any given sequence $(r_1, r_2, \ldots, r_k)$ of results from our calls to a particular output permutation. Since each of these sequences is equally likely (each of them has probability $\displaystyle\frac{1}{50^k}$), the probability of getting any particular permutation out of our algorithm must be of the form $\displaystyle\frac{c}{50^k}$ for some integer $c$. But $\displaystyle\frac{1}{100!}$ can't be of this form, because $100!$ doesn't divide $50^k$ for any $k$ (for instance, 3 divides $100!$ but can't divide any number of the form $50^k$). This means that no possible assignment of output permutations to sequences of random-number calls can produce a uniform permutation.
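The divisibility step can be checked mechanically. The sketch below (helper names are illustrative) uses the fact that $n!$ divides some power of the generator's range exactly when every prime up to $n$ divides that range; for 50 the only primes available are 2 and 5.

```python
from math import factorial

def prime_factors(n):
    """Set of prime factors of n, by trial division (fine for small n)."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors

def factorial_divides_some_power(n, sides):
    """True iff n! divides sides**k for some k."""
    primes_of_factorial = {p for m in range(2, n + 1) for p in prime_factors(m)}
    return primes_of_factorial <= prime_factors(sides)

if __name__ == "__main__":
    print(prime_factors(50))                      # only 2 and 5
    print(factorial(100) % 3 == 0)                # 3 divides 100!
    print(factorial_divides_some_power(100, 50))  # no bounded exact shuffle
```

The same test shows when a bounded exact shuffle is possible in principle, for example a shuffle of 5 items driven by a generator with 30 equally likely values.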

The previous solutions are not optimal. The complexity is exactly $n\log n + O(1)$ in calls to RandNum50; the method is described in some detail here, using the following as a source of random bits (as suggested by Vor):

if ( RandNum50() > 25 ) then b = 1 else b = 0   // random bit

The basic idea is that you save a lot of bits if you generate a single uniform between $1$ and $n!$ and then use factorial base decomposition, instead of generating a sequence of uniforms ranging up to $1$, then $2$, then $3$, and so on up to $n$. This is actually, as I mention in the post, the topic of a paper I have submitted!
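To make the factorial base step concrete, here is a minimal Python sketch that turns one integer $q$ with $0 \le q < n!$ into a permutation of $1, \ldots, n$. The function name and the 0-based convention for $q$ are assumptions made for illustration; the linked post may use a different but equivalent convention.

```python
def permutation_from_index(q, n):
    """Map q in [0, n!) to a permutation of 1..n via factorial-base digits."""
    # Peel off digits d_1, ..., d_n with 0 <= d_i < i,
    # so that q = sum of d_i * (i-1)!.
    digits = []
    for base in range(1, n + 1):
        q, d = divmod(q, base)
        digits.append(d)
    # Most significant digit first: d_n picks among n remaining items,
    # d_(n-1) among n-1 remaining items, and so on.
    items = list(range(1, n + 1))
    return [items.pop(d) for d in reversed(digits)]

if __name__ == "__main__":
    for q in range(6):
        print(q, permutation_from_index(q, 3))
```

Distinct values of $q$ give distinct permutations, so a uniform $q$ yields a uniform permutation.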

If you do not know how to generate a uniform from random bits, as suggested in that post, you could also generate an approximation of the uniform directly, in this way (which is equivalent to Vor's "trulyrand", but faster):

P = (RandNum50()-1) + (RandNum50()-1)*50^1 + (RandNum50()-1)*50^2 + ...

going as far as you need to go. This amounts to writing $P$ in base $50$. Then simply reduce $P$ modulo $n$, i.e., $Q = P \bmod n$, where in your case $n=100!$. This value is not perfectly uniform, but reduction modulo $n$ is an approximation that is often used. Or, as Vor suggests, you can reject if $P \ge n$. Then with this value, you can do the factorial base expansion as described in the post.
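The base-50 construction can be combined with rejection to get an exactly uniform value. The sketch below is one common refinement and not necessarily what the answer intended: it rejects only draws that fall in the incomplete top block above the largest multiple of $n$, and then reduces modulo $n$. `rand_num_50` is again an assumed stand-in for RandNum50.

```python
import random
from math import factorial

def rand_num_50():
    """Stand-in for RandNum50: uniform integer in 1..50 (assumed)."""
    return random.randint(1, 50)

def uniform_below(n):
    """Exactly uniform integer in [0, n) built from base-50 digits."""
    digits, span = 0, 1
    while span < n:               # smallest digit count with 50**digits >= n
        span *= 50
        digits += 1
    limit = span - span % n       # largest multiple of n not above 50**digits
    while True:
        p = 0
        for _ in range(digits):   # P = sum of (RandNum50() - 1) * 50**i
            p = p * 50 + (rand_num_50() - 1)
        if p < limit:             # every residue mod n is now equally likely
            return p % n

if __name__ == "__main__":
    q = uniform_below(factorial(100))
    print(q)
```

Because `limit` is more than half of `span`, fewer than half of the attempts are rejected on average. The returned value can then be fed to a factorial base expansion.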

I haven't done the analysis to confirm how uniform (or not) this would be, and it could be adjusted to be a true shuffle, but could you just start from an array whose ith entry is i + 1 and, for k = 0..99, choose (with removal) the entry at index (k + RandNum50() + RandNum50() - 1) mod (100 - k)?

This "pushes" the peak in the RandNum50() + RandNum50() distribution forward uniformly.

I'm pretty sure this is not quite right as I've stated it because the 0 index (1) is not obtainable from the first choice and I cannot quickly see an alternative 1..50 + 1..50 adjustment that produces 0..99.
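The problem described in the note above can be confirmed by enumerating every pair of generator results for a given k. This is a small Python sketch with an illustrative function name; it only lists which indices are reachable and says nothing about how often each one occurs.

```python
def reachable_indices(k, size=100):
    """All values of (k + a + b - 1) mod (size - k) for a, b in 1..50."""
    return {(k + a + b - 1) % (size - k)
            for a in range(1, 51)
            for b in range(1, 51)}

if __name__ == "__main__":
    first = reachable_indices(0)
    print(0 in first, len(first))
```

For k = 0 the sum a + b - 1 runs over 1..99, so index 0 (the value 1) can never be chosen first, while every other index can.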

Update

To fix the issue I noted, I effectively used RandNum100, as mentioned in the question comments, to randomly initialise the first k offset.

This produces a distribution with a significant wave at the front.

Instead of advancing by 1 I used another RandNum50 to increment that first k. This produces a result that is random enough for me, but it is still not "truly" random, as can be easily seen if you change K to 2.

Testing VB.NET code where I catered for any even K. Note it is O(K), 6K+2 in fact.

Licensed under: CC-BY-SA with attribution

Not affiliated with cs.stackexchange