Most efficient algorithm to print 1-100 using a given random number generator
16-10-2019
Question
We are given a random number generator RandNum50 which generates a random integer uniformly in the range 1–50.
We may use only this random number generator to generate and print all integers from 1 to 100 in a random order. Every number must come exactly once, and the probability of any number occurring at any place must be equal.
What is the most efficient algorithm for this?
Solution
I thought (so it can be wrong :-) of this $O(N^2)$ solution that uses the Fisher-Yates shuffle. To keep the distribution approximately uniform (see the EDIT section below), at every iteration you can use this trick to produce a value krand between $0$ and $k-1$:
    // return a random number in [0..k-1] with approximately uniform distribution
    // using a uniform random generator in [1..50]
    function krand(k) {
        sum = 0
        for i = 1 to k do sum = sum + RandNum50() - 1
        return sum mod k
    }
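One way to see how close krand comes to uniform is to compute its exact distribution rather than sampling it. The sketch below does that with exact fractions; the function name `krand_distribution` and the `sides` parameter are illustrative assumptions, not part of the original answer.

```python
from fractions import Fraction


def krand_distribution(k, sides=50):
    """Exact distribution of (sum of k draws from 0..sides-1) mod k.

    Hypothetical helper: models krand(k) with RandNum50() - 1 as each draw.
    """
    # Distribution of a single draw, reduced modulo k.
    single = [Fraction(0)] * k
    for value in range(sides):
        single[value % k] += Fraction(1, sides)

    # Add k independent draws, tracking only the residue modulo k.
    dist = [Fraction(1)] + [Fraction(0)] * (k - 1)
    for _ in range(k):
        nxt = [Fraction(0)] * k
        for a, pa in enumerate(dist):
            for b, pb in enumerate(single):
                nxt[(a + b) % k] += pa * pb
        dist = nxt
    return dist


if __name__ == "__main__":
    for k in (2, 3, 7):
        print(k, krand_distribution(k))
```

For k = 2 the result is exactly uniform because 2 divides 50, while for k = 3 the three probabilities are close to, but not exactly, 1/3.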
The Fisher-Yates algorithm becomes:
    arr : array[0..99]
    for i = 0 to 99 do arr[i] = i+1 // store 1..100 in the array
    for i = 99 downto 1 {
        r = krand(i+1) // random value in [0..i]
        exchange the values of arr[i] and arr[r]
    }
    for i = 0 to 99 do print arr[i]
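The pseudocode above could be transcribed into Python roughly as follows. This is a minimal sketch: `rand_num_50` is a stand-in that simulates RandNum50 with Python's own generator (an assumption made only so the example runs), and `shuffled_1_to_100` is an illustrative name.

```python
import random


def rand_num_50():
    """Stand-in for RandNum50 (assumption: simulated with random.randint)."""
    return random.randint(1, 50)


def krand(k):
    """Approximately uniform value in [0, k-1], as in the pseudocode."""
    total = sum(rand_num_50() - 1 for _ in range(k))
    return total % k


def shuffled_1_to_100():
    """Fisher-Yates shuffle of 1..100 driven by krand."""
    arr = list(range(1, 101))          # store 1..100 in the array
    for i in range(99, 0, -1):         # i = 99 down to 1
        r = krand(i + 1)               # random index in [0..i]
        arr[i], arr[r] = arr[r], arr[i]
    return arr


if __name__ == "__main__":
    print(*shuffled_1_to_100())
```

With this krand, the shuffle makes 2 + 3 + ... + 100 = 5049 calls to the generator, which is where the quadratic cost comes from.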
EDIT:
As pointed out by Erick, the krand function above doesn't return a truly uniform distribution. There are other methods that give a better (arbitrarily better) and faster approximation, but (to my knowledge) the only way to get a truly uniform distribution is rejection sampling: pick $m = \lceil \log_2(k) \rceil$ random bits and, if the number $r$ obtained is less than $k$, return it; otherwise generate another random number. A possible implementation:
    function trulyrand(k) {
        if (k <= 1) return 0
        while (true) { // ... if you're really unlucky ...
            m = ceil(log_2 (k) ) // calculate m such that k <= 2^m
            r = 0 // will hold the random value
            while (m > 0) { // ... will add exactly m bits
                if ( RandNum50() > 25 ) then b = 1 else b = 0 // random bit
                r = r * 2 + b // shift and add the random bit
                m = m - 1
            }
            if (r < k) then return r // we have 0 <= r < 2^m; accept it if r < k
        }
    }
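A Python version of the same rejection-sampling idea might look like the sketch below. As before, `rand_num_50` only simulates RandNum50 (an assumption), and `random_bit` is an illustrative helper name. The bit count is computed with integer arithmetic instead of a floating-point logarithm, which is a design choice of this sketch rather than something from the answer.

```python
import random


def rand_num_50():
    """Stand-in for RandNum50 (assumption: simulated with random.randint)."""
    return random.randint(1, 50)


def random_bit():
    """One fair bit: 26..50 maps to 1, 1..25 maps to 0."""
    return 1 if rand_num_50() > 25 else 0


def trulyrand(k):
    """Uniform value in [0, k-1] by rejection sampling on m random bits."""
    if k <= 1:
        return 0
    m = (k - 1).bit_length()           # smallest m with k <= 2**m
    while True:                        # retry until a candidate is accepted
        r = 0
        for _ in range(m):
            r = r * 2 + random_bit()   # shift and add the random bit
        if r < k:                      # 0 <= r < 2**m; accept only if r < k
            return r


if __name__ == "__main__":
    print([trulyrand(6) for _ in range(20)])
```

Since k > 2^(m-1), each attempt is accepted with probability above 1/2, so the expected number of attempts is below 2 even though no fixed bound on the number of calls exists.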
Other tips
Since other folks have given approximate solutions and solutions that take an indeterminate number of deviates, how about a proof that no such algorithm is guaranteed to need only a bounded number of RandNum50() calls?
As others have noted, printing the numbers from 1-100 in random order is equivalent to printing a random permutation of these numbers; there are 100! of these permutations, and so any particular permutation must be output with probability $\frac{1}{100!}$.
But if we knew that our algorithm used at most $k$ calls to RandNum50 for some $k$, then we could argue as follows. First, pad out those computation paths that make fewer than $k$ calls to RandNum50 with additional dummy calls (that is, calls whose returned value is ignored), so that all computation paths make precisely $k$ calls. Any given sequence of $k$ results from our calls to RandNum50 must result in some output permutation, so we can build an 'outcomes table' that maps each sequence $(r_1, r_2, \ldots, r_k)$ of results to a particular output permutation. Since each of these sequences is equally likely (each has probability $\displaystyle\frac{1}{50^k}$), the probability of getting any particular permutation out of our algorithm must be of the form $\displaystyle\frac{c}{50^k}$ for some integer $c$. But $\displaystyle\frac{1}{100!}$ can't be of this form, because $100!$ doesn't divide $50^k$ for any $k$ (for instance, 3 divides $100!$ but can't divide any number of the form $50^k$). This means that no assignment of random-number call outcomes to permutations can produce a uniform permutation.
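The divisibility step can be written out explicitly. This only restates the argument above, with $c$ standing for the number of length-$k$ call sequences that the outcomes table maps to a given permutation:

```latex
\[
\frac{c}{50^k} = \frac{1}{100!}
\quad\Longleftrightarrow\quad
c \cdot 100! = 50^k = 2^k \cdot 5^{2k}.
\]
```

The right-hand side has no prime factors other than 2 and 5, while the left-hand side is divisible by 3, so no integers $c$ and $k$ satisfy the equation.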
The previous solutions are not optimal. The complexity is exactly $n\log n + O(1)$ calls to RandNum50; this is described in some detail here, using as the source of random bits (as suggested by Vor):
    if ( RandNum50() > 25 ) then b = 1 else b = 0 // random bit
The basic idea is that you save a lot of bits if you generate a single uniform value between $1$ and $n!$ and then use factorial base decomposition, instead of generating a sequence of uniforms with ranges $1$, then $2$, then $3$, and so on up to $n$. This is actually, as I mention in the post, the topic of a paper I have submitted!
If you do not know how to generate a uniform, as suggested in that post, from a random bit, you could also generate an approximation of the uniform directly, in this way (which is equivalent to Vor's "trulyrand", but faster):
P = (RandNum50()-1) + (RandNum50()-1)*50^1 + (RandNum50()-1)*50^2 + ...
going as far as you need to go. This develops $P$ in base $50$. Then simply reduce $P$ modulo $n$, i.e., $Q = P \bmod n$, where in your case $n = 100!$. This value is not perfectly uniform, but this kind of approximation is often used. Or, as Vor suggests, you can reject whenever $P \ge n$, which makes the result exactly uniform. With this value, you can then do the factorial base expansion as described in the post.
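The two steps, building a uniform value from base-50 digits with rejection and then decoding it in the factorial base, can be sketched as follows. This is one possible reading of the description, not the code from the linked post; `rand_num_50` simulates RandNum50, and the other function names are illustrative assumptions.

```python
import math
import random


def rand_num_50():
    """Stand-in for RandNum50 (assumption: simulated with random.randint)."""
    return random.randint(1, 50)


def uniform_below(n):
    """Uniform integer in [0, n) from base-50 digits, rejecting P >= n."""
    if n <= 1:
        return 0
    digits = 1
    while 50 ** digits < n:            # fewest digits with 50**digits >= n
        digits += 1
    while True:
        p = 0
        for _ in range(digits):
            p = p * 50 + (rand_num_50() - 1)   # append one base-50 digit
        if p < n:                      # reject and retry when P >= n
            return p


def permutation_from_index(q, n):
    """Decode q in [0, n!) into a permutation of 1..n (factorial base)."""
    items = list(range(1, n + 1))
    perm = []
    for radix in range(n, 0, -1):
        q, digit = divmod(q, radix)    # next factorial-base digit
        perm.append(items.pop(digit))  # pick among the remaining items
    return perm


def random_permutation(n):
    """Uniform random permutation of 1..n using only rand_num_50."""
    return permutation_from_index(uniform_below(math.factorial(n)), n)


if __name__ == "__main__":
    print(*random_permutation(100))
```

Each index in [0, n!) decodes to a different permutation, so a uniform index yields a uniform permutation. Replacing the rejection test with `p % n` would give the approximate variant instead.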
I haven't done the analysis to confirm how uniform (or not) this would be, and it could be adjusted to be a true shuffle, but could you just choose, from a starting array whose ith index holds i + 1, the (k + RandNum50() + RandNum50() - 1) mod (100 - k) index, with removal, for k = 0..99?
This "pushes" the peak in the RandNum50() + RandNum50() distribution forward uniformly.
I'm pretty sure this is not quite right as I've stated it because the 0 index (1) is not obtainable from the first choice and I cannot quickly see an alternative 1..50 + 1..50 adjustment that produces 0..99.
Update
To fix the issue I noted, I effectively used RandNum100, as mentioned in the question comments, to randomly initialise the first k offset.
This produces a distribution with a significant wave at the front.
Instead of advancing by 1, I used another RandNum50 to increment that first k. This produces a result that is random enough for me, but it is still not "truly" random, as can easily be seen if you change K to 2.
Testing VB.NET code where I catered for any even K. Note it is O(K), 6K+2 in fact.