Worst-case expected running time for Randomized Permutation Algorithm

https://cs.stackexchange.com/questions/123544

29-09-2020
|

Pergunta

I have an algorithm which, when given a positive integer N, generates a permutation of the first N integers (from 1 to N) using a method called randInt(x,y). The method randInt(x,y) will generate a random integer between the numbers x and y, provided they are positive integers and y >= x.

The algorithm is given by the following pseudo-code:

1.  if (N <= 0) {
2.     return null
3.  } else {
4.     A := new int[] w/ size N and all cells initialized to 0
5.     a[0] := randInt(1,N)
6.     for (i := 1 to length(A)-1) do 
7.        boolean rInA := True
8.        while (rInA) {
9.           rInA := False 
10.          int r := randInt(1,N)
11.          for (j := 0 to (i-1)) do 
12.             if (r = A[j]) {
13.                rInA := True
14.             }
15.          }   
16.       }
17.       A[i] := r
18.    }
19. }
20. return A

My understanding of the algorithm is as follows:

The outermost for-loop will run N-1 times and for each of those iterations a random number is generated and then compared to all the previous cells of A that have been visited in previous iterations. If any of the those cells contain that randomly generated number then that number cannot be used and a new number is randomly generated (in the next iteration of that nested while-loop). This new randomly generated number is then, like before, compared to all the previously visited cells in A to check for duplication. This continues until randInt(x,y) generates a random number that is not already in the first i cells of A.

This leads me to believe that the Worst-case expected running time of the algorithm is something like: $\sum_{i=1}^{N-1}(\alpha i)$

Now the $\alpha$ here represents the effect the while-loop has on the running time and is the point of uncertainty for me. I know that in the first iteration of the outermost for loop its unlikely that randInt will generate the one integer that A already contains (1/N I believe) so that inner-most for-loop is likely to only execute once. However, by the last iteration (of outer-most for-loop) the probability that randInt generates one of the N-1 integers already in A is $\frac{N-1}{N}$ so because of the while-loop its likely that the inner-most for-loop for that iteration (of the outer-most for-loop) will execute more like n times.

How can I use the probability introduced into the algorithm by randInt to calculate the algorithms run-time?

Solução

Firstly I would revise the inner for loop so that checking whether $r$ has been used already, is $O(1)$. As stated, it is $O(n)$. You could do this by initializing a (1-indexed) boolean array $used[\cdot]$ of length $n$, and setting $used[x]$ equals true whenever you set some $A[i]=x$.

Now the question is how many times may $rand()$ be called in the worst case. Actually, the way the algorithm is set up right now, the worst case is $+\infty$; this is because it's not learning from any of its bad choices of $r$. For instance, if it selects $5$, when $5$ was already in $A$, then the smart thing to do would be never to guess $5$ again. There are ways to achieve this; so if you had some method that never repeated guesses, then you can get $O(n^2)$ worst case runtime.

If you are interested in expected runtime, then you can calculate the expected number of times $r$ is recalculated in any given step: at step $i$, there are $n-i$ good choices, for a success probability of $\frac{n-i}{n}$. The expectation for number of tries to get a first success in a bernoulli variable with probability $p$, is $\frac{1}{p}$. Then if you sum $\frac{n}{n}+\frac{n}{n-1}+\frac{n}{n-2}+\cdots \frac{n}{1}$ you have asymptotic $O(n\log n)$. This is basically the coupon collector's problem.

Licenciado em: CC-BY-SA com atribuição

Não afiliado a cs.stackexchange