Pergunta

It's well-known that this 'naive' algorithm for shuffling an array by swapping each item with another randomly-chosen one doesn't work correctly:

for (i=0..n-1)
  swap(A[i], A[random(n)]);

Specifically, since at each of $n$ iterations, one of $n$ choices is made (with uniform probability), there are $n^n$ possible 'paths' through the computation; because the number of possible permutations $n!$ doesn't divide evenly into the number of paths $n^n$, it's impossible for this algorithm to produce each of the $n!$ permutations with equal probability. (Instead, one should use the so-called Fischer-Yates shuffle, which essentially changes out the call to choose a random number from [0..n) with a call to choose a random number from [i..n); that's moot to my question, though.)

What I'm wondering is, how 'bad' can the naive shuffle be? More specifically, letting $P(n)$ be the set of all permutations and $C(\rho)$ be the number of paths through the naive algorithm that produce the resulting permutation $\rho\in P(n)$, what is the asymptotic behavior of the functions

$\qquad \displaystyle M(n) = \frac{n!}{n^n}\max_{\rho\in P(n)} C(\rho)$

and

$\qquad \displaystyle m(n) = \frac{n!}{n^n}\min_{\rho\in P(n)} C(\rho)$?

The leading factor is to 'normalize' these values: if the naive shuffle is 'asymptotically good' then

$\qquad \displaystyle \lim_{n\to\infty}M(n) = \lim_{n\to\infty}m(n) = 1$.

I suspect (based on some computer simulations I've seen) that the actual values are bounded away from 1, but is it even known if $\lim M(n)$ is finite, or if $\lim m(n)$ is bounded away from 0? What's known about the behavior of these quantities?

Foi útil?

Solução

We will show by induction that the permutation $\rho_n = (2,3,4,\ldots, n,1)$ is an example with $C(\rho_n) = 2^{n-1}$. If this is the worst case, as it is for the first few $n$ (see the notes for OEIS sequence A192053), then $m(n) \approx (2/e)^{n}$. So the normalized min, like the normalized max, is 'exponentially bad'.

The base case is easy. For the induction step, we need a lemma:

Lemma: In any path from $(2,3,4, \ldots, n, 1)$ to $(1,2,3, \ldots, n)$, either the first move swaps positions $1$ and $n$, or the last move swaps positions $1$ and $n$.

Proof Sketch: Suppose not. Consider the first move that involves the $n$'th position. Assume that it is the $i$'th move, $i\neq 1$ and $i \neq n$. This move must place the item $1$ in the $i$'th place. Now consider the next move that touches the item $1$. Assume this move is the $j$'th move. This move must swap $i$ and $j$, moving the item $1$ into the $j$'th place, with $i < j$. A similar argument says that the item $1$ can only subsequently be moved to the right. But the item $1$ needs to end up in the first place, a contradiction. $\square$

Now, if the first move swaps the positions $1$ and $n$, the remaining moves must take the permutation $(1, 3,4,5, \ldots, n,2)$ to $(1,2,3,4, \ldots, n)$. If the remaining moves don't touch the first position, then this is the permutation $\rho_{n-1}$ in positions $2 \ldots n$, and we know by induction that there are $C(\rho_{n-1})=2^{n-2}$ paths that do this. An argument similar to the proof of the Lemma says that there is no path that touches the first position, as the item $1$ must then end up in the incorrect position.

If the last move swaps the positions $1$ and $n$, the first $n-1$ moves must take the permutation $(2,3,4,\ldots, n,1)$ to the permutation $(n,2, 3,4, \ldots, n-1, 1)$. Again, if these moves don't touch the last position, then this is the permutation $\rho_{n-1}$, and by induction there are $C(\rho_{n-1})=2^{n-2}$ paths that do it. And again, if one of the first $n-1$ moves here touches the last position, the item $1$ can never end up in the correct place.

Thus, $C(\rho_n) = 2C(\rho_{n-1}) = 2^{n-1}$.

Outras dicas

After some digging around thanks to mhum's pointer to OEIS, I've finally found an excellent analysis and a nice (relatively) elementary argument (due, as far as I can tell, to Goldstein and Moews [1]) that $M(n)$ grows superexponentially fast in $n$:

Any involution $\iota$ of $\{1\ldots n\}$ corresponds to a run of the 'naive' shuffling algorithm that produces the identity permutation as its result, since the algorithm will swap $k$ with $\iota(k)$ and subsequently swap $\iota(k)$ with $k$, leaving both unchanged. This means that the number of runs of the algorithm that yield the identity permutation is at least the number of involutions $Q(n)$ (in fact, a little thinking shows that the correspondence is 1-1 and so it's exactly $Q(n)$), and so the maximum in $M(n)$ is bounded from below by $Q(n)$.

$Q(n)$ apparently goes by a number of names, including the telephone numbers : see http://oeis.org/A000085 and http://en.wikipedia.org/wiki/Telephone_number_%28mathematics%29 . The asymptotics are well-known, and it turns out that $Q(n) \approx C\left(\frac{n}{e}\right)^{n/2}e^\sqrt{n}$; from the recurrence relation $Q(n) = Q(n-1)+(n-1)Q(n-2)$ it can be inductively shown that the ratio $R(n) = \frac{Q(n)}{Q(n-1)}$ satisfies $\sqrt{n}\lt R(n)\lt\sqrt{n+1}$ and from there basic analysis gets the leading $n^{n/2}$ term in the asymptotics, though the other terms require a more careful effort. Since the 'scale factor' $\frac{n!}{n^n}$ in the definition of $M(n)$ is only about $C\sqrt{n}e^{-n}$, the leading term of $Q(n)$ dominates and yields (asymptotically) $M(n)\geq Cn^{(n+1)/2}e^{-3n/2+\sqrt{n}}$.

Goldstein and Moews in fact go on to show in [1] that the identity permutation is the most likely for large $n$, so the $\geq$ is in fact a $\approx$ and the behavior of $M(n)$ is fully settled. This still leaves the question of the behavior of $m(n)$ open; I wouldn't be too surprised if that also yielded to the analysis in their paper, but I haven't had opportunity to read it closely enough yet to really get a grip on their methods, only enough to grok the basic result.

[1] Goldstein, D. and Moews, D.: "The identity is the most likely exchange shuffle for large n", http://arxiv.org/abs/math/0010066

Licenciado em: CC-BY-SA com atribuição
Não afiliado a cs.stackexchange
scroll top