Question

The usual simple algorithm for finding the median element in an array $A$ of $n$ numbers is:

  • Sample $n^{3/4}$ elements from $A$ with replacement into $B$
  • Sort $B$ and let $l$ and $r$ be the elements of $B$ of rank $|B|/2 - \sqrt{n}$ and $|B|/2 + \sqrt{n}$, respectively
  • Check that $l$ and $r$ are on opposite sides of the median of $A$ and that there are at most $Cn^{3/4}$ elements in $A$ between $l$ and $r$, for some appropriate constant $C > 0$. Fail if this doesn't happen.
  • Otherwise, find the median by sorting the elements of $A$ between $l$ and $r$
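The four steps above can be sketched as follows. This is only a sketch assuming distinct elements; the sample size $n^{3/4}$ and rank slack $\sqrt{n}$ come from the description above, while the constant $C = 8$ and the retry-on-failure convention are illustrative choices.

```python
import random

def sampled_median(A, C=8):
    """Median by sampling, following the four steps above.

    Assumes distinct elements; returns None on the (low-probability)
    failure event, in which case the caller should simply retry.
    """
    n = len(A)
    target = (n - 1) // 2                      # rank of the median in sorted(A)
    s = max(1, round(n ** 0.75))               # |B| = n^(3/4)
    B = sorted(random.choices(A, k=s))         # sample with replacement, sort
    slack = int(n ** 0.5)
    l = B[max(0, s // 2 - slack)]              # rank |B|/2 - sqrt(n) in B
    r = B[min(s - 1, s // 2 + slack)]          # rank |B|/2 + sqrt(n) in B
    below = sum(1 for x in A if x < l)         # rank of l in A (one linear pass)
    mid = sorted(x for x in A if l <= x <= r)  # elements of A between l and r
    # Check: l and r straddle the median, and the middle slice is small
    # (O(n^{3/4}) elements, so sorting it costs o(n)).
    if not (below <= target < below + len(mid)) or len(mid) > C * s:
        return None
    return mid[target - below]
```

Both checks fail only on large deviations of a binomial, so a retry loop around this function terminates after $O(1)$ attempts in expectation.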

It's not hard to see that this runs in linear time and that it succeeds with high probability. (All the bad events are large deviations away from the expectation of a binomial.)

An alternate algorithm for the same problem, more natural to teach to students who have seen quicksort, is the one described here: Randomized Selection

It is also easy to see that this one has linear expected running time: say that a "round" is a sequence of recursive calls that ends when one gives a 1/4-3/4 split, and then observe that the expected length of a round is at most 2. (On the first draw of a round, the probability of getting a good split is 1/2, and on later draws it actually increases, as the algorithm was described, so the round length is dominated by a geometric random variable.)
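To make the round structure concrete, here is a quickselect sketch that also counts rounds as defined above. This is an illustrative implementation, not the code from the linked description; the out-of-place partitioning keeps it short at the cost of extra allocations.

```python
import random

def quickselect(a, k, seed=0):
    """Return the k-th smallest element (0-indexed) of a, plus the number
    of rounds used.  A round is a maximal run of recursive steps that ends
    as soon as one step leaves at most 3/4 of the elements the round
    started with, i.e. the pivot achieved a 1/4-3/4 split or better.
    """
    rng = random.Random(seed)
    a = list(a)
    rounds = 0
    while True:
        start = len(a)
        while True:  # one round: keep drawing pivots until a good split
            pivot = rng.choice(a)
            lo = [x for x in a if x < pivot]
            hi = [x for x in a if x > pivot]
            n_eq = len(a) - len(lo) - len(hi)   # copies of the pivot value
            if k < len(lo):
                a = lo                          # target is below the pivot
            elif k < len(lo) + n_eq:
                return pivot, rounds + 1        # pivot is the answer
            else:
                k -= len(lo) + n_eq
                a = hi                          # target is above the pivot
            if len(a) <= 3 * start // 4:
                break                           # good split: round over
        rounds += 1
```

Each recursive step strictly shrinks the subproblem, so the algorithm always terminates; the question below is about the tail of the total work.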

So now the question:

Is it possible to show that randomized selection runs in linear time with high probability?

We have $O(\log n)$ rounds, and each round has length at least $k$ with probability at most $2^{-k+1}$, so a union bound gives that the running time is $O(n\log\log n)$ with probability $1-O(1/\log n)$.
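Spelling out that union bound (a sketch, with constants not optimized): take $k = 2\log_2\log n$. A fixed round has length at least $k$ with probability at most $2^{-k+1} = 2/\log^2 n$, so over the $O(\log n)$ rounds,
$$\Pr[\text{some round has length} \ge k] \;\le\; O(\log n)\cdot \frac{2}{\log^2 n} \;=\; O\!\left(\frac{1}{\log n}\right).$$
When every round has length $O(\log\log n)$, the work in the $i$-th round is $O\big((3/4)^i\, n \log\log n\big)$, and summing the geometric series gives $O(n\log\log n)$ in total.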

This is kind of unsatisfying, but is it actually the truth?


Solution

It's not true that the algorithm runs in linear time with high probability. Considering only the first round, the running time is at least $\Theta(n)$ times a geometric $G(1/2)$ random variable. Let $p(n) \to 0$ be the allowed failure probability. Since $\Pr[G(1/2) \geq \log_2 p(n)^{-1}] = p(n)$, any time bound that holds with probability $1 - p(n)$ must be $\Omega(n \log_2 p(n)^{-1}) = \omega(n)$.
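This heuristic can be checked numerically. The sketch below simulates the first-round length in a simplified pivot-rank model (an assumption mirroring the heuristic above: each draw picks a uniform pivot rank and the larger side survives, which matches the side containing the median on the first draw) and reports the empirical tail $\Pr[\text{length} \ge k]$:

```python
import random

def first_round_tail(n=10**6, trials=20000, seed=1, kmax=5):
    """Empirical tail of the first-round length in a simplified model.

    Model (an assumption): each draw picks a uniform pivot rank in the
    current subproblem and the larger side survives; the round ends once
    at most 3n/4 elements remain.
    """
    rng = random.Random(seed)
    lengths = []
    for _ in range(trials):
        m, draws = n, 0
        while True:
            draws += 1
            r = rng.randrange(m)
            m = max(r, m - 1 - r)      # larger side survives
            if m <= 3 * n // 4:        # good split relative to the round's start
                break
        lengths.append(draws)
    # Pr[length >= k] for k = 1..kmax; at most about 2^{-(k-1)} for small k
    return [sum(x >= k for x in lengths) / trials for k in range(1, kmax + 1)]
```

The tail at $k = 2$ comes out near $1/2$, as the geometric heuristic predicts, and it decays for larger $k$, consistent with a $\Theta(n)\cdot G(1/2)$-style lower bound for the first round.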

(There is some cheating involved, since the length of the first round isn't really $G(1/2)$. More careful analysis might or might not validate this answer.)

Edit: Grübel and Rösler proved that the number of comparisons divided by $n$ tends (in some sense) to a limit distribution, which is unbounded. See for example Grübel's paper "Hoare's selection algorithm: a Markov chain approach", which references their original paper.

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange