Analyzing a randomized algorithm for finding an approximate median of an array

https://cs.stackexchange.com/questions/124267

29-09-2020
|

質問

I'm given an array $A$ = ($a_1, a_2, \cdots a_n$), where n is uneven. For an element $a_i$ we denote its position in the array with $p(a_i)$. This element would be an $ε$-approximate median of the array, if after we sort it, the following inequality holds:

$$\frac12 ((1 - ε) × n) < p(a_i) \leqslant \frac12 ((1 + ε) × n)$$ For example, the array $1,2,\cdots,9$ would have $4,5,6$ as $\dfrac13$-approximate medians.

My task is to analyze the following randomized algorithm, which finds an $ε$-approximate median of the array in constant time:

Choose $2k + 1$ elements of the array $A$: $b_1, b_2, \cdots, b_{2k+1}$, where each element is chosen uniformly randomly and independently of all the others (it is possible for an element to repeat). Using the algorithm for finding a median of an array in linear time (QuickSelect) output the median of the array the elements $b_1, b_2, \cdots, b_{2k+1}$ form.

I'm also given the following two random variables:

$K$: number of elements in $b_1, b_2, \cdots, b_{2k+1}$, which are smaller or equal to the $\dfrac12 ((1-ε)×n)$-biggest element in the original array $A$.

$G$: number of elements in $b_1, b_2, \cdots, b_{2k+1}$, which are bigger than the $\dfrac12 ((1+ε)×n)$-biggest element in the original array $A$.

What I have to do is find the best possible upper bounds for $$P(K \geqslant (1 + ε) E(K))$$ and $$P(G \geqslant (1 + ε) E(G)),$$ where $E(K)$ and $E(G)$ are the expected values for the random variables. I also have to find a bound for the probability the algorithm will be successful, which depends only on $k$, not on $n$, $E(K)$ or $E(G)$.

What I have done so far: I computed the expected values for the two random variables. I believe they are binomially distributed, so for example for $K$ I have $2k + 1$ events each with probability $\dfrac12(1-ε)$ to happen, so $$E(K) = (2k + 1) × \dfrac12 (1 - ε).$$ $E(G)$ turns out to be the same. Then I tried computing the two upper bounds, mentioned above with the Markov, Chebyshev and Chernoff inequalities:

Markov: $$P(K \geqslant (1 + ε) E(K)) \leqslant \frac1{1 + ε},$$ Chebyshev (might be false): $$P(|K - E(K)| \geqslant εE(K)) \leqslant \frac{\operatorname{Var}(K)}{ε^2 × (E(K))^2} = \frac{1}{ε(2k + 1)},$$ Chernoff: $$P(K \geqslant (1 + ε) E(K)) \leqslant \exp\left( -\frac13 ε^2 E(K) \right).$$ Are these correct? If they are, am I correct that Chebyshev is the best one? How do I continue with finding the probability of success of the algorithm?

Thank you :)

解決

As suggested in the comments, the Chernoff bound is the best one because it gives better bounds as the input grows. We can again use this same Chernoff bound to determine a lower bound for the probability of success for the algorithm. This turns out to be $1 - 2 * \exp\left( -\frac13 ε^2 E(K) \right)$ as we have to subtract from $1$ the probabilities $P(G \geqslant (1 + ε) E(G))$ and $P(K \geqslant (1 + ε) E(K))$ because those are exactly the cases, where the algorithm fails to output the desired $ε$-approximate median.

ライセンス： CC-BY-SA と帰属

所属していません cs.stackexchange