Analyzing a randomized algorithm for finding an approximate median of an array

https://cs.stackexchange.com/questions/124267

29-09-2020
|

Pergunta

I'm given an array $A$ = ($a_1, a_2, \cdots a_n$), where n is uneven. For an element $a_i$ we denote its position in the array with $p(a_i)$. This element would be an $ε$-approximate median of the array, if after we sort it, the following inequality holds:

$$\frac12 ((1 - ε) × n) < p(a_i) \leqslant \frac12 ((1 + ε) × n)$$ For example, the array $1,2,\cdots,9$ would have $4,5,6$ as $\dfrac13$-approximate medians.

My task is to analyze the following randomized algorithm, which finds an $ε$-approximate median of the array in constant time:

Choose $2k + 1$ elements of the array $A$: $b_1, b_2, \cdots, b_{2k+1}$, where each element is chosen uniformly randomly and independently of all the others (it is possible for an element to repeat). Using the algorithm for finding a median of an array in linear time (QuickSelect) output the median of the array the elements $b_1, b_2, \cdots, b_{2k+1}$ form.

I'm also given the following two random variables:

$K$: number of elements in $b_1, b_2, \cdots, b_{2k+1}$, which are smaller or equal to the $\dfrac12 ((1-ε)×n)$-biggest element in the original array $A$.

$G$: number of elements in $b_1, b_2, \cdots, b_{2k+1}$, which are bigger than the $\dfrac12 ((1+ε)×n)$-biggest element in the original array $A$.

What I have to do is find the best possible upper bounds for $$P(K \geqslant (1 + ε) E(K))$$ and $$P(G \geqslant (1 + ε) E(G)),$$ where $E(K)$ and $E(G)$ are the expected values for the random variables. I also have to find a bound for the probability the algorithm will be successful, which depends only on $k$, not on $n$, $E(K)$ or $E(G)$.

What I have done so far: I computed the expected values for the two random variables. I believe they are binomially distributed, so for example for $K$ I have $2k + 1$ events each with probability $\dfrac12(1-ε)$ to happen, so $$E(K) = (2k + 1) × \dfrac12 (1 - ε).$$ $E(G)$ turns out to be the same. Then I tried computing the two upper bounds, mentioned above with the Markov, Chebyshev and Chernoff inequalities:

Markov: $$P(K \geqslant (1 + ε) E(K)) \leqslant \frac1{1 + ε},$$ Chebyshev (might be false): $$P(|K - E(K)| \geqslant εE(K)) \leqslant \frac{\operatorname{Var}(K)}{ε^2 × (E(K))^2} = \frac{1}{ε(2k + 1)},$$ Chernoff: $$P(K \geqslant (1 + ε) E(K)) \leqslant \exp\left( -\frac13 ε^2 E(K) \right).$$ Are these correct? If they are, am I correct that Chebyshev is the best one? How do I continue with finding the probability of success of the algorithm?

Thank you :)

Solução

As suggested in the comments, the Chernoff bound is the best one because it gives better bounds as the input grows. We can again use this same Chernoff bound to determine a lower bound for the probability of success for the algorithm. This turns out to be $1 - 2 * \exp\left( -\frac13 ε^2 E(K) \right)$ as we have to subtract from $1$ the probabilities $P(G \geqslant (1 + ε) E(G))$ and $P(K \geqslant (1 + ε) E(K))$ because those are exactly the cases, where the algorithm fails to output the desired $ε$-approximate median.

Licenciado em: CC-BY-SA com atribuição

Não afiliado a cs.stackexchange