Finding the most frequent element assuming $\Theta(n)$ frequency

https://cs.stackexchange.com/questions/130135

29-09-2020
|

Question

We know [Ben-Or 1983] that deciding whether all elements in an array are distinct requires $\Theta(n \log(n))$ time; and this problem reduces to finding the most frequent element, so it takes $\Theta(n \log(n))$ time to find the most frequent element (assuming the domain of the array elements is not small).

But what happens when you know that there's an element with frequency at least $\alpha \cdot n$? Can you then decide the problem, or determine what the element is, in linear time (in $n$, not necessarily in $1/\alpha$) and deterministically?

La solution

Here is an algorithm for all $0<\alpha\leq 1$. I'm assuming your data can be ordered and that comparing two elements is done in constant time.

Run a few levels of the quick-sort recursion (choosing the pivot optimally in linear time with the Median of Medians algorithm) until you have partitioned the elements into "buckets" $B_1,\ldots, B_m$ each of size $\frac{\alpha n}{4} \leq |B_i| \leq \frac{\alpha n}{2}$, where all elements in $B_i$ are smaller or equal to all elements in $B_{i+1}$. This will take $O(n\log(1/\alpha))$ time.

Now notice that because the relative majority element $e$ is present at least $\alpha n$ times and each bucket has at most $\frac{\alpha n}{2}$ elements, the majority element needs to fill at least one of the buckets completely. Thus $e$ is also the first element in some bucket.

Notice also that there are at most $4/\alpha$ buckets as each bucket contains at least $\frac{\alpha n}{4}$ elements. Thus you can pick the first element in each bucket, and choose the element with maximum frequency among those in $O(n/\alpha)$ time.

Thus, you can find that relative majority element $e$ in $O(n\log(1/\alpha) + n/\alpha) = O(n/\alpha)$ time.

Autres conseils

Very partial answer: At least for $\alpha > 0.5$, yes.

$\text{candidate}$ <- (null value), $\text{count}$ <- 0
For each element $x$ in the array
1. If $x = \text{candidate}$ then
  1. increment $\text{count}$
2. else
  1. If $\text{count} = 0$
    1. $\text{candidate} \leftarrow x$, $\text{count} \leftarrow 1$
  2. else
    1. decrement $\text{count}$

The candidate remaining at the end of the array is the majority element. A potential-function argument can show this to be the case (I was taught this in a teaser for an online algorithms class).

This can be extended to $\alpha = 0.5$ by first finding two distinct elements of the array, then running the above on the array without one of them, then on the array without the other, then finally checking the frequency of the values you get from those two runs.

But - such a trick will probably not work for lower $alpha$ values.

Licencié sous: CC-BY-SA avec attribution

Non affilié à cs.stackexchange