Question

I try to find the solution to this problem: How do you find all integers in a sorted array of size n that appear n/k times in less than O(klogn) time?

I could only find this question, where O(klogn) solution was provided.

Was it helpful?

Solution

Let's start with the case $k = 2$, and assume that the algorithm is comparison-based.

I claim that any algorithm that finds even a single element appearing $n/2$ times must know an index $i$ such that $A[i+1] = A[i+n/2]$. Indeed, suppose that this is not the case, and let us assume for simplicity that the entries of $A$ are real numbers. Let us say that the algorithm known the following values: $$ A[i_1] = \cdots = A[j_1] < A[i_2] =\cdots = A[j_2] < \cdots $$ It is consistent with the algorithm's knowledge that $A[i_1-1] < A[i_1] < A[j_1+1]$ and so on, and with a little effort we can arrange that there is no element appearing $n/2$ times.

Consider now the $n/2+1$ arrays of the following form: there is a run of $n/2$ many $0$s starting at some position $j \in \{1,\ldots,n/2+1\}$, and the rest of the elements are unique. In each such array, there is a unique choice for the index $i$ mentioned above. Thus the decision tree representing the algorithm has at least $n/2+1$ leaves, and so it must have depth $\Omega(\log n)$.


In the general case, the algorithm should know an index $i$ such that $A[i+1] = A[i+n/k]$ for each element which is output. We can partition the array into $k/2$ parts of size $2n/k$, and consider $(n/k+1)^k$ examples as before, where the $r$th element appearing $n/k$ times is in the $r$th part. This gives a lower bound of $\Omega(k\log(n/k))$, matching your upper bound for nearly all values of $k$.

(The algorithms can probably be slighly improved to give a matching upper bound.)

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange
scroll top