Question

I am looking for efficient way to return top k elements from an input array.
One way would be to sort the array and return the k elements from end of array.

There are other methods suggested here, one of which uses the quickselect algorithm, but from my understanding, quickselect returns only the k-th element in an unsorted array. After it returns, elements to left and right of k are still unsorted.

So it should be something like this:

while k>0{
   quickselect(int[] arr, k);
   k--;
}

Quickselect is O(n) and we do that for k times, so the overall time complexity is O(n*k).
But the data in the post suggest that this is better than O(n log n).
Selecting the top 200 from a million sample would mean 200 million in the former case but 20 million in the latter. Clearly this is far better.

Is my understanding how quickselect can be employed to select the top 200 elements correct?

Was it helpful?

Solution

No, you don't need O(nk) time - it can be done in O(n) (average case).


The end result of the quickselect would be the k-th element at the k-th position from the end in the array, with smaller elements on the left, and larger elements on the right.

So all elements from that element to the right would be the k largest elements - it would be trivial to extract these in O(k) (with a simple for-loop), which would end up with a total running time of O(n).


Alternatively, since you'd know the k-th element after running quickselect, you could just go through the array again and extract all elements larger or equal to that element (this could be done in a single pass - also O(n)).


You'd need an additional O(k log k) (to sort them) if you want to return them in sorted order.

OTHER TIPS

Quickselect is good if n is not too large but if you have very large n (too large to fit in memory) or unknown n (say you are seeing an unbounded stream of samples and you want to be able to report the 200 largest seen so far at any point) then an alternative is to keep a k-element min-heap and every time you see a new element, compare it to the top of the min-heap (the smallest of the 200 largest elements so far) and if it is larger, pop the old top of the heap and push the new element onto the heap.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top