Question

I was reading the article on finding the k-th highest element in an array through the median-of-medians algorithm at ardendertat. In the portion explaining the complexity, the author seems to have discounted one factor, the cost of finding the median-of-medians for each partition recursively. Surely I can't partition all sub-arrays by the initial pivot, right? So won't this add to the complexity?

Was it helpful?

Solution

The position p of the median of medians after partitioning the array is between 0.3*n and 0.7*n.

So after one round, we have three possibilities:

  1. p == n-k, by pure luck, the first pivot is the k-th largest element, found in O(n) (median of medians and partitioning are both O(n)).
  2. p > n-k, then the k-th largest element is smaller than the first pivot, and we need to find the k - (n-p)-th largest element of the first part, which has at most 0.7*n elements, so the total cost for finding the k-th largest element is

    T(n) <= T(0.7*n) + C*n
    
  3. p < n-k, then we need to find the k-th largest element of the second part (after the pivot), that part is also at most 0.7*n elements large, so we again have the estimate

    T(n) <= T(0.7*n) + C*n
    

Iterating, we find

T(n) <= T((0.7)^k * n) + C*(1 + 0.7 + ... + (0.7)^(k-1))*n
     <= T(1) + C/(1 - 0.7)*n

which evidently is O(n).

OTHER TIPS

The median-of-medians algorithm was developed for the quick-select algorithm, which is very similar to quicksort but is actually linear instead of O(n log n) because it only recurses on one side of the partition. The select problem is to select the kth largest element in a set S; finding the median is the special case where k = (|S| + 1)/2.

The algorithm is straightforward:

  • select some pivot value, p.

  • partition the elements into two sets: S< (all the elements < p) and S (all the elements ≥ p).

  • recurse: if S< is at least k, then find the kth largest element in S<; otherwise, find the (k - |S<|)th largest element in S.

As with quicksort, the key is finding the pivot value. We're going to do it as follows:

  • construct Smedians consisting of the medians of each group of five elements of S.

  • find the exact median of Smedians by recursively calling select.

Now, |Smedians| is exactly 0.2 * |S|. Furthermore, once we have the pivot, we know that max(|S<|, |S) ≤ 0.7 * |S|. [fn 1] So the two recursive calls to select sum up to 0.9 * |S|.

So we can now demonstrate the the time to compute select is proportional to Σ0.9i n which is evidently linear in n.

I hope that was lucid enough for you.


In case it's not obvious, one-half of the medians in Smedians must be at least p (since p is their median), and for each group of five corresponding to those medians, three of the five elements (the median and the two larger elements) must be at least p. So that's 60% of 50% of the total elements in S, or 30%. A similar argument applies substituting "at most" for "at least", so we know that the smaller of the two subsets is at least 30% of the size of S, and consequently the larger one is at most 70% of the size.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top