Question

Okay, I know mergesort has a worst-case time of Theta(N log N), but its overhead is high and shows up near the bottom of the recursion tree, where the small merges are made. Someone proposed that we stop the recursion once the subproblem size reaches K and switch to insertion sort at that point. I need to prove that the running time of this modified algorithm is Theta(NK + N log(N/K)). I am blanking on how to approach this problem.


Solution

Maybe a good start is to look at the recurrence relation for this problem. I imagine for typical mergesort it would look something like this:

T(N) = 2 * T(N / 2) + N

i.e. you are dividing the problem into 2 subproblems of half the size, and then performing N work (the merge). We have a base case that takes constant time.
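For concreteness, here is a minimal Python sketch of that version (my own illustration, not anything from your course): two recursive calls on halves, plus a linear-time merge.

def merge(left, right):
    # Linear-time merge of two sorted lists: O(len(left) + len(right)).
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out

def merge_sort(a):
    # Base case: constant time.
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    # Two subproblems of half the size ...
    left = merge_sort(a[:mid])
    right = merge_sort(a[mid:])
    # ... plus N work to merge: T(N) = 2*T(N/2) + N.
    return merge(left, right)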

Modelling this as a tree we have:

T(N)   =   N   ->   T(N / 2)   
               ->   T(N / 2)

       =   N   ->   (N / 2)   ->   T(N / 4)
                              ->   T(N / 4)
               ->   (N / 2)   ->   T(N / 4)
                              ->   T(N / 4)

This gives an expansion of

T(N) = N + 2*(N/2) + 4*(N/4) + ...
     = N + N + N + ...

So really we just need to see how deep it goes. We know that the ith level operates on subproblems of size N / 2^i. So our leaf nodes (T(1)) occur at level L where N / 2^L = 1:

N / 2^L = 1
N = 2^L
log N = log(2^L)
log N = L

So our runtime is Theta(N log N).
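Written out as a sum (just the argument above in symbols): N work on each of the log N merging levels, plus Theta(N) in total for the constant-time leaves:

T(N) \;=\; \sum_{i=0}^{\log_2 N - 1} 2^i \cdot \frac{N}{2^i} \;+\; N \cdot \Theta(1) \;=\; N \log_2 N + \Theta(N) \;=\; \Theta(N \log N)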

Now we introduce insertion sort. Our tree will look something like this:

T(N) = ... -> I(K)
           -> I(K)
           ... (N/K of these in total)

In other words, at some level L we will have to solve N/K insertion sort problems, each of size K. Insertion sort has a worst-case runtime of K^2 on an input of size K. So at the leaves we have this much work in total:

(N / K) * I(K)
= (N / K) * K * K
= N * K

But we also have a bunch of merging to do, at a cost of N per level of the tree, as explained before. Going back to our previous method, let's find L (the number of levels before we reach subproblems of size K and thus switch to insertion sort):

N / 2^L = K
N / K = 2^L
L = log (N/K)

So in total we have

T(N) = N * K + N * log(N/K) = Theta(NK + N log(N/K))
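For completeness, here is a minimal Python sketch of the modified algorithm, reusing the merge from above (the names hybrid_merge_sort and insertion_sort and the parameter k are my own; K is whatever cutoff your problem fixes):

def insertion_sort(a):
    # Worst case Theta(K^2) on a block of size K.
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

def hybrid_merge_sort(a, k):
    # Stop recursing once the subproblem has size at most k and insertion sort it.
    # (max(k, 1) just guards against a k of 0.)
    if len(a) <= max(k, 1):
        return insertion_sort(a)
    mid = len(a) // 2
    # Same structure as before: 2 subproblems plus a linear-time merge,
    # but the recursion now only goes log(N/k) levels deep.
    return merge(hybrid_merge_sort(a[:mid], k), hybrid_merge_sort(a[mid:], k))

print(hybrid_merge_sort([5, 2, 9, 1, 7, 3], 3))   # [1, 2, 3, 5, 7, 9]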

It's been too long since I took algorithms for me to give you a full formal proof, but that should get your neurons firing.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow