Question

I am self learning CLRS 3rd edition and here is one of the tougher questions that I have encountered along with its answer as a service to all.

7.4-5 We can improve the running time of quicksort in practice by taking advantage of the fast running time of insertion sort when its input is “nearly” sorted. Upon calling quicksort on a subarray with fewer than k elements, let it simply return without sorting the subarray. After the top-level call to quicksort returns, run insertion sort on the entire array to finish the sorting process. Argue that this sorting algorithm runs in O(nk+nlg(n/k)) expected time. How should we pick k, both in theory and in practice?

Was it helpful?

Solution

If you eval equation nlgn > nk + nlog(n/k) you get log k > k. Which is impossible. Hence in theory it's not possible. But in practice there are constant factors involved with insertion sort and quicksort. Take a look at the solution discussed in this pdf. You might want to update your answer. .

OTHER TIPS

Actually, the answer is k = 8.

The algorithm you get is the composition of two anonymous functions, one of which is O(nk) and the other which is O(n lg n/k). Those anonymous functions hide average case constants. Insertion sort runs in n^2/4 time in the average case and randomized quicksort runs in 1.386 n lg n in the average case. We want to find a value of k which minimizes the value of nk/4 + 1.386( n lg n/k ) which equals nk/4 + 1.386 n lg n - 1.386 n lg k. This means finding the maximum of the function 1.386 lg k - k/4. That maximum value occurs at k = 8.

A leaf has an equal probability to be of size between 1 to k.
So the expected size of a leaf is k/2.
If the expected size of a leaf is k/2 then we expect n/(k/2)=(2n)/k such leaves.
For simplicity lets say that we expect n/k such leaves and that the expected size of each leaf is k.
The expected running time of INSERTION-SORT is O(n^2).
We found that in exercise 5.2-5 and problem 2-4c.
So the expected running time of INSERTION-SORT usage is O((n/k)*(k^2))=O(nk).
If we expect our partition groups to be of size k then the height of the recursion tree is expected to be lgn-lgk=lg(n/k) since we expect to stop lgk earlier.
There are O(n) operations on each level of the recursion tree.
That leads us to O(nlg(n/k)).
We conclude that the expected running time is O(nk+nlg(n/k)).

In theory, we should pick k=lgn, since this way we get the best expected running time of O(nlgn).

In practice, we should pick k to one of the values around lgn that gives us better performance than just running RANDOMIZED-QUICKSORT.

The second part of the answer uses big-O notation quite loosely, so for a more precise picking of k, please follow the link given in the second answer by Ankush.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top