quicksort and insertion sort hybrid expected running time
-
06-12-2019 - |
Question
I am self learning CLRS 3rd edition and here is one of the tougher questions that I have encountered along with its answer as a service to all.
7.4-5
We can improve the running time of quicksort in practice by taking advantage of the
fast running time of insertion sort when its input is “nearly” sorted. Upon calling
quicksort on a subarray with fewer than k
elements, let it simply return without
sorting the subarray. After the top-level call to quicksort returns, run insertion sort
on the entire array to finish the sorting process. Argue that this sorting algorithm
runs in O(nk+nlg(n/k))
expected time. How should we pick k
, both in theory
and in practice?
Solution
If you eval equation nlgn > nk + nlog(n/k)
you get log k > k
. Which is impossible. Hence in theory it's not possible. But in practice there are constant factors involved with insertion sort and quicksort. Take a look at the solution discussed in this pdf. You might want to update your answer. .
OTHER TIPS
Actually, the answer is k = 8
.
The algorithm you get is the composition of two anonymous functions, one of which is O(nk)
and the other which is O(n lg n/k)
. Those anonymous functions hide average case constants. Insertion sort runs in n^2/4
time in the average case and randomized quicksort runs in 1.386 n lg n
in the average case. We want to find a value of k
which minimizes the value of nk/4 + 1.386( n lg n/k )
which equals nk/4 + 1.386 n lg n - 1.386 n lg k
. This means finding the maximum of the function 1.386 lg k - k/4
. That maximum value occurs at k = 8
.
A leaf has an equal probability to be of size between 1
to k
.
So the expected size of a leaf is k/2
.
If the expected size of a leaf is k/2
then we expect n/(k/2)=(2n)/k
such leaves.
For simplicity lets say that we expect n/k
such leaves and that the expected size of each leaf is k
.
The expected running time of INSERTION-SORT
is O(n^2)
.
We found that in exercise 5.2-5 and problem 2-4c.
So the expected running time of INSERTION-SORT
usage is O((n/k)*(k^2))=O(nk)
.
If we expect our partition groups to be of size k
then the height of the recursion tree is expected to be lgn-lgk=lg(n/k)
since we expect to stop lgk
earlier.
There are O(n)
operations on each level of the recursion tree.
That leads us to O(nlg(n/k))
.
We conclude that the expected running time is O(nk+nlg(n/k))
.
In theory, we should pick k=lgn
, since this way we get the best expected running time of O(nlgn)
.
In practice, we should pick k
to one of the values around lgn
that gives us better performance than just running RANDOMIZED-QUICKSORT
.
The second part of the answer uses big-O notation quite loosely, so for a more precise picking of k
, please follow the link given in the second answer by Ankush.