Question

I recently learnt how much work has gone into making quicksort quicker: choosing the pivot randomly, switching to insertion sort for small subarrays, even handling equal keys with 3-way partitioning. I was curious how these variants behave on randomly generated data, so I profiled some Python code; the scripts are attached below. The problem is that the scripts all take about the same amount of time! And when I use %prun, the number of times quicksort gets called is also quite similar. So are all these improvements only useful when the data hits the worst case (i.e. is largely sorted in the wrong direction)?

def hoare_partition(a, lo, hi):
    # Partition a[lo..hi] around the first element (Hoare-style scan);
    # returns the pivot's final index, or None for degenerate ranges.
    if lo >= hi or (lo + 1) == len(a) - 1:
        return None
    pivot = a[lo]
    left = lo + 1
    right = hi

    while left <= right and right < len(a):
        while left < len(a) and a[left] < pivot:
            left += 1
        while a[right] > pivot:
            right -= 1
        if left <= right and right < len(a):
            a[left], a[right] = a[right], a[left]
            left += 1
            right -= 1
    a[lo], a[right] = a[right], a[lo]
    return right

def hoare_quicksort(a, lo, hi):
    '''Vanilla quicksort: calls the partition above, which uses the first element as the pivot.'''

    if lo < hi:
        p = hoare_partition(a, lo, hi)
        if p:
            #print 'calling for ', lo, p - 1
            hoare_quicksort(a, lo, p - 1)

            #print 'calling for ', p + 1, hi
            hoare_quicksort(a, p + 1, hi)

This was the vanilla implementation, where the first element itself is selected as the pivot. Then I changed it to select the midpoint.

So only the pivot selection changes:

mid = lo + (hi - lo)//2
a[lo], a[mid] = a[mid], a[lo]
pivot = a[lo]

And then I also do random pivot selection, like this:

pos = random.randint(lo, hi + 1)
a[lo], a[pos] = a[pos], a[lo]
pivot = a[lo]

Now, I call them using

%prun hoare_quicksort([random.randint(0, 10000) for i in xrange(1000)], 0, 999)
%prun mid_quicksort([random.randint(0, 10000) for i in xrange(1000)], 0, 999)
%prun random_quicksort([random.randint(0, 10000) for i in xrange(1000)], 0, 999)

All of them take almost the same amount of time (5.22, 5.27, 5.61 ms). And when I look at the %prun output, quicksort gets called a very similar number of times in each case. So, well, what's wrong?


Solution 2

So are all these improvements only useful when the data hits the worst case (i.e. is largely sorted in the wrong direction)?

It doesn't have to be the worst case, but any sort of preexisting order in the data will do nasty things to the runtime. Preexisting order is very common, and we want a sort that takes advantage of that to run faster, not one that looks at it and barfs.

You've tested your quicksorts on random data. That's pretty much the best-case scenario for a quicksort. What if the data comes from the keys of a dict, and the hash used causes them to come out in mostly-sorted order?

>>> data = dict.fromkeys(random.sample(xrange(10000), 9000)).keys()
>>> timeit.timeit('rand_quicksort(data[:], 0, len(data)-1)', 'from __main__ import rand_quicksort, data', number=1)
0.06688880239187256
>>> timeit.timeit('hoare_quicksort(data[:], 0, len(data)-1)', 'from __main__ import hoare_quicksort, data', number=1)
  # about 1000 lines omitted
  File "<stdin>", line 9, in hoare_quicksort
  File "<stdin>", line 9, in hoare_quicksort
  File "<stdin>", line 9, in hoare_quicksort
  File "<stdin>", line 9, in hoare_quicksort
  File "<stdin>", line 9, in hoare_quicksort
  File "<stdin>", line 9, in hoare_quicksort
  File "<stdin>", line 9, in hoare_quicksort
  File "<stdin>", line 9, in hoare_quicksort
  File "<stdin>", line 9, in hoare_quicksort
  File "<stdin>", line 4, in hoare_quicksort
RuntimeError: maximum recursion depth exceeded

Well, we get a stack overflow, and that's terrible. Even if we didn't, it'd take freaking forever.
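(As an aside, the overflow itself is avoidable even without changing the pivot rule. A minimal sketch, not from the original post: recurse only into the smaller partition and loop on the larger one, which caps the stack depth at O(log n). The quadratic running time on sorted input remains, of course; this only keeps the stack from blowing up.)

def quicksort_bounded(a, lo, hi):
    # Sketch: assumes a partition function like hoare_partition above
    # that returns the pivot's final index (or None).
    while lo < hi:
        p = hoare_partition(a, lo, hi)
        if p is None:
            return
        if p - lo < hi - p:
            quicksort_bounded(a, lo, p - 1)  # smaller half: recurse
            lo = p + 1                       # larger half: keep looping
        else:
            quicksort_bounded(a, p + 1, hi)
            hi = p - 1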

(If you want to reproduce this result, note that you have a few bugs in your code: if p should be if p is not None, and random.randint(lo, hi + 1) should be random.randint(lo, hi) or random.randrange(lo, hi + 1). I had to fix those to get proper test results.)
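For reference, the fixed lines look like this; everything else in the question's code stays the same:

# p can legitimately be 0, and `if p:` treats 0 as false, so the
# leftmost partition would silently never get sorted.
p = hoare_partition(a, lo, hi)
if p is not None:
    hoare_quicksort(a, lo, p - 1)
    hoare_quicksort(a, p + 1, hi)

# random.randint is inclusive on both ends, so randint(lo, hi + 1) can
# pick an index one past the subarray; randrange excludes its upper bound.
pos = random.randrange(lo, hi + 1)
a[lo], a[pos] = a[pos], a[lo]
pivot = a[lo]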

OTHER TIPS

Your benchmark is broken.

  1. You're timing the list comprehension too: the 1000 calls to random.randint run inside the profiled statement, so data generation is mixed in with the sort.
  2. You're running each sort only once, so one-off noise such as OS thread and process scheduling dominates the measurement.

Try precreating the source array and running each sort thousands, even millions, of times.
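A minimal sketch of that approach, assuming Python 3 and the three function names from the question: build the data once, copy it inside each run so every sort sees the same unsorted input, and let timeit sum over many repetitions.

import random
import timeit

data = [random.randint(0, 10000) for _ in range(1000)]  # built once, outside the timer

for sort in (hoare_quicksort, mid_quicksort, random_quicksort):
    # data[:] hands each run a fresh copy, so no run ever sees
    # already-sorted input; the copy cost is the same for all three.
    t = timeit.timeit(lambda: sort(data[:], 0, len(data) - 1), number=1000)
    print(sort.__name__, t)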

Randomized pivot selection does not make quicksort faster by itself; its point is to avoid the worst case. Say we sort an already-sorted array and always pick the rightmost element of each subarray as the pivot. That element is the maximum of the subarray, so quicksort splits it in the most unbalanced way possible: one empty part and one containing everything else. Randomizing the pivot prevents this. Once the worst case is avoided, each recursion level produces partitions of roughly constant balance, so the recursion tree has depth O(log n) and the whole sort runs in O(n log n).
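To make that concrete, here is a small self-contained experiment of my own (it uses a simple Lomuto-style partition rather than the question's Hoare one, purely to keep the counting code short). It measures the recursion depth each pivot rule produces on already-sorted input: a fixed pivot degenerates to linear depth, a random pivot stays logarithmic.

import random
import sys

def depth(a, lo, hi, pick_pivot, d=0):
    # Returns the deepest recursion level reached while sorting a[lo..hi].
    if lo >= hi:
        return d
    pos = pick_pivot(lo, hi)
    a[lo], a[pos] = a[pos], a[lo]
    pivot, store = a[lo], lo
    for i in range(lo + 1, hi + 1):   # Lomuto partition around a[lo]
        if a[i] < pivot:
            store += 1
            a[store], a[i] = a[i], a[store]
    a[lo], a[store] = a[store], a[lo]
    return max(depth(a, lo, store - 1, pick_pivot, d + 1),
               depth(a, store + 1, hi, pick_pivot, d + 1))

sys.setrecursionlimit(10000)
n = 2000
print(depth(list(range(n)), 0, n - 1, lambda lo, hi: lo))                      # ~n levels
print(depth(list(range(n)), 0, n - 1, lambda lo, hi: random.randint(lo, hi)))  # a few dozen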

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow