Question

I have written a program to sort linked lists and I noticed that my insertion sort works much better than my quicksort. Does anyone have any idea why this is? Insertion sort has a complexity of $\Theta(n^2)$ and quicksort $O(n\log n)$, so quicksort should be faster. I tried it on random inputs of various sizes and it shows me the contrary. Strange...

Here is the code in Java:

public static LinkedList qSort(LinkedList list) {

    LinkedList x, y;
    Node currentNode;
    int size = list.getSize();

    //Create new lists: x for elements <= pivot, y for elements > pivot
    x = new LinkedList();
    y = new LinkedList();

    if (size <= 1)
        return list;
    else {

        Node pivot = getPivot(list);
        // System.out.println("Pivot: " + pivot.value);     
        //We start from the head
        currentNode = list.head;

        for (int i = 0; i < size; i++) {
            //Check that the currentNode is not our pivot
            if (currentNode != pivot) {
                //Nodes with values smaller than or equal to the pivot go in x
                if (currentNode.value <= pivot.value) {
                    x.addNode(currentNode.value);
                    // System.out.print("Elements in x:");
                    // x.printList();
                }
                //Nodes with values greater than the pivot go in y
                else {
                    y.addNode(currentNode.value);
                    // System.out.print("Elements in y:");
                    // y.printList();
                }
            }
            //Set the pointer to the next node
            currentNode = currentNode.next;
        }

        //Recursive calls and concatenation of the Lists and pivot
        return concatenateList(qSort(x), pivot, qSort(y));

    }
}

Solution

Pedro's answer covers what is going wrong with your first attempt. But I want to talk about a few points about analyzing algorithms that can be a bit subtle.

What is your algorithm?

Before you can start analyzing an algorithm, you need to say exactly what you're doing. "Quicksort", for example, can mean a number of different things. The basic pattern is:

  • Pick a pivot element
  • Partition the input around the pivot
  • Recursively sort the pieces
  • Combine the sub-problems

Even at this level, there is some ambiguity, since I haven't actually described how to pick the pivot and how to partition. In the modern era, which probably means the last 20 years or so, the standard version is to:

  • Pick the pivot uniformly at random
  • Partition into "less than", "equal", and "greater than" pieces

In that case the number of comparisons is $O(n\log n)$ in expectation and close to the information-theoretic bound.
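
For concreteness, here is a minimal sketch of that scheme on a singly linked list of ints. The Node, randomPivot and partition names are hypothetical (they are not the classes from the question), and the pivot is chosen with a single reservoir-sampling pass:

import java.util.Random;

class Node {
    int value;
    Node next;
}

class RandomPivotPartition {
    static final Random RNG = new Random();

    // Pick a pivot uniformly at random in one pass over the list (reservoir sampling).
    // Assumes the list is non-empty.
    static Node randomPivot(Node head) {
        Node chosen = head;
        int seen = 1;
        for (Node n = head.next; n != null; n = n.next) {
            seen++;
            if (RNG.nextInt(seen) == 0) chosen = n; // keep n with probability 1/seen
        }
        return chosen;
    }

    // Partition into "less than", "equal", "greater than" lists in one O(n) pass.
    // Nodes are relinked (prepended), not copied, so each node costs O(1) extra work.
    static Node[] partition(Node head, int pivotValue) {
        Node less = null, equal = null, greater = null;
        Node n = head;
        while (n != null) {
            Node next = n.next;
            if (n.value < pivotValue)       { n.next = less;    less = n; }
            else if (n.value == pivotValue) { n.next = equal;   equal = n; }
            else                            { n.next = greater; greater = n; }
            n = next;
        }
        return new Node[] { less, equal, greater };
    }
}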

But! Notice that you and some commenters like to pick the pivot in other ways, so it is still important to say what you really mean. Picking the pivot deterministically means that there are inputs (depending on your rule) on which the algorithm does much, much worse.
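
For example (a standard worked case, not specific to your code): with the "always take the first element" rule on an already-sorted input, every partition leaves all of the remaining elements on one side, so the comparison count satisfies $T(n) = T(n-1) + (n-1)$, which gives $T(n) = n(n-1)/2 = \Theta(n^2)$.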

What are your primitives?

Even once you've decided what you really want your algorithm to do, you need to think about the "basic steps". For quicksort we need:

  • A way to pick the pivot
  • A way to compare elements
  • A way to partition
  • A way to combine

Notice that most analyses of sorting come down to counting comparisons. We'll talk about why next.

How much do your primitives cost?

Finally, in any detailed running time analysis, you need some model of computation in which you can count "operations". The most popular model for algorithms is the "RAM" which has arrays with constant-time reads/writes, the usual arithmetic operations, and integer comparisons. (But there are lots of others.)

For the usual analysis of quicksort, on an input of length $n$, you'll want:

  • Picking the pivot to be $O(n)$
  • Partitioning to be $O(n)$
  • Combining sub-arrays to be $O(n)$
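
For reference, under exactly these assumptions (and a uniformly random pivot over, say, distinct keys), the expected cost satisfies the textbook recurrence $T(n) = \Theta(n) + \frac{1}{n}\sum_{k=0}^{n-1}\bigl(T(k) + T(n-1-k)\bigr)$, which solves to $T(n) = \Theta(n\log n)$.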

You can do this with linked lists (as in your second try) or (more usually) arrays, but the actual implementations will be very different. The issue is that in your first try (according to other answers), you were spending $\Omega(n^2)$ on partitioning, which changes the complexity a lot.

Why sorting analyses usually just count comparisons

The reason I wrote all of this down is that, even in your bad implementation, you weren't doing more comparisons. Most analyses of comparison-based sorting algorithms count only comparisons, for a few reasons:

  • It is very general (machine independent)
  • For most algorithms in that class, a reasonable implementation can "charge" every other operation to a comparison in a way that each comparison gets $O(1)$ operations
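
To spell out the second point (a back-of-the-envelope accounting, not from the original answer): if an implementation performs $C(n)$ comparisons and every other operation can be charged to some comparison at $O(1)$ work per comparison, the total running time is $O(C(n))$. If instead each comparison drags along $\Theta(n)$ pointer hops, as with index-based access into a linked list, the running time becomes $\Theta(n \cdot C(n))$ even when $C(n)$ is $O(n\log n)$ in expectation.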

This second part is where your first try fell down. Hope this helps.

Even more

Everything so far was still theory. Practical sorting algorithms actually need to be tried on real machines. This is a whole area of its own, and my understanding is that the current champ is not one of the standard algorithms but something called "timsort", which is described here: http://svn.python.org/projects/python/trunk/Objects/listsort.txt

OTHER TIPS

The problem is that the analysis of Quicksort, as you have implemented it, assumes that accessing any element can be done at a constant cost, i.e. in $\mathcal O(1)$. This is not the case with linked lists.

Note that at the top of the while (right > left) loop in your original code, you cycle through all the elements up to left, and then repeatedly cycle through the elements while adjusting right. Where in Quicksort this should cost at most $\mathcal O(n)$, in your case it costs up to $\mathcal O(N^2)$, where $n$ is the length of the sublist at the current level of recursion and $N$ is the total number of elements.
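
To see where that extra factor of $N$ comes from, here is a minimal sketch of what a single index-based access costs on a singly linked list (Node mirrors the question's node, with an int value and a next pointer; the get helper is hypothetical, not from the question's code):

class Node {
    int value;
    Node next;
}

class ListAccess {
    // Reaching position i requires i pointer hops from the head,
    // so a single "access" costs Theta(i) instead of the O(1) an array gives you.
    static int get(Node head, int i) {
        Node n = head;
        for (int k = 0; k < i; k++) {
            n = n.next;
        }
        return n.value;
    }
}

An array-style partition that keeps calling something like get(left) and get(right) can therefore spend $\Theta(N)$ pointer hops on a single access, and $\Theta(N^2)$ on one partitioning pass.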

As Louis pointed out in the comments below, there is a linked list analogue of Quicksort, but it needs to be implemented quite differently from what you did.

The bottom line: Linked lists are not arrays, and algorithms that will work well on arrays will not necessarily work well on linked lists. There are several good books on algorithms out there that will point you to good algorithms for each case.

Update

The code in the question has now been modified to reflect the implementation described by Louis below. There are still potential inefficiencies, however, in the selection of the pivot. In essence, any single operation inside the main loop which is not $\mathcal O(1)$ will mess up the asymptotic behavior.

Oh, and creating new lists in every level of recursion is probably not extremely efficient either.

The simplest way to sort lists is mergesort: split the list into halves (just zip down the list, sending one element left, one right), sort left and right recursively, then merge (repeatedly take the smaller of the first elements of the left and right lists). It is well suited to lists (just linear traversals, mostly pointer juggling), and runs in $\Theta(n \log n)$.
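
For completeness, here is a minimal sketch of that mergesort; Node, sort and merge are hypothetical names (not the classes from the question), and the split simply deals nodes alternately onto two lists:

class Node {
    int value;
    Node next;
    Node(int value) { this.value = value; }
}

class MergeSortList {

    // Sort a singly linked list: split into two halves, sort each, merge.
    static Node sort(Node head) {
        if (head == null || head.next == null) return head; // 0 or 1 node: already sorted

        // Split: deal the nodes alternately onto two lists ("one element left, one right").
        Node left = null, right = null;
        boolean toLeft = true;
        Node n = head;
        while (n != null) {
            Node next = n.next;
            if (toLeft) { n.next = left;  left = n; }
            else        { n.next = right; right = n; }
            toLeft = !toLeft;
            n = next;
        }

        return merge(sort(left), sort(right));
    }

    // Merge: repeatedly take the smaller of the two front elements.
    static Node merge(Node a, Node b) {
        Node dummy = new Node(0);
        Node tail = dummy;
        while (a != null && b != null) {
            if (a.value <= b.value) { tail.next = a; a = a.next; }
            else                    { tail.next = b; b = b.next; }
            tail = tail.next;
        }
        tail.next = (a != null) ? a : b;
        return dummy.next;
    }
}

Both the split and the merge are single linear passes that only relink existing nodes, so each level of recursion costs $\Theta(n)$ and the whole sort is $\Theta(n \log n)$.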

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange