Comparison between timsort and quicksort

https://stackoverflow.com/questions/7770230

09-02-2021
|

Вопрос

Why is it that I mostly hear about Quicksort being the fastest overall sorting algorithm when Timsort (according to wikipedia) seems to perform much better? Google didn't seem to turn up any kind of comparison.

Решение

TimSort is highly optimization mergesort, it is stable and faster than old mergesort.

when comparing with quicksort, it has two advantages:

It is unbelievably fast for nearly sorted data sequence (including reverse sorted data);
The worst case is still O(N*LOG(N)).

To be honest, I don't think #1 is a advantage, but it did impress me.

Here are QuickSort's advantages

QuickSort is very very simple, even a highly tuned implementation, we can write down its pseduo codes within 20 lines;
QuickSort is fastest in most cases;
The memory consumption is LOG(N).

Currently, Java 7 SDK implements timsort and a new quicksort variant: i.e. Dual Pivot QuickSort.

If you need stable sort, try timsort, otherwise start with quicksort.

Другие советы

More or less, it has to do with the fact that Timsort is a hybrid sorting algorithm. This means that while the two underlying sorts it uses (Mergesort and Insertion sort) are both worse than Quicksort for many kinds of data, Timsort only uses them when it is advantageous to do so.

On a slightly deeper level, as Patrick87 states, quicksort is a worst-case O(n²) algorithm. Choosing a good pivot isn't hard, but guaranteeing an O(n log n) quicksort comes at the cost of generally slower sorting on average.

For more detail on Timsort, see this answer, and the linked blog post. It basically assumes that most data is already partially sorted, and constructs "runs" of sorted data that allow for efficient merges using mergesort.

Generally speaking quicksort is best algorithm for primitive array. This is due to memory locality and cache.

JDK7 uses TimSort for Object array. Object array only holds object reference. The object itself is stored in Heap. To compare object, we need to read object from heap. This is like reading from one part of the heap for one object, then randomly reading object from another part of heap. There will be a lot of cache miss. I guess for this reason memory locality is not important any more. This is may be why JDK only uses TimSort for Object array instead if primitive array.

This is only my guess.

Here are benchmark numbers from my machine (i7-6700 CPU, 3.4GHz, Ubuntu 16.04, gcc 5.4.0, parameters: SIZE=100000 and RUNS=3):

$ ./demo 
Running tests
stdlib qsort time:                 12246.33 us per iteration
##quick sort time:                  5822.00 us per iteration
merge sort time:                    8244.33 us per iteration
...    
##tim sort time:                    7695.33 us per iteration
in-place merge sort time:           6788.00 us per iteration    
sqrt sort time:                     7289.33 us per iteration    
...
grail sort dyn buffer sort time:    7856.67 us per iteration

The benchmark comes from Swenson's sort project in which he as implemented several sorting algorithms in C. Presumably, his implementations are good enough to be representative, but I haven't investigated them.

So you really can't tell. Benchmark numbers only stay relevant for at most two years and then you have to repeat them. Possibly, timsort beat qsort waaay back in 2011 when the question was asked, but the times have changed. Or qsort was always the fastest, but timsort beat it on non-random data. Or Swenson's code isn't so good and a better programmer would turn the tide in timsort's favor. Or perhaps I suck and didn't use the right CFLAGS when compiling the code. Or... You get the point.

Tim Sort is great if you need an order-preserving sort, or if you are sorting a complex array (comparing heap-based objects) rather than a primitive array. As mentioned by others, quicksort benefits significantly from the locality of data and processor caching for primitive arrays.

The fact that the worst case of quicksort is O(n^2) was raised. Fortunately, you can achieve O(n log n) time worst-case with quicksort. The quicksort worst-case occurs when the pivot point is either the smallest or largest value such as when the pivot is the first or last element of an already sorted array.

We can achieve O(n log n) worst-case quicksort by setting the pivot at the median value. Since finding the median value can be done in linear time O(n). Since O(n) + O(n log n) = O(n log n), that becomes the worst-case time complexity.

In practice, however, most implementations find that a random pivot is sufficient so do not search for the median value.

Лицензировано под: CC-BY-SA с атрибуция

Не связан с StackOverflow