Question

The worst-case running time of insertion into a red-black tree is O(lg n), and an in-order walk of the tree visits each node, so the total worst-case runtime to print the sorted collection would be O(n lg n).
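
In equation form, that reasoning is n insertions plus one walk:

    T(n) = O(lg 1) + O(lg 2) + ... + O(lg n) + O(n)
         = O(lg(n!)) + O(n)
         = O(n lg n)          [since lg(n!) = Theta(n lg n)]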

I am curious: why are red-black trees not preferred for sorting over quicksort (whose average-case running time is O(n lg n))?

I suspect it may be because red-black trees do not sort in place, but I am not sure, so maybe someone could help.

Solution

Knowing which sorting algorithm performs better really depends on your data and situation.

If you are talking in general/practical terms,

Quicksort (the variant where you select the pivot randomly or just pick a fixed one, making the worst case Omega(n^2)) might be better than red-black trees because (not necessarily in order of importance; a minimal code sketch follows the list):

  • Quicksort is in-place. That keeps your memory footprint low. Say this quicksort routine were part of a program which deals with a lot of data. If you kept using large amounts of memory, your OS could start swapping your process memory and thrash your performance.

  • Quicksort's memory accesses are localized. This plays well with caching and swapping.

  • Quicksort can be easily parallelized (probably more relevant these days).

  • If you were to try to optimize binary tree sort (using a binary tree without balancing) by using an array instead, you would end up doing something like quicksort!

  • Red-black trees have memory overheads. You have to allocate nodes, possibly many times, and your memory requirements with trees are double or triple those of arrays.

  • After sorting, say you wanted the 1045th element: you would need to maintain order statistics in your tree (an extra memory cost because of this), and you would have O(log n) access time!

  • Red-black trees have overheads just to access the next element (pointer lookups).

  • Red-black trees do not play well with the cache, and the pointer accesses could induce more swapping.

  • Rotations in red-black trees increase the constant factor in the O(n log n).

  • Perhaps the most important reason (though not valid if you have a library etc. available): quicksort is very simple to understand and implement. Even a school kid can understand it!
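
For concreteness, here is a minimal in-place quicksort sketch in C with a random pivot, the variant described above. The Lomuto-style partition and all names are my illustrative choices, not a reference implementation:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Swap two elements in place: no auxiliary buffer, low memory footprint. */
    static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

    /* Lomuto partition with a random pivot; returns the pivot's final index. */
    static int partition(int v[], int lo, int hi) {
        int r = lo + rand() % (hi - lo + 1);   /* random pivot index */
        swap(&v[r], &v[hi]);                   /* park pivot at the end */
        int pivot = v[hi], i = lo;
        for (int j = lo; j < hi; j++)          /* sequential, cache-friendly scan */
            if (v[j] <= pivot)
                swap(&v[i++], &v[j]);
        swap(&v[i], &v[hi]);                   /* pivot into its final slot */
        return i;
    }

    static void quicksort(int v[], int lo, int hi) {
        if (lo >= hi) return;                  /* 0 or 1 elements: already sorted */
        int p = partition(v, lo, hi);
        quicksort(v, lo, p - 1);               /* the two halves are independent, */
        quicksort(v, p + 1, hi);               /* hence easy to parallelize */
    }

    int main(void) {
        int v[] = {5, 2, 9, 1, 7, 3};
        srand((unsigned)time(NULL));
        quicksort(v, 0, 5);
        for (int i = 0; i < 6; i++) printf("%d ", v[i]);   /* 1 2 3 5 7 9 */
        printf("\n");
        return 0;
    }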

I would say: try measuring both implementations and see what happens!
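
A minimal timing harness for such a measurement might look like this (my sketch; it times the C library's qsort, and you would plug your tree sort in as a second contender):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Comparator for the standard library qsort. */
    static int cmp_int(const void *a, const void *b) {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    /* Wrapper so qsort matches a common (int*, size_t) sort signature. */
    static void libc_qsort(int *v, size_t n) { qsort(v, n, sizeof *v, cmp_int); }

    /* Time one sort routine over a fresh random array of n ints. */
    static double time_sort(void (*sort)(int *, size_t), size_t n) {
        int *v = malloc(n * sizeof *v);
        for (size_t i = 0; i < n; i++) v[i] = rand();
        clock_t t0 = clock();
        sort(v, n);
        double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
        free(v);
        return secs;
    }

    int main(void) {
        srand((unsigned)time(NULL));
        printf("qsort: %.3f s\n", time_sort(libc_qsort, 1000000));
        /* printf("tree:  %.3f s\n", time_sort(tree_sort, 1000000)); */
        return 0;
    }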

Also, Bob Sedgewick did his thesis on quicksort! It might be worth reading.

OTHER TIPS

There are plenty of sorting algorithms whose worst case is O(n log n) - for example, merge sort. The reason quicksort is preferred is that it is faster in practice, even though algorithmically it may not be as good as some other algorithms.

Often built-in sorts use a combination of various methods depending on the value of n.
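
A common shape for such a hybrid (my sketch, assuming a quicksort that hands small subarrays to insertion sort; the cutoff of 16 is an arbitrary illustrative value):

    #define CUTOFF 16   /* small-subarray threshold; tune by measurement */

    /* Insertion sort: very low overhead on short runs. */
    static void insertion_sort(int v[], int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            int key = v[i], j = i - 1;
            while (j >= lo && v[j] > key) { v[j + 1] = v[j]; j--; }
            v[j + 1] = key;
        }
    }

    /* Hybrid: quicksort partitioning for large ranges, insertion sort below CUTOFF. */
    static void hybrid_sort(int v[], int lo, int hi) {
        if (hi - lo + 1 <= CUTOFF) { insertion_sort(v, lo, hi); return; }
        int pivot = v[lo + (hi - lo) / 2], i = lo, j = hi;  /* Hoare-style partition */
        while (i <= j) {
            while (v[i] < pivot) i++;
            while (v[j] > pivot) j--;
            if (i <= j) { int t = v[i]; v[i] = v[j]; v[j] = t; i++; j--; }
        }
        hybrid_sort(v, lo, j);
        hybrid_sort(v, i, hi);
    }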

There are many cases where red-black trees are not bad for sorting. My testing, comparing against natural merge sort, showed that red-black trees excel where:

Trees are better for dups: in all the tests where dups need to be eliminated, the tree algorithm is better. This is not astonishing, since the tree can be kept very small from the beginning, whereas algorithms designed for in-place array sorting might pass around larger segments for a longer time.

Trees are better for random data: in all the tests with random data, the tree algorithm is better. This is also not astonishing, since in a tree the distance between elements is shorter and shifting is not necessary. So repeatedly inserting into a tree can take less effort than sorting an array.

So we get the impression that natural merge sort only excels in the ascending and descending special cases, which cannot even be said for quicksort.
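
As a minimal illustration of such a tree sort (my sketch, not the tested implementation: it uses a plain unbalanced BST for brevity where the tests used a red-black tree, and it drops duplicates during insertion, which is exactly what keeps the tree small):

    #include <stdio.h>
    #include <stdlib.h>

    struct node { int key; struct node *left, *right; };

    /* Insert key, silently dropping duplicates: the tree never grows for dups. */
    static struct node *insert(struct node *t, int key) {
        if (!t) {
            t = malloc(sizeof *t);
            t->key = key;
            t->left = t->right = NULL;
        } else if (key < t->key) t->left = insert(t->left, key);
        else if (key > t->key)   t->right = insert(t->right, key);
        /* key == t->key: duplicate, nothing to do */
        return t;
    }

    /* In-order walk linearizes the tree into out[], ascending and dup-free. */
    static void to_array(const struct node *t, int out[], int *n) {
        if (!t) return;
        to_array(t->left, out, n);
        out[(*n)++] = t->key;
        to_array(t->right, out, n);
    }

    int main(void) {
        int input[] = {4, 1, 4, 3, 1, 2}, sorted[6], n = 0;
        struct node *t = NULL;
        for (int i = 0; i < 6; i++) t = insert(t, input[i]);
        to_array(t, sorted, &n);
        for (int i = 0; i < n; i++) printf("%d ", sorted[i]);  /* 1 2 3 4 */
        printf("\n");
        return 0;
    }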

Gist with the test cases here.

P.S.: it should be noted that using trees for sorting is non-trivial. One has to provide not only an insert routine but also a routine that can linearize the tree back into an array. We are currently using a get_last and a predecessor routine, which do not need a stack. But these routines are not O(1), since they contain loops.
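
A sketch of that stackless linearization, assuming each node carries a parent pointer (the get_last/predecessor names follow the description above; the struct layout is my assumption):

    struct pnode { int key; struct pnode *left, *right, *parent; };

    /* get_last: follow the right spine down to the maximum (last in sorted order). */
    static struct pnode *get_last(struct pnode *t) {
        if (!t) return NULL;
        while (t->right) t = t->right;
        return t;
    }

    /* predecessor: the next-smaller node. Contains loops, so it is not O(1)
       per call, but it needs no explicit stack. */
    static struct pnode *predecessor(struct pnode *n) {
        if (n->left) {                       /* maximum of the left subtree */
            n = n->left;
            while (n->right) n = n->right;
            return n;
        }
        while (n->parent && n == n->parent->left)
            n = n->parent;                   /* climb until we arrive from a right child */
        return n->parent;                    /* NULL once the minimum is passed */
    }

    /* Linearize back to an array by walking from the last element backwards. */
    static void linearize(struct pnode *t, int out[], int n) {
        struct pnode *p = get_last(t);
        for (int i = n - 1; i >= 0 && p; i--, p = predecessor(p))
            out[i] = p->key;
    }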

Big-O time complexity measures do not usually take into account scalar factors, e.g., O(2n) and O(4n) are usually just reduced to O(n). Time complexity analysis is based on operational steps at an algorithmic level, not at a strict programming level, i.e., no source code or native machine instruction considerations.

Quicksort is generally faster than tree-based sorting since (1) the methods have the same average algorithmic time complexity, and (2) lookup and swap operations require fewer program instructions and data accesses when working with simple arrays than with red-black trees, even if the tree uses an underlying array-based implementation. Maintaining the red-black tree constraints requires additional operational steps and data-field storage/access (node colors), etc., compared with the simple partition-exchange steps of a quicksort over an array.

The net result is that red-black trees have higher scalar coefficients than quicksort, and these are obscured by the standard O(n log n) average time complexity result.

Some other practical considerations related to machine architectures are briefly discussed in the Quicksort article on Wikipedia.

Generally, O(n lg n) algorithms can be expanded to A*n lg n + B, where A and B are constants. There are many algorithmic proofs showing that the coefficients for quicksort are smaller than those of other algorithms. That holds in the best and average cases (naive quicksort performs horribly on already-sorted data).
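
To make A concrete with a known figure: quicksort's average comparison count is about 2n ln n, i.e.

    C_avg(n) ≈ 2 n ln n = (2 ln 2) n lg n ≈ 1.39 n lg n

so its A for comparisons is roughly 1.39, while merge sort performs close to n lg n comparisons but pays for extra data movement through its auxiliary array. Quicksort tends to win because each of its operations is a cheap, cache-local array access.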

In my opinion, the best way to explain the difference between all the sorting routines is the following. (My answer is for people who are confused about how quicksort is faster in practice than other sorting algorithms.)

"Think u are running on a very slow computer".

  1. One comparison operation takes 1 hour.
  2. One shifting operation takes 2 hours.

"I am using hour just to make people understand how important time is".

Now, of all the sorting algorithms, quicksort performs very few comparisons and very little shifting of elements.

Quicksort is faster mainly for this reason.
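
Plugging numbers into that model (my illustrative arithmetic, for n = 1,000): quicksort performs on the order of n lg n ≈ 10,000 comparisons, while a quadratic method such as insertion sort averages about n^2/4 = 250,000 comparisons plus a similar number of shifts:

    quicksort:      ~10,000 comparisons x 1 hour             ≈ 10,000 hours
    insertion sort: ~250,000 comparisons x 1 hour
                    + ~250,000 shifts x 2 hours              ≈ 750,000 hours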

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow