Question

I have been reading articles describing how the space complexity of quicksort can be reduced by using the tail-recursive version, but I am not able to understand how this is so. Following are the two versions:

QUICKSORT(A, p, r)
       q = PARTITION(A, p, r)
       QUICKSORT(A, p, q-1)
       QUICKSORT(A, q+1, r)


TAIL-RECURSIVE-QUICKSORT(A, p, r)
   while p < r
      q = PARTITION(A, p, r)
      TAIL-RECURSIVE-QUICKSORT(A, p, q-1)
      p = q+1

(Source - http://mypathtothe4.blogspot.com/2013/02/lesson-2-variations-on-quicksort-tail.html)

As far as I understand, both of these would make recursive calls on both the left and the right half of the array. In both cases, only one half would be processed at a time, and therefore at any time only one recursive call would be using the stack space. I am unable to see how the tail-recursive quicksort saves space.

The pseudocode above is taken from the article linked above. The explanation provided in the article confuses me even more:

Quicksort partitions a given sub-array and proceeds to recurse twice; one on the left-sub-array and one on the right. Each of these recursive calls will require its own individual stream of stack space. This space is used to store the indexing variables for the array at some level of recursion. If we picture this occurring from beginning to end of execution, we can see that the stack space doubles at each layer.

So how does Tail-Recursive-Quicksort fix all of this?

Well, instead of recursing on two sub-arrays, we now only recurse on one. This eliminates the need for doubling stack space at every layer of execution. We get around this problem by using the while loop as an iterative control that performs the same task. Instead of needing the stack to save sets of variables for two recursive calls, we simply alter the same set of variables and use the single recursive call on new variables.

I don't see how the stack space doubles at every layer of execution in the case of a regular quicksort.

Note: there is no mention of compiler optimization in the article.


Solution

A tail-recursive function call allows the compiler to perform a special optimization which it normally cannot with regular recursion. In a tail-recursive function, the recursive call is the very last thing to be executed. In this case, instead of allocating a stack frame for each call, the compiler can rework the code to simply reuse the current stack frame, meaning a tail-recursive function will use only a single stack frame as opposed to hundreds or even thousands.

This optimization is possible because the compiler knows that once the tail recursive call is made, no previous copies of variables will be needed, because there is no more code to execute. If, for instance, a print statement followed a recursive call, the compiler would need to know the value of the variable to be printed after the recursive call returns, and thus the stack frame cannot be reused.
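
For illustration, here is a minimal C++ sketch (the function names and the -O2 remark are mine, not part of the original answer) contrasting a call in tail position with one that is not:

    // Tail call: nothing remains to be done after the recursive call,
    // so a compiler that performs tail-call elimination (gcc and clang
    // typically do at -O2) can reuse the current stack frame.
    long sum_to(long n, long acc) {
        if (n == 0) return acc;
        return sum_to(n - 1, acc + n);    // tail position
    }

    // NOT a tail call: the multiplication happens after the call
    // returns, so every frame must stay alive on the stack.
    long factorial(long n) {
        if (n <= 1) return 1;
        return n * factorial(n - 1);      // not tail position
    }

With the optimization applied, a call like sum_to(1000000, 0) runs in constant stack space, whereas a non-tail recursion of the same depth would typically overflow the stack.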

Here's the wiki page if you'd like more information on how this "space saving" and stack reuse actually works, along with examples: Tail call (https://en.wikipedia.org/wiki/Tail_call)

Edit: I didn't explain how this applies to quicksort, did I? Well, some terms are thrown around in that article which make everything confusing (and some of it is just plain wrong). The first function given (QUICKSORT) makes a recursive call on the left, a recursive call on the right, and then exits. Notice that the recursive call on the right is the very last thing that happens in the function. If the compiler supports tail-recursive optimization (explained above), only the left calls create new stack frames; all the right calls just reuse the current frame. This can save some stack frames, but the left-side calls made inside the right-side calls still use the stack, so bad partitioning still hurts: if, for example, every partition puts nearly all the elements into the left sub-array, each left call adds a frame, and in the worst case the stack depth reaches N.
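
In effect, a compiler that applies the optimization rewrites QUICKSORT into something like the following C++ sketch (my illustration; partition is only declared here and can be any standard scheme):

    int partition(int A[], int p, int r);    // any standard scheme

    // What tail-call elimination effectively produces: the right-hand
    // recursion becomes a loop, but each left call still pushes a new
    // frame, so a run of bad pivots can still drive the depth up to N.
    void quicksort_tco(int A[], int p, int r) {
        while (p < r) {
            int q = partition(A, p, r);
            quicksort_tco(A, p, q - 1);   // left call: new stack frame
            p = q + 1;                    // right call replaced by the loop
        }
    }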

The second version described is not a tail-recursive quicksort, but rather a quicksort where only the left sorting is done recursively, and the right sorting is done by the loop. In fact, this quicksort (as previously described by another user) cannot have the tail-recursion optimization applied to it, because the recursive call is not the last thing to execute.

How does this work? When implemented correctly, the first call to quicksort is the same as a left-side call in the original algorithm, but no right-side recursive calls are ever made. The loop takes care of that: instead of sorting "left then right", it sorts the left with a call, then sorts the right by continually sorting only the lefts of the right. That sounds ridiculous, but it's basically just sorting so many lefts that the rights become single elements and don't need to be sorted. This effectively removes the right recursion, making the function less recursive (pseudo-recursive, if you will).

However, the real implementation does not choose just the left side each time; it chooses the smaller side. The idea is still the same: it makes a recursive call on only one side instead of both. Picking the shorter side ensures that the stack depth can never be larger than log2(N), the depth of a proper binary tree, because the shorter side is always at most half the size of the current array section. The implementation given by the article does not ensure this, however, because it can suffer from the same worst-case scenario of "left is the whole tree". This article gives a pretty good explanation if you're willing to do more reading: Efficient selection and partial sorting based on quicksort

OTHER TIPS

The advantage, the whole point of the "mixed recursive/iterative" version, i.e. the version that processes one sub-range by recursion and the other by iteration, is that by choosing which of the two sub-ranges to process recursively you can guarantee that the depth of recursion will never exceed log2 N, regardless of how bad the pivot choice is.

For the TAIL-RECURSIVE-QUICKSORT pseudocode provided in the question, where the recursive processing is performed first by a literal recursive call, that recursive call should be given the shorter sub-range. That by itself will make sure that the recursion depth is limited by log2 N. So, to achieve the recursion-depth guarantee, the code absolutely has to compare the lengths of the sub-ranges before deciding which one to process by recursive call.

A proper implementation of that approach might look as follows (borrowing your pseudocode as a starting point):

HALF-RECURSIVE-QUICKSORT(A, p, r)
   while p < r
      q = PARTITION(A, p, r)
      if (q - p < r - q)
        HALF-RECURSIVE-QUICKSORT(A, p, q-1)
        p = q+1
      else
        HALF-RECURSIVE-QUICKSORT(A, q+1, r)
        r = q-1
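
For concreteness, here is a minimal runnable C++ sketch of the same approach; the Lomuto partition scheme below is my choice for illustration and is not mandated by the pseudocode:

    #include <utility>   // std::swap

    // Lomuto partition: uses A[r] as the pivot and returns its final index.
    int partition(int A[], int p, int r) {
        int x = A[r], i = p - 1;
        for (int j = p; j < r; ++j)
            if (A[j] <= x) std::swap(A[++i], A[j]);
        std::swap(A[i + 1], A[r]);
        return i + 1;
    }

    // Recurse on the shorter sub-range, iterate on the longer one,
    // so the call depth never exceeds log2(N).
    void half_recursive_quicksort(int A[], int p, int r) {
        while (p < r) {
            int q = partition(A, p, r);
            if (q - p < r - q) {
                half_recursive_quicksort(A, p, q - 1);  // shorter left
                p = q + 1;                              // longer right: loop
            } else {
                half_recursive_quicksort(A, q + 1, r);  // shorter right
                r = q - 1;                              // longer left: loop
            }
        }
    }

A call like half_recursive_quicksort(A, 0, n - 1) sorts an n-element array.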

The TAIL-RECURSIVE-QUICKSORT pseudocode you provided does not make any attempt to compare the lengths of the sub-ranges. In that case it provides no benefit whatsoever. And no, it is not really "tail-recursive". QuickSort cannot possibly be reduced to a tail-recursive algorithm.

If you do a Google search on the terms "qsort loguy higuy", you will easily find numerous instances of another popular quicksort implementation (C standard library style) based on the very same idea of using recursion for only one of the two sub-ranges. On 32-bit platforms, that implementation uses an explicit stack with a maximum depth of ~32, specifically because it can guarantee that the recursion depth will never get higher than that. (Similarly, 64-bit platforms only need a stack depth of ~64.)
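
That scheme might look roughly like the following C++ sketch (my reconstruction of the general idea, not the actual library source). The loop always continues with the smaller sub-range and defers the larger one to the explicit stack:

    #include <utility>   // std::pair

    int partition(int A[], int p, int r);   // e.g. the Lomuto version above

    // Sketch of the general scheme, not the actual library code.
    // Because every deferred entry covers the larger half while the loop
    // continues with the smaller one, at most ~log2(N) entries are ever
    // pending: 32 slots suffice for 32-bit sizes, 64 for 64-bit sizes.
    void iterative_quicksort(int A[], int p, int r) {
        std::pair<int, int> pending[64];
        int top = 0;
        for (;;) {
            while (p < r) {
                int q = partition(A, p, r);
                if (q - p < r - q) {              // left is smaller
                    pending[top++] = {q + 1, r};  // defer the larger right
                    r = q - 1;                    // continue with the left
                } else {                          // right is smaller
                    pending[top++] = {p, q - 1};  // defer the larger left
                    p = q + 1;                    // continue with the right
                }
            }
            if (top == 0) break;                  // nothing deferred: done
            --top;
            p = pending[top].first;               // resume a deferred range
            r = pending[top].second;
        }
    }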

The QUICKSORT version that makes two literal recursive calls is significantly worse in that regard, since a repeated bad choice of pivot can make it reach a very high recursion depth, up to N in the worst case. With two recursive calls you cannot guarantee that the recursion depth will be limited by log2 N. A smart compiler might be able to replace the trailing call to QUICKSORT with iteration, i.e. turn your QUICKSORT into your TAIL-RECURSIVE-QUICKSORT, but it won't be smart enough to perform the sub-range length comparison.

Advantage of using tail recursion: the compiler can optimize the code and convert it to non-recursive code.

Advantage of non-recursive code over recursive code: it requires less memory to execute, because recursion keeps otherwise idle stack frames alive.

Here comes the interesting part: even though compilers can theoretically perform that optimization, in practice they often don't. Even widespread platforms like .NET and Java don't perform it.

One problem that all code optimizations face is the sacrifice in debuggability: the optimized code no longer corresponds to the source code, so stack traces and exception details can't be easily understood. High-performance code or scientific applications are one thing, but for the majority of consumer applications debugging is required even after release, hence optimizations are not done that vigorously.

Please see:

  1. https://stackoverflow.com/q/340762/1043824
  2. Why doesn't .NET/C# optimize for tail-call recursion?
  3. https://stackoverflow.com/a/3682044/1043824

There seems to be some vocabulary confusion here.

The first version is tail-recursive, because the last statement is a recursive call:

QUICKSORT(A, p, r)
  q = PARTITION(A, p, r)
  QUICKSORT(A, p, q-1)
  QUICKSORT(A, q+1, r)

If you apply the tail-recursion optimization, which is to convert the recursion into a loop, you get the second one, which is no longer tail-recursive:

TAIL-RECURSIVE-QUICKSORT(A, p, r)
  while p < r
    q = PARTITION(A, p, r)
    TAIL-RECURSIVE-QUICKSORT(A, p, q-1)
    p = q+1

The benefit of this is that usually, you will need less stack memory. Why is that? To understand, imagine you want to sort an array with 31 items. In the highly unlikely case that all partitions are perfect, i.e. they split the array right in the middle, your recursion depth would be 4. Indeed, the first split would yield two partitions of 15 items, the second one two partitions of 7 items, the third one two of 3 items, and after the fourth everything is sorted.

But partitions are rarely this perfect. As a result, not all recursions go equally deep. In our example, you might have some branches that are only 3 levels deep, and some that are 7 or more (the worst case is 30). By eliminating half of the recursions, you have a fair chance that your maximum recursion depth will be lower.

As AndreyT pointed out, often the ranges are compared to make sure that the biggest partition is always processed iteratively, and the smallest one recursively. That guarantees the smallest possible recursion depth for a given input and pivot selection strategy.

But this is not always the case. Sometimes, people want to yield results as soon as possible, or want to find and sort only the first n elements. In these cases they always want to sort the first partition before the second one. Even in this situation, eliminating the tail recursion usually improves memory usage, and never makes it worse.
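
As a hypothetical sketch of that "first n elements" idea (my illustration, loosely following the partial-sorting approach from the article linked in the accepted answer), with k standing for the number of leading elements wanted:

    int partition(int A[], int p, int r);   // e.g. a Lomuto partition

    // Illustration only. Leaves A[0..k-1] holding the k smallest elements
    // in sorted order; sub-ranges that start at or beyond index k are
    // never touched. Initial call: partial_quicksort(A, 0, n - 1, k).
    void partial_quicksort(int A[], int p, int r, int k) {
        if (p >= r) return;
        int q = partition(A, p, r);
        partial_quicksort(A, p, q - 1, k);
        if (q + 1 < k)   // right side can still contribute to the first k
            partial_quicksort(A, q + 1, r, k);
    }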

I don't know if this is the correct place to ask this, or whether I should have posted a new question, but I have a quite similar doubt.

void print(int n) {
  if (n < 0) return;
  cout << " " << n;
  print(n - 1);
  // only this second call is the last executed statement
  print(n - 1);
}

Is this tail-recursive?

Tail-call elimination is an optimization done by modern compilers. When the caller has nothing left to do in the unwinding stage after the child call finishes, that is, when the last thing executed is the recursive call itself, the compiler can replace the call with a goto and a label.

Example (our version, prints n down to 1):

#include <stdio.h>

void fun(int n) {
   if (n == 0)
      return;
   printf("%d\n", n);
   fun(n - 1);   // tail call: the last thing executed
}

After optimization:

void fun(int n) {
start:
   if (n == 0)
      return;
   printf("%d\n", n);
   n = n - 1;
   goto start;
}

Advantages of this optimization:

  1. Uses very few stack frames for tail-recursive calls.
  2. Consumes less memory.
  3. No need to save the procedure activation record, which reduces the function-call overhead.
  4. No more stack-overflow crashes from deep tail recursion (segmentation faults in C/C++, stack overflows in Java).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow