Probability that two elements are compared in randomized quicksort

https://cs.stackexchange.com/questions/123575

29-09-2020
|

Question

I am having an issue in a specific part of the randomized quick-sort analysis.

As per the randomized quick-sort algorithm the pivot is chosen from the given subset on which it is called from a random index, instead of just choosing a specific index each time.

Now suppose that we give an array of size say $n$ to our randomized quicksort algorithm.

Now I request to have a look at the proof of lemma-7.1 in the text given below. Now we have given an array to our algorithm which can be of any permutation of the elements, but in the paragraph just after the proof of $lemma-7.1$.

why is the author considering a sorted instance of our input array while doing the analysis?

Moreover if look at the text after equation $(7.2)$ where there have justified their logic of finding the probability that $z_i$ shall be compared with $z_j$ in our algorithm. Now in that they are considering the subset {$z_i$,...,$z_j$}. Isn't this case of comparison of $z_i$,$z_j$ getting too specific if we consider that specific subset only? I mean to say we are using randomized approach and the probability of comparison might be derived using a more broader look, such as a permutation of all possible cases or so.

That we are using a specific subset and that too sorted is not convincing as to how are we getting the correct probability for our algorithm...

     {z1,z2,...,zn} zi being the ith minimum element
            ^
            |
            ----------------------------------------------------
                                                                |                           
    --P(Zi is compared with Zj)                                 |
   |                                                            |
   |                                                            |
   |-----> We are considering                                   |
   |        Zij = {Zi,Zi+1,...,Zj} which is a subset of --------
   |
   |------ Aren't we considering a very specific case??

And the probability of $1/(j-i+1)$ -> total no. of elements in the subset is also fixed for specific $i$ and $j$

In considering the probability of comparison of $z_i$,$z_j$, the subset in which the two elements are there and which is to be partitioned can be anything(i.e composed of any possible element) and of any size (not just $j-i+1$)...

May be the randomization condition is actually taking everything in account but I am not getting it. Please can you explain me the logic they are using to find the said probability and also please convince me that we are correctly finding the probability of comparison.

For reference i am attaching the corresponding pages of INTRODUCTION TO ALGORITHMS 3RD ED-- CLRS

Solution

A very simple proof: I claim that if there are d integers with values between x and y, and there are n ≥ 2 elements in the array, then the probability that x and y are compared is 2 / (d + 2), independent of n.

Proof by induction: If n = 2 then clearly d = 0, so the claim is that x and y are compared with probability 2 / (0 + 2) = 1. This is also clearly correct, since x and y must be compared.

Now let n ≥ 3. For the first partitioning, we choose a pivot at random. Every array element is compared against the pivot, and no other comparisons are made. So if by coincidence we choose x or y as the pivot, x and y will be compared. The probability for that is 2 / n. If by coincidence we choose one of the d elements with values between x and y, then the partitioning will move x to one partition and y to the other, so they are never compared. If we choose one of the other n - d - 2 elements, then x and y end up in the same partition, and by induction they will be compared with probability 2 / (d + 2).

So the probability that x and y are compared is

2 / n + (n - d - 2) / n * 2 / (d + 2) = 

2 * (d + 2) / (n * (d + 2)) + 2 * (n - d - 2) / (n * (d + 2)) =

(d + 2 + n - d - 2) * 2 / (n * (d + 2)) =

2 * n / (n * (d + 2)) = 

2 / (d + 2) qed.

That's of course the same result as Yuval's, since |j - i| = d + 1. The randomising makes the analysis quite easy - if we said for example "if n > 5 then we pick 5 elements at random and pick the median of those 5 as the pivot", the analysis would be much more complex.

PS. The proof in the paper is much easier: As you partition the array, $x_i$ and $x_j$ remain in the same sub-partition until a pivot with i <= pivot <= j is used. If that pivot is i or j then $x_i$ and $x_j$ are compared, otherwise they are not compared. So the chance is 2 / (abs (j-i) + 1).

OTHER TIPS

The idea of the proof is to compute, for any two elements $x,y$ in the array, the probability that they are compared in the algorithm. This probability could potentially depend on the entire array. However, it turns out that you can compute it only given the order statistics of $x,y$, that is, their relative order in the sorted array. If you know that $x$ is the $i$th smallest element in the array and that $y$ is the $j$th smallest element in the array, then the probability that $x,y$ are compared is $\frac{2}{|j-i|+1}$.

This is not a special case – every element $x$ in the array is the $i$th smallest element, for some value of $i$. This is just the pertinent information that allows us to calculate the probability that $x$ and $y$ are compared.

Licensed under: CC-BY-SA with attribution

Not affiliated with cs.stackexchange