Question

I'm studying the book *Data Structures & Algorithms in Python*.

On pages 557-558 there is a proof of the expected running time of randomized quick-sort. A few points in it have been confusing me for several days; I have marked them below with ⓵ and ⓶.

Here's the original text:

Proposition 12.3: The expected running time of randomized quick-sort on a sequence $S$ of size $n$ is $O(n \log n)$.

Justification: We assume two elements of $S$ can be compared in $O(1)$ time. Consider a single recursive call of randomized quick-sort, and let $n$ denote the size of the input for this call. Say that this call is “good” if the pivot chosen is such that subsequences $L$ and $G$ have size at least $\frac{n}{4}$ and at most $\frac{3n}{4}$ each; otherwise, a call is “bad.”

Now, consider the implications of our choosing a pivot uniformly at random. Note that there are $\frac{n}{2}$ possible good choices for the pivot for any given call of size $n$ of the randomized quick-sort algorithm. Thus, the probability that any call is good is $\frac{1}{2}$. Note further that a good call will at least partition a list of size $n$ into two lists of size $\frac{3n}{4}$ and $\frac{n}{4}$, and a bad call could be as bad as producing a single call of size $n-1$.
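To convince myself that the probability really is $\frac{1}{2}$, I wrote a small Python check (the helper `is_good` and the choice `n = 1000` are mine, not the book's). It enumerates the $n$ possible pivot ranks for a list of distinct elements and counts the good ones:

```python
def is_good(rank, n):
    # A pivot of rank `rank` (0-indexed among n distinct items) sends `rank`
    # items to L and n - rank - 1 items to G.  The call is "good" when both
    # parts have size at least n/4 (and hence at most 3n/4, since the two
    # sizes sum to n - 1).
    left, right = rank, n - rank - 1
    return left >= n / 4 and right >= n / 4

n = 1000
good_choices = sum(is_good(r, n) for r in range(n))
print(good_choices)      # 500 -- exactly n/2 of the n possible pivot ranks
print(good_choices / n)  # 0.5 -- a uniform random pivot is good with probability 1/2
```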

Now consider a recursion trace for randomized quick-sort. This trace defines a binary tree, $T$, such that each node in $T$ corresponds to a different recursive call on a subproblem of sorting a portion of the original list.

Say that a node $v$ in $T$ is in size group $i$ if the size of $v$’s subproblem is greater than $\left(\frac{3}{4}\right)^{i+1}n$ and at most $\left(\frac{3}{4}\right)^{i}n$. Let us analyze the expected time spent working on all the subproblems for nodes in size group $i$. By the linearity of expectation (Proposition A.19), the expected time for working on all these subproblems is the sum of the expected times for each one. Some of these nodes correspond to good calls and some correspond to bad calls. But note that, since a good call occurs with probability $\frac{1}{2}$, the expected number of consecutive calls we have to make before getting a good call is $2$.
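Again just to check my reading of the definitions, here is how I translate the size-group rule and the "expected number of consecutive calls is 2" claim into Python (the function names are mine, not the book's):

```python
import random

def size_group(size, n):
    # Size group i contains the subproblems of size s with
    # (3/4)**(i+1) * n < s <= (3/4)**i * n.
    i = 0
    while (3 / 4) ** (i + 1) * n >= size:
        i += 1
    return i

n = 100
print([(s, size_group(s, n)) for s in (100, 76, 75, 57, 56)])
# [(100, 0), (76, 0), (75, 1), (57, 1), (56, 2)]

def calls_until_good():
    # A call is good independently with probability 1/2, so the number of
    # calls up to and including the first good one is geometric with mean 2.
    count = 1
    while random.random() >= 0.5:
        count += 1
    return count

trials = 100_000
print(sum(calls_until_good() for _ in range(trials)) / trials)  # close to 2.0
```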


⓵ Moreover, notice that as soon as we have a good call for a node in size group $i$, its children will be in size groups higher than $i$.
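As far as I can tell, the reasoning behind ⓵ is this (my own spelling-out, not the book's): if a node $v$ in size group $i$ has subproblem size $s_v \le \left(\frac{3}{4}\right)^i n$ and its call is good, then each child receives a subproblem of size at most

$$\frac{3}{4}\,s_v \;\le\; \left(\frac{3}{4}\right)^{i+1} n,$$

so both children land in size groups with index strictly greater than $i$.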

⓶ Thus, for any element $x$ from the input list, the expected number of nodes in size group $i$ containing $x$ in their subproblems is $2$. In other words, the expected total size of all the subproblems in size group $i$ is $2n$.
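If I try to write the second half of ⓶ in symbols (my notation: $S_v$ is the subproblem handled at node $v$), the step seems to be an application of linearity of expectation:

$$\mathbb{E}\!\left[\sum_{v \in \text{group } i} |S_v|\right] = \mathbb{E}\!\left[\sum_{x \in S} \#\{v \in \text{group } i : x \in S_v\}\right] = \sum_{x \in S} \mathbb{E}\big[\#\{v \in \text{group } i : x \in S_v\}\big] = \sum_{x \in S} 2 = 2n.$$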


Since the nonrecursive work we perform for any subproblem is proportional to its size, this implies that the total expected time spent processing subproblems for nodes in size group $i$ is $O(n)$.

The number of size groups is $\log_{\frac{4}{3}}n$, since repeatedly multiplying by $\frac{3}{4}$ is the same as repeatedly dividing by $\frac{4}{3}$. That is, the number of size groups is $O(\log n)$. Therefore, the total expected running time of randomized quick-sort is $O(n\log n)$. (See Figure 12.13.)
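So, spelling out the final step the way I read it:

$$\mathbb{E}[\text{total time}] \;=\; \sum_{i=0}^{O(\log n)} \mathbb{E}[\text{time for size group } i] \;=\; \sum_{i=0}^{O(\log n)} O(n) \;=\; O(n\log n).$$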

**Figure 12.13:** A visual time analysis of the quick-sort tree $T$. Each node is shown labeled with the size of its subproblem.

Here are my problems:

  1. According to this sentence:

    ⓵ Moreover, notice that as soon as we have a good call for a node in size group $i$, its children will be in size groups higher than $i$.

Which size group does the node labeled $s(a)$ in Figure 12.13 belong to? And why?

  2. And regarding this sentence:

    ⓶ Thus, for any element $x$ from the input list, the expected number of nodes in size group $i$ containing $x$ in their subproblems is $2$. In other words, the expected total size of all the subproblems in size group $i$ is $2n$.

I can't really understand the meaning of this sentence.

