> For the complete documentation index, see [llms.txt](https://evarga.gitbook.io/sh-intro-to-algs/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://evarga.gitbook.io/sh-intro-to-algs/part-ii-sorting-and-order-statistics/7.-quicksort.md). # 7. Quicksort ## Exercises ### 7.1-1 Coloring is similar as in the book, whilst indexing is the same. 13 19 9 5 12 8 7 4 21 2 6 11\ 13 19 9 5 12 8 7 4 21 2 6 11\ 13 19 9 5 12 8 7 4 21 2 6 11\ 9 19 13 5 12 8 7 4 21 2 6 11\ 9 5 13 19 12 8 7 4 21 2 6 11\ 9 5 13 19 12 8 7 4 21 2 6 11\ 9 5 8 19 12 13 7 4 21 2 6 11\ 9 5 8 7 12 13 19 4 21 2 6 11\ 9 5 8 7 4 13 19 12 21 2 6 11\ 9 5 8 7 4 13 19 12 21 2 6 11\ 9 5 8 7 4 2 19 12 21 13 6 11\ 9 5 8 7 4 2 6 12 21 13 19 11\ 9 5 8 7 4 2 6 11 21 13 19 12 ### 7.1-2 **Available in the latest revision of the IM.** ### 7.1-3 **Available in the latest revision of the IM.** ### 7.1-4 **Available in the latest revision of the IM.** ### 7.2-1 **Available in the latest revision of the IM.** ### 7.2-2 **Available in the latest revision of the IM.** {% hint style="info" %} If we use a partitioning scheme introduced in [Exercise 7.1-2](#id-7.1-2), then the running time is $$\Theta(n \lg n)$$. {% endhint %} ### 7.2-3 **Available in the latest revision of the IM.** ### 7.2-4 **Available in the latest revision of the IM.** {% hint style="warning" %} Nearly sorted input will push QuickSort toward the worst-case scenario, rather than the best-case, as hinted in the IM. {% endhint %} ### 7.2-5 **Available in the latest revision of the IM.** ### 7.2-6 **Available in the latest revision of the IM.** ### 7.3-1 **Available in the latest revision of the IM.** ### 7.3-2 **Available in the latest revision of the IM.** ### 7.4-1 We guess that $$T(n) \ge cn^2$$ for some constant $$c >0$$. Substituting this guess into recurrence (7.1) yields $$ \begin{align\*} T(n)&\ge\max{cq^2+c(n-1-q)^2:0\le q\le n-1}+\Theta(n) \\ &=c\max{q^2+(n-1-q)^2:0\le q\le n-1}+\Theta(n) \\ &= c(n-1)^2+\Theta(n) && \text{(when $q=0 \lor q=n-1$)} \\ &=cn^2-c(2n-1)+\Theta(n) \\ &\ge cn^2 \end{align\*} $$ by picking the constant $$c$$ small enough that the $$\Theta(n)$$ term dominates. Therefore, $$T(n)=\Omega(n^2)$$. ### 7.4-2 **Available in the latest revision of the IM.** ### 7.4-3 $$ q^2+(n-1-q)^2=(n-1)^2+2q(q-(n-1)). $$ $$0 \le q \le n-1$$ implies that $$2q(q-(n-1)) \le 0$$. When this expression is zero, we have a maximum. It is a quadratic polynomial having at most two distinct roots, so $$q= 0 \lor q=n-1$$. ### 7.4-4 $$ \begin{align\*} \mathbb{E}{\[X]} &= \sum\_{i=1}^{n-1} \sum\_{j=i+1}^n \frac{2}{j-i+1} \\ &= \sum\_{i=1}^{n-1} \sum\_{k=1}^{n-i} \frac{2}{k+1} \\ &= 2\sum\_{i=1}^{n-1} \left(\sum\_{k=1}^{n-i+1} \frac{1}{k}-1\right) \\ &\ge 2\sum\_{i=1}^{n-1} \left(\ln {(n-i+2)} -1\right) && \text{(by equation (A.20))} \\ &= 2\left(\sum\_{i=3}^{n+1} \ln i - (n-1)\right) \\ &= 2 \left(\ln \prod\_{i=3}^{n+1} i-(n-1)\right) \\ &\ge 2\ln \prod\_{i=1}^n i-2n && \text{(for sufficiently large $n$)} \\ &= 2\ln n! -2n \\ &= \Theta(n\lg n) && \text{(by equation (3.28))}. \end{align\*} $$ This bound and Lemma 7.1 allow us to conclude that the expected running time of `Randomized-Quicksort` is $$\Omega(n\lg n)$$. ### 7.4-5 Insertion sort requires $$O(k^2)$$ time to sort an array of size $$k$$. The total time spent by insertion sort is $$O(n/k) \cdot O(k^2)=O(nk)$$, since we can have at most $$n/k$$ subarrays of size $$k$$. The expected number of levels in the recursion tree is $$O(\lg (n/k))$$. Therefore, the total time spent in partitioning subarrays is $$O(n\lg(n/k))$$. After summing up these terms, we get that the randomized version of this sorting algorithm runs in $$O(nk+n\lg (n/k))$$ expected time. The modified algorithm has the same asymptotic running time as standard randomized quick-sort when $$O(nk+n\lg (n/k))=O(n\lg n)$$. The largest asymptotic value of $$k$$, as a function of $$n$$, that satisfies this condition is $$k=\Theta(\lg n)$$. Notice that $$k$$ cannot be asymptotically larger than $$\Theta(\lg n)$$, otherwise $$nk=\omega(n\lg n)$$ and we would make things worse. On the other hand, plugging in $$k=\lg n$$ into our formula for the expected running time gives $$ O(n\lg n+n\lg (n/\lg n))=O(2n\lg n-n\lg \lg n)=O(n\lg n). $$ In practice, you would need to experimentally find the best $$k$$, as performance depends on many other machine related factors. {% hint style="info" %} There is a fun practical trick, where you don't sort the small subarrays during the recursion at all. Instead, just let the quicksort recursion return doing absolutely nothing when size $$< k$$. You end up with one giant, nearly-sorted array where no element is more than $$k$$ positions away from its final sorted spot (see also [Exercise 7.2-4](#id-7.2-4)). Then, you run one single, massive insertion sort over the entire array at the very end. Because insertion sort runs in $$O(n + \text{inversions})$$ time, and the array is almost perfectly sorted, this single pass runs in exactly $$O(nk)$$ time, but it completely eliminates the overhead of calling the insertion sort function $$n/k$$ separate times! {% endhint %} ### ★ 7.4-6 If the median falls into the lowest or largest $$\alpha n$$ numbers, then we get a worse than $$\alpha$$-to-$$(1-\alpha)$$ split. This happens if at least two randomly selected numbers belong in these ranges. We have the following 4 disjoint events: 1. All three numbers are among the lowest $$\alpha n$$ values. 2. All three numbers are among the largest $$\alpha n$$ values. 3. Exactly two numbers are among the lowest $$\alpha n$$ values. 4. Exactly two numbers are among the largest $$\alpha n$$ values. Therefore, the total probability is $$ p=\alpha^3+\alpha^3+\binom{3}{2}\alpha^2(1-\alpha)+\binom{3}{2}\alpha^2(1-\alpha)=2\alpha^2(3-2\alpha). $$ ## Problems ### 7-1 Hoare partition correctness #### a. Initialization:\ A=(13, 19, 9, 5, 12, 8, 7, 4, 11, 2, 6, 21)\ p=1\ r=12\ x=13\ i=0\ j=13 After Iteration 1:\ A=(6, 19, 9, 5, 12, 8, 7, 4, 11, 2, 13, 21)\ i=1\ j=11 After Iteration 2:\ A=(6, 2, 9, 5, 12, 8, 7, 4, 11, 19, 13, 21)\ i=2\ j=10 After Iteration 3:\ A=(6, 2, 9, 5, 12, 8, 7, 4, 11, 19, 13, 21)\ i=10\ j=9\ RETURN 9 #### b. `Hoare-Partition` produces a balanced split instead of hitting the worst-case scenario, as is the case with `Partition`. It better distributes repeated elements equal to pivot, so it works well even in presence of duplicates. {% hint style="danger" %} Nonetheless, `Hoare-Partition` is also terribly flawed by moving the pivot during the partitioning stage. This phenomenon is described [here](https://evarga.gitbook.io/sh-aofa/introduction/chapter-1-analysis-of-algorithms#exercise-1.13). {% endhint %} #### c. We focus on the first iteration and show what happens to indices $$i$$ and $$j$$ as well as what transformation of $$A$$ ensures that none of these indices ever fall out of range. Initially, before entering the `while` loop, we have $$x=A\[p]$$, $$i=p-1$$ and $$j=r+1$$. The body of the loop runs as follows: 1. Index $$j$$ is decremented before reading $$A$$, so $$p \le j \le r$$ on first access. 2. It will be decremented until stumbling across an element not greater than $$x$$. Notice that $$j$$ cannot pass $$p$$, since $$x=A\[p]$$, hence we know that $$p\le j$$ after line 7. 3. Index $$i$$ is incremented before reading $$A$$, so $$p \le i\le r$$ on first access. 4. It will be incremented until stumbling across an element not smaller than $$x$$. Notice that $$i$$ cannot pass $$p$$, since $$x=A\[p]$$, hence we know that $$i=p$$ after line 10. At this point, we have two possibilities at line 11: 1. If $$i=j$$, then the function returns $$p$$. This occurs when all other elements are greater than $$x$$. 2. Otherwise, we must have $$i\ j$$ and the function would return $$j$$. #### d. Again, we focus our attention on the first iteration. From part (c) we know that $$p \le j$$ at termination. Before accessing $$A$$ index $$j$$ is decremented, so $$j \le r$$. Now, based on the fact that $$p\i$$. We know that every element of $$A\[i:i'-1]$$ is less than or equal to $$x$$. So, $$A\[p:i] \cup A\[i:i'-1] =A\[p:i'-1]$$ has only elements less than or equal to $$x$$. Let $$i=i'$$ and $$j=j'$$. After line 10, but before entering line 11, we have that the subarray $$A\[p:i-1]$$ contains elements less than or equal to the pivot, and the subarray $$A\[j+1:r]$$ contains elements greater than or equal to the pivot. Furthermore, $$A\[i] \ge x$$ and $$A\[j] \le x$$. If $$i\j \implies i-1 \ge j$$. Because $$i-1 \ge j$$, the segment $$A\[p : j]$$ is a subset of $$A\[p : i-1]$$. Since we already know that all elements in $$A\[p : i-1]$$ are $$\le x$$, it is guaranteed that all elements in $$A\[p : j]$$ are $$\le x$$. We also already established that $$A\[j+1 : r] \ge x$$. In both cases, It is true the every element from $$A\[p:j]$$ is less than or equal to every element from $$A\[j+1:r]$$. #### f. ``` Quicksort(A, p, r) if p < r q = Hoare-Partition(A, p, r) Quicksort(A, p, q) Quicksort(A, q + 1, r) ``` ### 7-2 Quicksort with equal element values **Available in the latest revision of the IM.** {% hint style="warning" %} There is an omission in the IM in part (c). `Randomized-Partition'` also differs at line 2, where $$A\[i]$$ must be exchanged with $$A\[p]$$ instead of $$A\[r]$$. {% endhint %} {% hint style="warning" %} In part (d) the IM does not cover the best-case scenario. Namely, when all element values are equal, then `Quicksort'` runs in linear time. {% endhint %} ### 7-3 Alternative quicksort analysis **Available in the latest revision of the IM.** ### 7-4 Stooge sort **Available in the latest revision of the IM.** ### 7-5 Stack depth for quicksort **Available in the latest revision of the IM.** ### 7-6 Median-of-3 partition #### a. The three randomly selected elements must include $$z\_i$$ itself, one smaller than $$z\_i$$, and one larger than $$z\_i$$. These elements can be chosen in any order, so we should count the number of 3-sets satisfying these constraints among $$\binom{n}{3}$$ such sets. We have $$p\_i=\cfrac{6(i-1)(n-i)}{n(n-1)(n-2)}$$. #### b. We need to find $$\cfrac{p\_{\lfloor(n+1)/2 \rfloor}}{1/n}$$ at the limit, when $$n \to \infin$$. Observe that $$i=\lfloor(n+1)/2 \rfloor$$ tends to $$n/2$$ as $$n$$ increases. Therefore, $$ \lim\_{n\to. \infin} \cfrac{\cfrac{6(n/2-1)(n-n/2)}{n(n-1)(n-2)}}{1/n}=\lim\_{n\to. \infin} \cfrac{\cfrac{3}{2}(n^2-2n)}{n^2-3n+2}=\cfrac{3}{2}. $$ #### c. We need to find $$\lim\_{n \to \infin}\cfrac{\sum\_{i=n/3}^{2n/3} p\_i}{1/3}$$. Following the hint from the book, we approximate the sum by an integral, thus $$ \begin{align\*} \sum\_{i=n/3}^{2n/3} p\_i &\approx \int\_{n/3}^{2n/3} \frac{6(-x^2+nx+x-n)}{n(n-1)(n-2)}dx \\ &= \frac{6(-7n^3/81+3n^3/18+\Theta(n^2))}{n(n-1)(n-2)} \\ &\to \frac{13}{27} && \text{(when $n \to \infin$)}. \end{align\*} $$ Thus, the median-of-3 method increases the likelihood of getting a good split (compared with the ordinary implementation) by 13/9. #### d. According to [Exercise 7.4-2](#id-7.4-2), quicksort's best case running time is $$\Omega(n \lg n)$$. Thus, any technique that improves the chances of having good splits cannot influence its asymptotic lower bound. ### 7-7 Fuzzy sorting of intervals #### a. The `Fuzzy-Partition` function is similar to `Partition'` from [Problem 7-2](#id-7-2-quicksort-with-equal-element-values), part (b). The rest of the code of randomized quicksort uses this partitioning function instead of the ordinary version; also, the pivot is the first element in the subarray. ``` Fuzzy-Partition(A, p, r) x = A[p] // Pivot is chosen to be at the beginning of A[p:r]. i = p h = p for j = p + 1 to r if A[j].b < x.a y = A[j] A[j] = A[h + 1] A[h + 1] = A[i] A[i] = y i = i + 1 h = h + 1 elseif A[j].a <= x.b // Mutually overlapping intervals including the pivot are treated as equal. // There could be many such joint sets of intervals, we select one of them. x.a = Max(A[j].a, x.a) x.b = Min(A[j].b, x.b) exchange A[h + 1] with A[j] h = h + 1 return i, h // A[i:h] elements fuzzy-equal to x. ``` #### b. Observe that as the intervals overlap more and more, the "equal" sector will grow. These intervals will be omitted from subsequent recursive calls. When all intervals overlap, then after calling `Fuzzy-Partition` once from the top level, the whole array becomes sorted. Therefore, the expected running time reduces to $$\Theta(n)$$. Otherwise, our algorithm runs in usual $$\Theta(n \lg n)$$ expected time. --- # Agent Instructions This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com. ## Querying This Documentation If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question. Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter: ``` GET https://evarga.gitbook.io/sh-intro-to-algs/part-ii-sorting-and-order-statistics/7.-quicksort.md?ask=&goal= ``` `ask` is the immediate question: it should be specific, self-contained, and written in natural language. `goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal. The response will contain a direct answer to the question and relevant excerpts and sources from the documentation. Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.