Chapter 1
Exercise 1.1
We start by proving that is asymptotically equivalent to . For sufficiently large , the term dominates the constant hidden in . Therefore, implying . Notice that we can ignore the base of a logarithm in the notation.
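The last remark boils down to the change-of-base identity

\log_a N = \frac{\log_2 N}{\log_2 a},

so switching the base only rescales the logarithm by a constant factor.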
Exercise 1.2
The book already mentions that there are various strategies to avoid sentinels. For example, on the website of Algorithms, 4th Edition by RS and KW, you can find an interesting approach (see the section Creative Problems) that uses a single auxiliary array aux[]. As stated there: "The merge procedure copies the second half of a[] to aux[] in decreasing order and then does the merge back to a[]." This improves performance by eschewing boundary checks on indices. Nonetheless, such a mergesort is no longer stable, and stability is a key property of this sorting algorithm.
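Here is a minimal Python sketch of the quoted idea (the booksite's code is in Java; this is only an illustration, assuming plain lists of comparable keys):

def merge(a, aux, lo, mid, hi):
    """Merge a[lo..mid] with a[mid+1..hi] without sentinels.

    The first half is copied in ascending order and the second half in
    descending order, so the largest key of each half guards the other
    half's scan and no index bounds checks are needed. Equal keys may be
    emitted out of their original order once a scan runs into the other
    (reversed) half, which is why this merge is not stable.
    """
    for k in range(lo, mid + 1):
        aux[k] = a[k]
    for k in range(mid + 1, hi + 1):
        aux[k] = a[hi - (k - mid - 1)]   # second half copied in reverse
    i, j = lo, hi
    for k in range(lo, hi + 1):
        if aux[j] < aux[i]:
            a[k] = aux[j]
            j -= 1
        else:
            a[k] = aux[i]
            i += 1

A top-down mergesort can call this merge exactly as it would call the sentinel-based one.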
On the same website, you can find Merge.java, a standard implementation of the top-down mergesort algorithm that doesn’t rely on sentinels. As a bonus, it also offers index-based sorting, which avoids moving elements around. Moving keys could be expensive if they’re associated with satellite data.
Exercise 1.3
GeeksforGeeks covers 3-way mergesort in great detail.
Below is the driver code for measuring execution time. It should replace the original driver code in the GeeksforGeeks programs that implement the standard and 3-way mergesort algorithms.
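As a sketch of such a harness in Python (the sort_fn callable, the input size, and the key range are placeholders; adapt it to the implementation under test, e.g. sort_fn = lambda a: mergeSort(a, 0, len(a) - 1)):

import random
import time

def time_sort(sort_fn, n=5_000_000, trials=10, seed=42):
    """Time `sort_fn` on fresh random inputs and report trial statistics."""
    random.seed(seed)
    times = []
    for _ in range(trials):
        arr = [random.randint(0, 10**9) for _ in range(n)]
        start = time.perf_counter()
        sort_fn(arr)
        times.append(time.perf_counter() - start)
    print(f"Ran {trials} successful trials. "
          f"Average: {sum(times) / trials:.3f}s, "
          f"Min: {min(times):.3f}s, Max: {max(times):.3f}s")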
Here are the outputs created on a MacBook Pro 2019 machine:
--- standard mergesort ---
Ran 10 successful trials. Average: 2.260s, Min: 2.152s, Max: 2.426s
--- 3-way mergesort ---
Ran 10 successful trials. Average: 2.149s, Min: 2.031s, Max: 2.456s
3-way mergesort has a lower recursion depth, which may improve performance. It’s a better choice in external sorting scenarios, since it makes fewer passes over datasets typically stored on slow external media.
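To make the depth claim concrete, a change of base gives

\log_3 N = \frac{\lg N}{\lg 3} \approx 0.63 \lg N,

so the recursion tree of 3-way mergesort has roughly 37% fewer levels than that of the standard 2-way version.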
🌟 Exercise 1.4
This exercise introduces a neat trick for indirectly solving recurrences. Often, we can develop a recurrence describing the difference between two consecutive terms, and employ it to solve or prove some property of the original formula.
Using the recurrence (1) from the book, we have
Since and we can simplify the previous equation and get
The quantity is a positive integer, so we can recursively apply the above expansion by leveraging a useful property of floor and ceiling functions, namely that . We can carry out this process until the problem size drops to 1, recalling that . Thus, we get
Let’s express as a sum of differences, so we have
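In general, whatever the concrete recurrence, the telescoping identity behind this trick is

a_N = a_1 + \sum_{k=1}^{N-1} \left( a_{k+1} - a_k \right),

so the differences together with one initial value determine every term.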
🌟 Exercise 1.5
This exercise showcases an advanced usage of mathematical induction and techniques to manipulate floors and ceilings.
As mentioned in the book, the theorem for a general follows from (1) by induction. Let’s use the equation as the inductive hypothesis. The base case trivially holds, since . For we have
If is even, then , so the last line simplifies to
If is odd, then and , so we get
We must figure out how to express and . We know that for some positive integer . Observe that . If , then ; otherwise, . Therefore, we must differentiate these two sub-cases.
When we immediately get
If observe that . Thus, we have
All in all, we’ve proved that the inductive hypothesis is true for all .
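As a quick numerical sanity check of the induction, the snippet below compares the recurrence against a closed form. It assumes the standard mergesort recurrence C_1 = 0, C_N = C_⌈N/2⌉ + C_⌊N/2⌋ + N and the closed form N⌈lg N⌉ − 2^⌈lg N⌉ + N, so adjust it if your edition states the theorem differently.

from functools import lru_cache

@lru_cache(maxsize=None)
def c(n):
    # Assumed recurrence: C_1 = 0, C_N = C_ceil(N/2) + C_floor(N/2) + N.
    return 0 if n == 1 else c((n + 1) // 2) + c(n // 2) + n

def closed_form(n):
    k = (n - 1).bit_length()   # equals ceil(lg n) for n >= 1
    return n * k - 2**k + n

# The recurrence and the closed form agree for every n we try.
assert all(c(n) == closed_form(n) for n in range(1, 10_000))
print("closed form verified for 1 <= n < 10,000")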
Exercise 1.6
We use the following three-way merge procedure in Java instead of the original from Exercise 1.3. It uses sentinels, so counting the number of comparisons inside the loop is straightforward.
Each iteration of the main for loop performs 2 comparisons to determine the minimum among the 3 candidates. The recurrence is
If we consider the case when is a power of 3, and use the same analysis as in the book for the standard mergesort, then we get .
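Assuming the recurrence in question is C_N = 3C_{N/3} + 2N with C_1 = 0 (three subproblems of a third of the size, plus 2 compares per output position), unrolling it for N = 3^n gives

C_N = 2N \log_3 N = \frac{2}{\lg 3} N \lg N \approx 1.26\, N \lg N,

which is about 26% more compares than the roughly N \lg N used by standard mergesort, despite the shallower recursion.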
🌟 Exercise 1.7
Algorithm science is a methodology for developing machine-independent falsifiable hypotheses. This exercise is about the doubling hypothesis. According to Exercise 1.1 we know that . We can estimate , independent of the machine, as follows:
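The estimate used in the script below is T(2N) \approx 2T(N)\left(1 + \frac{1}{\lg N}\right). Assuming the model T(N) = cN\lg N + dN that the script simulates, this follows because

T(2N) = 2cN\lg(2N) + 2dN = 2T(N) + 2cN, \qquad 2T(N)\left(1 + \tfrac{1}{\lg N}\right) = 2T(N) + 2cN + \tfrac{2dN}{\lg N},

and the leftover term 2dN/\lg N is of lower order, which is why the relative error shrinks as N grows.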
The next Python script shows how well our estimated reflects the actual . We can run different experiments by altering the machine-dependent constants and .
from math import log

# Machine-dependent constants in the simulation.
c = 3.5
d = 7.1

def t(n):
    return c * n * log(n, 2) + d * n

def estimate_t_2n(t_n, n):
    return 2 * t_n * (1 + 1 / log(n, 2))

for n in [10, 100, 1000, 10000]:
    t_n = t(n)
    est_t_2n = estimate_t_2n(t_n, n)
    act_t_2n = t(2 * n)
    print(f"n: {n:6},\tEstimated: {est_t_2n:10.2f},\tActual: {act_t_2n:10.2f},\tRel. Error: {abs(est_t_2n - act_t_2n) / act_t_2n:.2%}")

The output shows that the relative error drops quickly as n increases.
n: 10, Estimated: 487.28, Actual: 444.53, Rel. Error: 9.62%
n: 100, Estimated: 6984.43, Actual: 6770.70, Rel. Error: 3.16%
n: 1000, Estimated: 92385.37, Actual: 90960.49, Rel. Error: 1.57%
n: 10000, Estimated: 1152826.43, Actual: 1142139.87, Rel. Error: 0.94%
Exercise 1.8
Let’s use the Python implementation of mergesort from GeeksforGeeks and replace the driver code with the following snippet:
# Driver code
import time
from math import log
from random import randint

if __name__ == "__main__":
    for n in (1_000_000, 10_000_000):
        arr = [randint(0, 100) for _ in range(n)]
        start_time = time.time()
        mergeSort(arr, 0, len(arr) - 1)
        end_time = time.time()
        t_n = end_time - start_time
        if n == 1_000_000:
            print(f"Actual 1x time: {t_n:.2f} seconds")
            t10x_estimate = 10 * t_n * (1 + log(10, 2) / log(n, 2))
            print(f"Estimated 10x time: {t10x_estimate:.2f} seconds")
        else:
            print(f"Actual 10x time: {t_n:.2f} seconds")
            print(f"Relative Error: {abs(t_n - t10x_estimate) / t_n:.2%}")

Here’s an output on MacBook Pro 2019:
Actual 1x time: 4.05 seconds
Estimated 10x time: 47.20 seconds
Actual 10x time: 45.84 seconds
Relative Error: 2.97%
Evidently, the estimated 10x time closely matches the actual 10x running time.
Exercise 1.9
The program Sort2distinct.java implements this algorithm in Java. Actually, it runs the 3-way partitioning procedure.
The inline comment for the sort method says that the code assumes the input has at most 3 distinct values. Without any preprocessing, this should be 2 (see Exercise 1.10).
Exercise 1.10
Perform an initial linear scan to record one set of indices , and for those three distinct values. Determine the index of the median and exchange it with the element at the beginning of the array. Finally, run the 3-way partitioning procedure from Exercise 1.9.
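As an illustration only (the booksite's Sort2distinct.java is the reference), here's a Python sketch of this plan using a Dijkstra-style 3-way partition:

def sort_three_distinct(a):
    """Sort a list containing at most three distinct keys in linear time."""
    # Linear scan: remember one index per distinct value (at most three).
    first_index = {}
    for i, v in enumerate(a):
        if v not in first_index:
            first_index[v] = i
    if len(first_index) <= 1:
        return                                   # nothing to do
    # Move one occurrence of the median distinct value to the front so it
    # can serve as the partitioning value. (With only two distinct values,
    # partitioning on the larger one already sorts the array.)
    distinct = sorted(first_index)
    median = distinct[len(distinct) // 2]
    a[0], a[first_index[median]] = a[first_index[median]], a[0]
    # 3-way partition around v = a[0]: smaller keys, keys equal to v, and
    # larger keys end up in three contiguous blocks, i.e. the array is sorted.
    v = a[0]
    lt, i, gt = 0, 1, len(a) - 1
    while i <= gt:
        if a[i] < v:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > v:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1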
As an additional practice, solve the LeetCode problem Sort Colors.
🌟 Exercise 1.11
The number of compares is a reliable estimator of the runtime performance of comparison-based sorting algorithms. The Python script below tests the hypothesis that the running time of mergesort divided by the number of compares that it uses approaches a constant as the problem size increases. The merge procedure uses sentinels to be aligned with the version from the book.
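A sketch of such a script (an illustration, not necessarily the exact code behind the table below): the merge appends float('inf') sentinels and counts one compare per output position, and the timing uses time.perf_counter.

import random
import time

compares = 0

def merge(a, lo, mid, hi):
    """Merge a[lo..mid] with a[mid+1..hi] using infinity sentinels."""
    global compares
    left = a[lo:mid + 1] + [float("inf")]
    right = a[mid + 1:hi + 1] + [float("inf")]
    i = j = 0
    for k in range(lo, hi + 1):
        compares += 1                     # one compare per output position
        if right[j] < left[i]:
            a[k] = right[j]
            j += 1
        else:
            a[k] = left[i]
            i += 1

def mergesort(a, lo, hi):
    if hi <= lo:
        return
    mid = (lo + hi) // 2
    mergesort(a, lo, mid)
    mergesort(a, mid + 1, hi)
    merge(a, lo, mid, hi)

print(f"{'n':>10} {'t(n)':>14} {'c(n)':>14} {'t(n)/c(n)':>10}")
n = 2
while n <= 1 << 20:
    compares = 0
    arr = [random.random() for _ in range(n)]
    start = time.perf_counter()
    mergesort(arr, 0, n - 1)
    t_us = (time.perf_counter() - start) * 1e6   # running time in microseconds
    print(f"{n:>10} {t_us:>14.2f} {compares:>14.2f} {t_us / compares:>10.2f}")
    n *= 2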
The running time is shown in microseconds. The ratio approaches a constant as the problem size increases; the problem size is always a power of 2, so the values of c(n) are exact. Interestingly, a similar ratio occurred on both a MacBook Pro 2019 and an HP laptop running Windows 11.
n t(n) c(n) t(n)/c(n)
--------------------------------------------
2 7.87 2.00 3.93
4 10.97 8.00 1.37
8 14.07 24.00 0.59
16 29.33 64.00 0.46
32 60.08 160.00 0.38
64 126.84 384.00 0.33
128 268.22 896.00 0.30
256 565.05 2048.00 0.28
512 1359.70 4608.00 0.30
1024 2971.89 10240.00 0.29
2048 6124.02 22528.00 0.27
4096 11742.83 49152.00 0.24
8192 25410.18 106496.00 0.24
16384 53179.98 229376.00 0.23
32768 108177.90 491520.00 0.22
65536 229867.22 1048576.00 0.22
131072 469205.86 2228224.00 0.21
262144 982743.98 4718592.00 0.21
524288 2103214.74 9961472.00 0.21
1048576 4166778.09 20971520.00 0.20
Exercise 1.12
Let denote the cumulative (total) number of compares used by quicksort on all permutations of elements. The average number of comparisons is , so the base cases are and . We can derive based upon the simplified recurrence of from the book for .
🌟 Exercise 1.13
The invariant that the subarrays left after partitioning a random permutation are themselves both random permutations is quintessential for the analysis and performance. The proof below is adapted from Sedgewick’s PhD thesis.
Observe that Program 1.2 only moves the partitioning element (pivot) to its final position at the end. Leaving the pivot out of the partitioning process is crucial to ensure that the invariant remains true during the whole process.
Suppose that a permutation of is being partitioned with all such permutations equally likely. Choose a pivot , so that after partitioning the size of the left subarray becomes . Any rearrangement of these elements, out of the possible , in the original permutation is mapped to a unique permutation in the left subarray. In other words, we can establish a bijection between these two objects. Since all the original permutations are equally likely, the same is true for permutations in the left subarray. The analogous reasoning applies, independently, for the right subarray. This concludes the proof.
Moving the pivot during the partitioning process, by setting , introduces bias into the subarrays (unless the pivot already happens to be in its final position). The probability of having a permutation where the pivot remains in its initial position is quite low. In our case, the first element from the left larger than the pivot will trigger it to move; it tends to fall near the beginning of the left subarray. Therefore, not all positions are equally likely to contain .
Program 1.2 even becomes incorrect if j is set to hi+1; the line that exchanges a[i] with a[hi] no longer makes sense, since the pivot has most probably already been moved somewhere else.
🌟 Exercise 1.14
We follow the technique from the book to eliminate summation on the RHS. We have
🌟 Exercise 1.15
The proof is adapted from Sedgewick’s PhD thesis.
WLOG assume that the input array is a random permutation of the set to make the exposition leaner. Finding the average number of exchanges used during the first partitioning stage (before the pointers cross) is a harder task than deriving or . We have dependencies on both the pivot and arrangement of keys (values to sort). The base cases are and the recurrence is
Let’s analyze the first term of the sum by looking at the Program 1.2 from the book. Quicksort is parsimonious regarding exchanges; only those elements are swapped that must be swapped. If the pivot is , then only those keys among which are smaller than are exchanged into the right subarray. Furthermore, each element is exchanged at most once during the whole partitioning process. Let be the random variable denoting the number of exchanges when the pivot is . The probability mass function of is given by
The numerator expresses in how many ways we can select smaller elements from the left subarray and use them to form the right subarray. The denominator is just the total number of possible combinations to form the right subarray. As a side note, the right side of the equation follows from applying the identity and shows symmetry in the problem, i.e., it describes the case when is chosen as the pivot. At any rate, .
The average number of exchanges is the expectation of defined as
The recurrence for the average number of exchanges is
If we take into account the official errata, the result changes slightly. Namely, the equation holds only if we assume that the average partitioning cost in is instead of .
Exercise 1.16
Let denote the number of subarrays of size encountered, on the average, when sorting a random array of size with quicksort. We have 3 cases:
If then we can’t have a subarray of size , so .
If then , since only the initial array is of size .
If then the initial call is on a subarray whose size is larger than , hence we don’t count it. We average the counts from the resulting subproblems. The recurrence is
Fast forwarding the exact same steps as in Exercise 1.14, we get
Observe that we can’t iterate further than . Nonetheless, we can calculate it using the main recurrence to get ; all base cases are zeros except . Therefore, we have
🌟 Exercise 1.17
Fast forwarding the exact same steps as in the proof of Theorem 1.3, we get
We can calculate using the main recurrence
The formula for the average number of compares in quicksort, when using insertion sort to handle small subarrays, is
If we take into account the official errata, then the average partitioning cost in is instead of . Thus, the formula changes to
Exercise 1.18
After ignoring small terms (those significantly less than ) as well as constant multiples of (they don’t influence the minimization task), we can arrive at the following approximation
We can use a suitable math tool, like WolframAlpha, to solve the minimization problem as well as plot the graph of . Issue plot m(m-1)/(6(m+2))-2ln(m) for 1<=m<60 to see its shape.

You can find the minimum by saying m(m-1)/(6(m+2))-2ln(m) minimize for m>1. It’ll report a global minimum at .
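If you'd rather stay in Python, a simple grid search over the same function locates the minimizer without an external solver (the search range and step size below are arbitrary choices):

from math import log

def f(m):
    # The approximation from above: m(m-1)/(6(m+2)) - 2 ln(m).
    return m * (m - 1) / (6 * (m + 2)) - 2 * log(m)

# Evaluate f on a fine grid over 1 < m < 101 and pick the minimizing m.
best_m = min((1 + k / 1000 for k in range(1, 100_000)), key=f)
print(f"minimum near m = {best_m:.3f}, f(m) = {f(best_m):.4f}")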
Exercise 1.19
According to the graph from Exercise 1.18 the threshold is .
Exercise 1.20
Below is the Python script that reports how many keys in a file of random positive integers less than are likely to be equal to some other keys in the file.
import random
import math
from collections import Counter
from statistics import mean, pstdev

def count_colliding_keys(n, m):
    # Generate n random positive integers in [1, m-1].
    arr = [random.randrange(1, m) for _ in range(n)]
    freq = Counter(arr)
    # Count all occurrences of values that appear at least twice.
    return sum(cnt for cnt in freq.values() if cnt >= 2)

def simulate(n=10_000, m=1_000_000, trials=1000, seed=None):
    if seed is not None:
        random.seed(seed)
    results = [count_colliding_keys(n, m) for _ in range(trials)]
    mu = mean(results)
    sigma = pstdev(results)
    ci_half = 1.96 * sigma / math.sqrt(trials)
    analytic = n * (1 - (1 - 1/m) ** (n - 1))
    print(f"n={n}, m={m}, trials={trials}")
    print(f"simulated mean = {mu:.3f}, std = {sigma:.3f}")
    print(f"95% CI for mean = [{mu-ci_half:.3f}, {mu+ci_half:.3f}]")
    print(f"analytic estimate = {analytic:.3f}")

simulate(seed=1)

The output also shows the 95% confidence interval for the mean:
n=10000, m=1000000, trials=1000
simulated mean = 99.542, std = 13.895
95% CI for mean = [98.681, 100.403]
analytic estimate = 99.492
The analytic formula is the same as for the number of people with a shared birthday; replace the number of days with .
The program sets the seed of the random number generator, which makes the results reproducible even though the numbers are randomly generated.
Exercise 1.21
Try changing the relative magnitudes of and in Exercise 1.20 to see how the number of duplicates increases as approaches , especially when . The random permutation model assumes that all elements are distinct and that all possible permutations are equally likely to occur. Nevertheless, even in the presence of many duplicates, Program 1.2 retains good performance. As emphasized in Sedgewick’s paper Quicksort with Equal Keys, the evidence in favor of stopping the scanning pointers on keys equal to the partitioning element in quicksort is overwhelming. This is exactly how partitioning works in Program 1.2.
There are also other schemes for handling duplicates efficiently. The website of Algorithms, 4th Edition by RS and KW has an extensive treatment of quicksort, including variants like fast 3-way and dual-pivot. There’s also background information about entropy-optimal sorting, which states the following:
Arrays with large numbers of duplicate sort keys arise frequently in applications. In such applications, there’s potential to reduce the time of the sort from linearithmic to linear.
Thus, this is one example where the random permutation model is inaccurate. Of course, the story can go the other way around when partitioning doesn’t account for duplicates. In this case, quicksort may show quadratic runtime.
Exercise 1.22
Essentially, the distribution of keys is "asymptotically" immaterial for mergesort, as it achieves runtime in all cases. The sorting numbers specify the worst-case number of comparisons made by mergesort (see also Exercise 1.5). The best case requires about half as many compares as the worst case. Of course, this last statement doesn’t apply to Program 1.1 (see Exercise 1.23). Thus, something like Table 1.1 isn’t useful for gaining insight into stochastic fluctuations due to dependence on probabilistic models.
On the other hand, Table 1.1 could validate a model, implementation and/or analysis. It could help in revealing machine-dependent constants in a model (see Exercise 1.11) or pinpoint awkward implementation-related issues, perhaps bugs, if the empirical evidence goes against mathematical predictions. Finally, it may assure us that the model for a new variant of mergesort aligns well with empirical data.
Exercise 1.23
The running time of mergesort as implemented in Program 1.1 depends only on the number of elements in the array being sorted, not on their arrangement. Therefore, the standard deviation in the number of compares is zero, since the input size fully determines the cost.