Chapter 1
Exercise 1.1
We start by proving that $f(N)$ is asymptotically equivalent to $N\lg N$. The $O(N)$ term divided by $N\lg N$ is $O(1/\lg N)$, which vanishes for sufficiently large $N$. Therefore,
$$\lim_{N\to\infty}\left|\frac{f(N)}{N\lg N}\right| = \lim_{N\to\infty}\left|1 + \frac{O(N)}{N\lg N}\right| = 1,$$
implying $f(N) \sim N\lg N$ (and hence also $f(N) = \Theta(N\log N)$).
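As a quick numeric sanity check (this snippet is my own addition, not part of the original solution, and the linear term's constant is an arbitrary stand-in), we can watch the ratio approach 1:

```python
import math

C = 5  # hypothetical constant hidden in the O(N) term

def f(n):
    # f(N) = N lg N + O(N), with the O(N) term modeled as C*N
    return n * math.log2(n) + C * n

for n in [10**3, 10**4, 10**5, 10**6, 10**7]:
    ratio = f(n) / (n * math.log2(n))
    print(f"N = {n:>8}  f(N)/(N lg N) = {ratio:.4f}")
```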
Exercise 1.2
The book already mentions that there are various strategies to avoid sentinels. For example, on the website of Algorithms, 4th Edition by Robert Sedgewick and Kevin Wayne, you can find an interesting approach (see the section Creative Problems) that uses a single auxiliary array aux[]. As stated there: "The merge procedure copies the second half of a[] to aux[] in decreasing order and then does the merge back to a[]." This improves performance by eliminating the boundary checks on indices. Nonetheless, such a mergesort is no longer stable, and stability is a key property of this sorting algorithm.
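A minimal Python sketch of that idea (my own illustration, not the booksite's code): the second half is copied into the auxiliary array in decreasing order, so each scanning index is bounded by the other and no end-of-half checks are needed. Stability is lost because, once the first half is exhausted, equal keys from the second half can come back out in reverse order.

```python
def merge(a, lo, mid, hi):
    # copy the first half in increasing order and the second half in decreasing order
    aux = a[lo:mid + 1] + a[mid + 1:hi + 1][::-1]
    i, j = 0, hi - lo  # i scans forward, j scans backward; no boundary checks needed
    for k in range(lo, hi + 1):
        if aux[j] < aux[i]:
            a[k] = aux[j]
            j -= 1
        else:
            a[k] = aux[i]
            i += 1

def merge_sort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if hi <= lo:
        return
    mid = lo + (hi - lo) // 2
    merge_sort(a, lo, mid)
    merge_sort(a, mid + 1, hi)
    merge(a, lo, mid, hi)
```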
On the same website, you have Merge.java, a standard implementation of the top-down mergesort algorithm without relying on sentinels. As a bonus, it also offers index-based sorting, which avoids moving elements around. Moving keys around could be expensive if they’re associated with satellite data.
Exercise 1.3
GeeksforGeeks covers 3-way mergesort in great detail.
Below is the driver code to measure execution time. It should replace the original driver code in the GeeksforGeeks programs implementing the standard and 3-way mergesort algorithms.
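A minimal sketch of what such a timing driver might look like (my own code, assuming the Python versions of the GeeksforGeeks programs; the placeholder `sorted` must be replaced by the mergesort function under test, whose actual name and signature may differ on that page):

```python
import random
import time

def time_sort(sort_fn, sizes, seed=17):
    """Time sort_fn on random integer arrays of the given sizes."""
    random.seed(seed)  # reproducible inputs
    for n in sizes:
        arr = [random.randint(0, n) for _ in range(n)]
        start = time.perf_counter()
        sort_fn(arr)
        elapsed = time.perf_counter() - start
        print(f"N = {n:>8}  time = {elapsed:.3f} s")

if __name__ == "__main__":
    # Replace `sorted` with the standard or 3-way mergesort function
    # from the GeeksforGeeks program you are measuring.
    time_sort(sorted, [10**5, 2 * 10**5, 4 * 10**5, 8 * 10**5])
```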
Here are the outputs created on a MacBook Pro 2019 machine:
3-way mergesort has a lower recursion depth, which can improve performance. It’s a better choice in external sorting scenarios, since it makes fewer passes over datasets typically stored on slow external media.
Exercise 1.4
Using the recurrence (1) from the book, we have
Since $\lfloor (N+1)/2\rfloor = \lceil N/2\rceil$ and $\lceil (N+1)/2\rceil = \lfloor N/2\rfloor + 1$, we can simplify the previous equation and get
The quantity $\lfloor N/2\rfloor$ is a positive integer, so we can apply the above expansion recursively, leveraging a useful property of the floor and ceiling functions, $\lfloor\lfloor x\rfloor/2\rfloor = \lfloor x/2\rfloor$. We can carry out this process until the problem size drops to 1, recalling that $C_1 = 0$. Thus, we get
Let’s express $C_N$ as a sum of differences, so we have
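Writing the steps out in full (my own sketch, assuming recurrence (1) is $C_N = C_{\lceil N/2\rceil} + C_{\lfloor N/2\rfloor} + N$ for $N \ge 2$ with $C_1 = 0$), the telescoping looks like this:

$$
C_{N+1} - C_N = C_{\lfloor N/2\rfloor + 1} - C_{\lfloor N/2\rfloor} + 1 = \cdots = C_2 - C_1 + \lfloor\lg N\rfloor = \lfloor\lg N\rfloor + 2,
$$

so that

$$
C_N = \sum_{1\le k<N}\left(C_{k+1} - C_k\right) = \sum_{1\le k<N}\left(\lfloor\lg k\rfloor + 2\right) = N - 1 + \sum_{1\le k\le N}\lceil\lg k\rceil .
$$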
Exercise 1.5 🌟
This exercise showcases an advanced usage of mathematical induction and techniques to manipulate floors and ceilings. Furthermore, it introduces a technique of describing quantities in more detail with additional parameters. For example, a natural number can be interpreted as a binary integer and written as a sum of powers of two, $\sum_k 2^k$. For an advanced example of this approach, look at the solution of a complex identity.
As mentioned in the book, the theorem for a general $N$ follows from (1) by induction. Let’s use the equation $C_N = N\lceil\lg N\rceil + N - 2^{\lceil\lg N\rceil}$ as the inductive hypothesis. The base case $C_1 = 0$ trivially holds, since $\lg 1 = 0$. For $N > 1$ we have
If $N$ is even, then $\lceil N/2\rceil = \lfloor N/2\rfloor = N/2$, so the last line simplifies to
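Spelling out the even case under the inductive hypothesis (my own write-up of the computation; note that $\lceil\lg(N/2)\rceil = \lceil\lg N\rceil - 1$ for even $N \ge 2$):

$$
C_N = 2\,C_{N/2} + N = 2\left(\frac{N}{2}\left\lceil\lg\frac{N}{2}\right\rceil + \frac{N}{2} - 2^{\lceil\lg(N/2)\rceil}\right) + N = N\left(\lceil\lg N\rceil - 1\right) + N - 2^{\lceil\lg N\rceil} + N = N\lceil\lg N\rceil + N - 2^{\lceil\lg N\rceil}.
$$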
If $N$ is odd, then $\lceil N/2\rceil = (N+1)/2$ and $\lfloor N/2\rfloor = (N-1)/2$, so we get
We must figure out how to express $\lceil\lg(N+1)\rceil$ and $\lceil\lg(N-1)\rceil$. We know that $N\in(2^k, 2^{k+1})$ for some positive integer $k$. Observe that $\lceil\lg(N+1)\rceil = \lceil\lg N\rceil = k+1$. If $N > 2^k + 1$ then $\lceil\lg(N-1)\rceil = \lceil\lg N\rceil = k+1$, otherwise $\lceil\lg(N-1)\rceil = \lceil\lg N\rceil - 1 = k$. Therefore, we must differentiate these two sub-cases.
When $N > 2^k + 1$ we immediately get
If $N = 2^k + 1$, observe that $(N-1)/2 = (2^k + 1 - 1)/2 = 2^{k-1} = 2^{\lceil\lg N\rceil - 2}$. Thus, we have
All in all, we’ve proved that the inductive hypothesis is true for all N≥1.
Exercise 1.6
We use the following three-way merge procedure in Java instead of the original from Exercise 1.3. It uses sentinels, so counting the number of comparisons inside the loop is straightforward.
Each iteration of the main for loop performs 2 comparisons to determine the minimum among the 3 candidates. The recurrence is
If we consider the case when $N$ is a power of 3, and use the same analysis as in the book for the standard mergesort, then we get $C_N = 2N\log_3 N$.
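To spell that out (my own sketch, assuming the recurrence above reduces to $C_N = 3C_{N/3} + 2N$ with $C_1 = 0$): dividing by $N$ and iterating for $N = 3^n$ gives

$$
\frac{C_N}{N} = \frac{C_{N/3}}{N/3} + 2 = \frac{C_{N/9}}{N/9} + 4 = \cdots = \frac{C_1}{1} + 2\log_3 N = 2\log_3 N,
$$

hence $C_N = 2N\log_3 N$.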
Exercise 1.7 🌟
Algorithm science is a methodology for developing machine-independent falsifiable hypotheses. This exercise is about the doubling hypothesis.
According to Exercise 1.1, we know that $t_N = cN\lg N + dN \sim cN\lg N$. We can estimate $t_{2N}$, independently of the machine, as follows:
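The machine-dependent constant $c$ cancels in the ratio, so one way to write the estimate (my own formulation of the standard doubling argument) is

$$
t_{2N} \sim c\,(2N)\lg(2N) = 2cN\lg N\left(1 + \frac{1}{\lg N}\right) \sim 2\,t_N\,\frac{\lg(2N)}{\lg N}.
$$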
The next Python script shows how well our estimated $t_{2N}$ reflects the actual $t_{2N}$. We can run different experiments by altering the "machine-dependent" constants $c$ and $d$.
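Here is a sketch of the kind of experiment meant here (my own code; the values of $c$ and $d$ are arbitrary stand-ins for the machine-dependent constants):

```python
import math

C, D = 1.0, 10.0  # arbitrary stand-ins for the machine-dependent constants

def t(n):
    # modeled running time t_N = c*N*lg(N) + d*N
    return C * n * math.log2(n) + D * n

print(f"{'N':>10} {'estimated t_2N':>16} {'actual t_2N':>14} {'rel. error':>10}")
for n in [10**3, 10**4, 10**5, 10**6, 10**7]:
    estimate = 2 * t(n) * math.log2(2 * n) / math.log2(n)  # machine-independent estimate
    actual = t(2 * n)
    rel_error = abs(estimate - actual) / actual
    print(f"{n:>10} {estimate:>16.3e} {actual:>14.3e} {rel_error:>10.4f}")
```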
The output shows that the relative error drops quickly as N increases.
Exercise 1.8
Let’s use the Python implementation of mergesort from GeeksforGeeks and replace the driver code with the following snippet:
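A sketch of such a snippet (my own; the placeholder `sorted` stands in for the mergesort function copied from GeeksforGeeks, whose actual name and signature may differ):

```python
import random
import time

def measure(sort_fn, n, seed=17):
    """Return the time sort_fn takes on a random array of size n."""
    random.seed(seed)  # reproducible inputs
    arr = [random.randint(0, n) for _ in range(n)]
    start = time.perf_counter()
    sort_fn(arr)
    return time.perf_counter() - start

if __name__ == "__main__":
    # Swap `sorted` for the mergesort function taken from GeeksforGeeks.
    previous = None
    for n in [2**k for k in range(16, 21)]:
        elapsed = measure(sorted, n)
        ratio = elapsed / previous if previous else float("nan")
        print(f"N = {n:>8}  time = {elapsed:.4f} s  ratio = {ratio:.2f}")
        previous = elapsed
```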
Here’s an output on a MacBook Pro 2019:
Exercise 1.9
The program Sort2distinct.java implements this algorithm in Java.
The inline comment of the sort method says that it assumes the input has at most 3 distinct values. Without any preprocessing, this should be 2 (see Exercise 1.10).
Exercise 1.10
Perform an initial linear scan to record one set of indices i, j and k for the three distinct values. Determine the index of the median and exchange it with an element at the beginning of the array. Finally, run the 3-way partitioning procedure from Exercise 1.9 (see the sketch below).
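Here is a rough Python sketch of that idea (my own illustration; instead of physically moving the median to the front and reusing the Exercise 1.9 code, it partitions directly around the median value with a Dutch-national-flag pass):

```python
def sort_few_distinct(a):
    """Sort a list containing at most three distinct values in a single 3-way pass."""
    if not a:
        return
    values = sorted(set(a))            # at most three distinct values, found by one scan
    median = values[len(values) // 2]  # the middle value (or the only one)
    lt, i, gt = 0, 0, len(a) - 1       # Dutch-national-flag style 3-way partitioning
    while i <= gt:
        if a[i] < median:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > median:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1

if __name__ == "__main__":
    data = [2, 9, 5, 2, 9, 9, 5, 2, 5]
    sort_few_distinct(data)
    print(data)  # [2, 2, 2, 5, 5, 5, 9, 9, 9]
```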
As an additional practice, solve the LeetCode problem Sort Colors.
Exercise 1.11 🌟
The number of compares is a reliable estimator of the runtime performance of comparison-based sorting algorithms. We can run experiments on different computers to test this hypothesis: the running time of mergesort divided by the number of compares it uses approaches a constant as the problem size increases.
The following Python program implements this concept. The merge procedure uses sentinels to be aligned with the version from the book.
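A condensed sketch of such a program (my own version, not necessarily identical to the one referenced above): the merge uses float('inf') sentinels, mirroring the book's merge, and a module-level counter tracks compares.

```python
import random
import time

compares = 0

def merge(a, lo, mid, hi):
    """Merge a[lo..mid] with a[mid+1..hi] using infinity sentinels."""
    global compares
    left = a[lo:mid + 1] + [float("inf")]
    right = a[mid + 1:hi + 1] + [float("inf")]
    i = j = 0
    for k in range(lo, hi + 1):
        compares += 1          # exactly one compare per output element
        if left[i] <= right[j]:
            a[k] = left[i]
            i += 1
        else:
            a[k] = right[j]
            j += 1

def merge_sort(a, lo, hi):
    if hi <= lo:
        return
    mid = lo + (hi - lo) // 2
    merge_sort(a, lo, mid)
    merge_sort(a, mid + 1, hi)
    merge(a, lo, mid, hi)

if __name__ == "__main__":
    random.seed(17)
    print(f"{'N':>9} {'time (us)':>12} {'compares':>12} {'t_N / c_N':>10}")
    for n in [2**k for k in range(13, 19)]:
        arr = [random.random() for _ in range(n)]
        compares = 0
        start = time.perf_counter()
        merge_sort(arr, 0, n - 1)
        micros = (time.perf_counter() - start) * 1e6
        print(f"{n:>9} {micros:>12.0f} {compares:>12} {micros / compares:>10.3f}")
```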
The running time is shown in microseconds. The ratio $t_N/c_N$ stabilizes as the problem size increases, although there are some fluctuations. Here is the output generated on a MacBook Air 2025 computer:
On a MacBook Pro 2019 machine, the constant approached 0.3.
Exercise 1.12
Let $T_N$ denote the cumulative (total) number of compares used by quicksort over all $N!$ permutations of $N$ elements. The average number of comparisons is $C_N = T_N/N!$, so the base cases are $T_2 = 6$ and $T_1 = T_0 = 0$. We can derive $T_N$ from the simplified recurrence for $C_N$ in the book for $N > 2$.
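Under the assumption that the book's simplified recurrence is $C_N = N + 1 + \frac{2}{N}\sum_{0\le j<N} C_j$, multiplying through by $N!$ gives (my own transcription of the step):

$$
T_N = N!\,C_N = (N+1)! + 2\,(N-1)!\sum_{0\le j<N} C_j = (N+1)! + 2\,(N-1)!\sum_{0\le j<N} \frac{T_j}{j!} \qquad (N > 2).
$$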
Exercise 1.13 🌟
The invariant that the subarrays left after partitioning a random permutation are themselves both random permutations is essential to both the analysis and the performance of quicksort. Observe that Program 1.2 moves the partitioning element (pivot) $v$ to its final position only at the very end. Leaving the pivot out of the partitioning process is crucial to ensure that the invariant remains true throughout.
The proof below is adapted from Sedgewick’s PhD thesis.
Suppose that a permutation of $\{1,2,\ldots,N\}$ is being partitioned, with all $N!$ such permutations equally likely. Choose a pivot $1\le v\le N$, so that after partitioning the size of the left subarray is $v-1$. Each of the $(v-1)!$ possible arrangements of these $v-1$ elements in the original permutation is mapped to a unique permutation of the left subarray; in other words, we can establish a bijection between the two. Since all the original permutations are equally likely, the same is true for the permutations in the left subarray. Analogous reasoning applies, independently, to the right subarray. This concludes the proof.
Moving the pivot during the partitioning process, by setting j:=hi+1, introduces bias into the subarrays (unless the pivot already happens to be in its final position). The probability of having a permutation where the pivot remains in its initial position is quite low. In our case, the first element from the left larger than the pivot will trigger it to move; it tends to fall near the beginning of the left subarray. Therefore, not all positions are equally likely to contain v.
Program 1.2 even becomes incorrect if j is set to hi+1; the line that exchanges a[i] with a[hi] no longer makes sense, since the pivot has most probably already moved somewhere else.
Exercise 1.14
We follow the technique from the book to eliminate the summation on the RHS. We have
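As a reminder of how that technique works in general (my own generic write-up; the exercise's own recurrence follows the same pattern): for a recurrence of the form $x_N = a_N + \frac{2}{N}\sum_{0\le j<N}x_j$, multiply by $N$, subtract the same equation written for $N-1$, and divide by $N(N+1)$:

$$
N x_N - (N-1)x_{N-1} = N a_N - (N-1)a_{N-1} + 2x_{N-1} \;\Longrightarrow\; \frac{x_N}{N+1} = \frac{x_{N-1}}{N} + \frac{N a_N - (N-1)a_{N-1}}{N(N+1)},
$$

which telescopes into a plain sum.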
Exercise 1.15 🌟
The derivation augments the technique from the book by introducing an extra dimension. Finding the average number of exchanges used during the first partitioning stage (before the pointers cross) is a harder task than deriving $C_N$ or $A_N$: we have dependencies on both the pivot and the arrangement of the keys (the values to sort).
The proof is adapted from Sedgewick’s PhD thesis.
WLOG assume that the input array is a random permutation of the set $\{1,2,\ldots,N\}$. The base cases are $B_1 = B_0 = 0$ and the recurrence is
Let’s analyze the first term of the sum by looking at Program 1.2 from the book. Quicksort is parsimonious regarding exchanges; only those elements that must be swapped are swapped. If the pivot is $v = a[N]$, then only the keys smaller than $v$ that initially occupy positions $a[v],\ldots,a[N-1]$ get exchanged (each is swapped with a key larger than $v$ from the left part, which thereby lands in the right subarray). Furthermore, each element is exchanged at most once during the whole partitioning process. Let $X_v$ be the random variable denoting the number of exchanges used during the first partitioning stage when the pivot is $v$. The probability mass function of $X_v$ is given by
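Under this setup (my own rendering of the distribution; it is just the hypergeometric law for how many of the $v-1$ smaller keys start out in the last $N-v$ scanned positions):

$$
\Pr\{X_v = k\} = \frac{\binom{v-1}{k}\binom{N-v}{k}}{\binom{N-1}{v-1}} = \frac{\binom{v-1}{k}\binom{N-v}{k}}{\binom{N-1}{N-v}}, \qquad 0 \le k \le \min(v-1,\,N-v).
$$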
The numerator counts the ways to choose which $k$ of the $v-1$ smaller keys (and, consequently, which $N-v-k$ of the larger keys) initially occupy the positions of the right subarray. The denominator is just the total number of ways to choose the keys that form the right subarray. As a side note, the right-hand side of the equation follows from applying the identity $\binom{n}{k}=\binom{n}{n-k}$ and shows the symmetry in the problem, i.e., it describes the case when $a[1]$ is chosen as the pivot. At any rate, $k \in [0, v-1]$.
The average number of exchanges is the expectation of $X_v$
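The hypergeometric mean gives (again, my own write-up) $E[X_v] = \frac{(v-1)(N-v)}{N-1}$, and averaging over the pivot $v$, which is uniform on $\{1,\ldots,N\}$, yields for the first term of the sum

$$
\frac{1}{N}\sum_{v=1}^{N} \frac{(v-1)(N-v)}{N-1} = \frac{N-2}{6}.
$$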
This gives
If we take into account the official errata, then $B_N = \frac{1}{6}C_N - \frac{1}{2}A_N$. Namely, the equation holds only if we assume that the average partitioning cost in $C_N$ is $N+1$ instead of $N+1-1/N$.
Exercise 1.16 🌟
This exercise demonstrates the significance of algorithm analysis. Having an analytical expression for the average number of small subarrays encountered during sorting can facilitate performance optimization.
Let $B_{Nk}$ denote the number of subarrays of size $k > 0$ encountered, on the average, when sorting a random array of size $N$ with quicksort. We have 3 cases:
If $N < k$, then we can’t have a subarray of size $k$, so $B_{Nk} = 0$.
If $N = k$, then $B_{Nk} = 1$, since only the initial array is of size $k$.
If $N > k$, then the initial call is on a subarray whose size is larger than $k$, hence we don’t count it. We average the counts from the resulting subproblems. The recurrence is
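Presumably (my own transcription, following the same pattern as the quicksort recurrences above):

$$
B_{Nk} = \frac{2}{N}\sum_{0\le j<N} B_{jk} \qquad (N > k).
$$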
Fast-forwarding through the exact same steps as in Exercise 1.14, we get
Observe that we can’t iterate further than $B_{(k+1)k}$. Nonetheless, we can calculate it using the main recurrence to get $B_{(k+1)k} = 2/(k+1)$; all base cases are zeros except $B_{kk} = 1$. Therefore, we have
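Combining the telescoped relation with this value (my own write-up of the conclusion), the answer comes out as

$$
\frac{B_{Nk}}{N+1} = \frac{B_{(k+1)k}}{k+2} \quad\Longrightarrow\quad B_{Nk} = \frac{2(N+1)}{(k+1)(k+2)} \qquad (\text{for } N > k).
$$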
A simple variation of this problem is the review question Q1.3; the answer is
$$
B_{Nk} = \frac{k(N+1)}{k+2} \qquad (\text{for } N > k).
$$
Exercise 1.17
Fast-forwarding through the exact same steps as in the proof of Theorem 1.3, we get
We can calculate $C_{M+1}$ using the main recurrence
The formula for the average number of compares in quicksort, when using insertion sort to handle small subarrays, is
If we take into account the official errata, then the average partitioning cost in $C_N$ is $N+1-1/N$ instead of $N+1$. Thus, the formula changes to
$$
C_N = (N+1)\left(1 - \frac{1}{(M+1)(M+2)} + \frac{M(M-1)}{6(M+2)}\right) + 2(N+1)\left(H_{N+1} - H_{M+2}\right) \qquad (\text{for } N > M).
$$
Exercise 1.18 🌟
Algorithm analysis uses mathematical models to find optimal parameter values, as shown in this example.
After ignoring small terms (those significantly less than N) as well as constant multiples of N (they don’t influence the minimization task), we can arrive at the following approximation
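Read off from the WolframAlpha query used below, the function being minimized is

$$
f(M) = \frac{M(M-1)}{6(M+2)} - 2\ln M .
$$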
We can use a suitable math tool, like WolframAlpha, to solve the minimization problem as well as plot the graph of $f(M)$. Issue `plot m(m-1)/(6(m+2))-2ln(m) for 1<=m<60` to see its shape.

You can find the minimum with the query `m(m-1)/(6(m+2))-2ln(m) minimize for m>1`. It reports a global minimum at $M \approx 12.4$.
You may use a different approximation, like using lg(M+2) instead of lgM inside f(M), but the outcome would be similar. Also, you may want to try the more accurate expression for CN (see the remark in the previous exercise).
Exercise 1.19
According to the graph from the previous exercise, the threshold is M=49.
Exercise 1.20
Below is the Python script that reports how many keys in a file of $N = 10^4$ random positive integers less than $M = 10^6$ are likely to be equal to some other keys in the file.
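A sketch of what such a script could look like (my own version; the counting logic and the birthday-style estimate are standard, but the original script's details and output format may differ):

```python
import random
from collections import Counter

N = 10**4        # number of random keys
M = 10**6        # keys are drawn uniformly from [0, M)
random.seed(17)  # reproducible results

keys = [random.randrange(M) for _ in range(N)]
counts = Counter(keys)

# a key is "duplicated" if it equals at least one other key in the file
duplicated = sum(c for c in counts.values() if c > 1)

# birthday-problem estimate: expected number of keys sharing their value
expected = N * (1 - (1 - 1 / M) ** (N - 1))

print(f"observed duplicated keys: {duplicated}")
print(f"expected duplicated keys: {expected:.2f}")
```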
The output shows an excellent alignment between empirical evidence and analytic result:
The analytic formula is the same as for the number of people with a shared birthday; replace the number of days d with M.
The program sets the seed of the random number generator, which makes the results reproducible across runs even though the keys are randomly generated.
Exercise 1.21
Try changing the relative magnitudes of $N$ and $M$ in the previous exercise to see how the number of duplicates increases as $M$ drops, especially when $M \ll N$. The random permutation model assumes that all elements are distinct and that all $N!$ possible permutations are equally likely. Nevertheless, even in the presence of many duplicates, Program 1.2 retains good performance. As emphasized in Sedgewick’s paper Quicksort with Equal Keys, the evidence in favor of stopping the scanning pointers on keys equal to the partitioning element in quicksort is overwhelming. This is exactly how partitioning works in Program 1.2.
There are also other schemes to efficiently handle duplicates. The website of Algorithms, 4th Edition by Robert Sedgewick and Kevin Wayne has an extensive treatment of quicksort, including variants such as fast 3-way partitioning and dual-pivot quicksort. There’s also background information about entropy-optimal sorting, which states the following:
Arrays with large numbers of duplicate sort keys arise frequently in applications. In such applications, there’s potential to reduce the time of the sort from linearithmic to linear.
Thus, this is one example where the random permutation model is inaccurate. Of course, the story can go the other way around when partitioning doesn’t account for duplicates. In this case, quicksort may show quadratic runtime.
Exercise 1.22
Essentially, the distribution of keys is "asymptotically" immaterial for mergesort, since it achieves $\Theta(N\lg N)$ runtime in all cases. The sorting numbers specify the worst-case number of comparisons made by mergesort (see also Exercise 1.5). The best case takes about half as many comparisons as the worst case; this doesn’t apply to Program 1.1 (see Exercise 1.23). All in all, something like Table 1.1 isn’t useful for gaining insight into stochastic fluctuations.
On the other hand, Table 1.1 could validate a model, implementation and/or analysis. It could help in revealing machine-dependent constants in a model (see Exercise 1.11) or pinpoint awkward implementation-related issues, perhaps bugs, if the empirical evidence goes against mathematical predictions. Finally, it may assure us that the model for a new variant of mergesort aligns well with empirical data.
Exercise 1.23
The running time of mergesort as implemented in Program 1.1 depends only on the number of elements in the array being sorted, not on their arrangement. Therefore, the standard deviation in the number of compares is zero, since the input size fully determines the cost.