Chapter 1

Exercise 1.1

We start by proving that $f(N)$ is asymptotically equivalent to $N\lg N$. For sufficiently large $N$, the $\lg N$ factor dominates the constant hidden in $O(N)$. Therefore, $\lim_{N \to \infty} |f(N)/(N\lg N)|=|1+O(N)/(N\lg N)|=1$, implying $f(N)=\Theta(N\log N)$. Notice that we can ignore the base of a logarithm in the $\Theta$ notation.
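
As a quick numerical illustration, here's a minimal Python check with a hypothetical $f(N)=N\lg N+3N$, where the constant 3 is our arbitrary stand-in for the $O(N)$ term; the ratio approaches 1 at a rate of $\Theta(1/\lg N)$:

from math import log2

# Hypothetical f(N) = N lg N + 3N; the 3N term plays the role of O(N).
def f(n):
    return n * log2(n) + 3 * n

for n in (10**3, 10**6, 10**9, 10**12):
    print(n, f(n) / (n * log2(n)))  # the ratio tends to 1 as N grows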

Exercise 1.2

The book already mentions that there are various strategies to avoid sentinels. For example, on the Algorithms, 4th Edition by RS, KW book’s website, you can find an interesting approach (see the section Creative Problems) that uses a single auxiliary array aux[]. As stated there: "The merge procedure copies the second half of a[] to aux[] in decreasing order and then does the merge back to a[]." This improves performance by eschewing boundary checks on indices. Nonetheless, such a mergesort is no longer stable, and stability is a key property of this sorting algorithm.
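
Here's a minimal Python sketch of that idea (function names are ours, not the website's); the reversed second half lets each scan stop at the other half's largest element, so no index bounds checks are needed:

def merge(a, aux, lo, mid, hi):
    # Copy the first half as-is and the second half in decreasing order.
    for i in range(lo, mid + 1):
        aux[i] = a[i]
    for j in range(mid + 1, hi + 1):
        aux[j] = a[hi - (j - mid - 1)]
    # The two halves face each other; the overall maximum acts as a sentinel.
    i, j = lo, hi
    for k in range(lo, hi + 1):
        if aux[j] < aux[i]:
            a[k] = aux[j]; j -= 1
        else:
            a[k] = aux[i]; i += 1

def merge_sort(a, aux, lo, hi):
    if lo >= hi:
        return
    mid = (lo + hi) // 2
    merge_sort(a, aux, lo, mid)
    merge_sort(a, aux, mid + 1, hi)
    merge(a, aux, lo, mid, hi)

a = [5, 2, 4, 6, 1, 3]
merge_sort(a, [0] * len(a), 0, len(a) - 1)
print(a)  # [1, 2, 3, 4, 5, 6]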

On the same website, you have Merge.java, a standard implementation of the top-down mergesort algorithm that doesn’t rely on sentinels. As a bonus, it also offers index-based sorting, which avoids moving elements around. Shuffling keys could be expensive if they’re associated with satellite data.

Exercise 1.3

GeeksforGeeks covers 3-way mergesort in great detail.

Below is the driver code to measure execution time. This should replace the original driver code in programs from GeeksforGeeks implementing standard and 3-way mergesort algorithms.

if __name__ == "__main__":
    import random
    import time
    import gc
    import sys

    # safety: increase recursion limit
    sys.setrecursionlimit(100_000)

    TRIALS = 10
    N = 500_000

    times = []

    for t in range(TRIALS):
        trial_num = t + 1
        print(f"Trial {trial_num}/{TRIALS}: generating random list of {N} integers...")
        random.seed(42 + t)  # deterministic per-trial seed for reproducibility
        try:
            arr = [random.randint(0, 10**9) for _ in range(N)]
        except MemoryError:
            print("MemoryError while allocating the list. Reduce N or run on a machine with more memory.")
            break

        gc.collect()
        print(f"Trial {trial_num}: starting...")
        t0 = time.perf_counter()
        try:
            # Change this line to call a different target function.
            three_way_merge_sort(arr, 0, len(arr) - 1)
        except RecursionError as e:
            print(f"Trial {trial_num} failed with RecursionError: {e}")
            break
        except Exception as e:
            print(f"Trial {trial_num} failed with exception: {e}")
            break
        t1 = time.perf_counter()

        elapsed = t1 - t0
        times.append(elapsed)
        print(f"Trial {trial_num} finished in {elapsed:.3f} seconds.")

        del arr
        gc.collect()

    if times:
        avg = sum(times) / len(times)
        print(f"\nRan {len(times)} successful trials. Average: {avg:.3f}s, Min: {min(times):.3f}s, Max: {max(times):.3f}s")

Here are the outputs created on a 2019 MacBook Pro:

--- standard mergesort ---
Ran 10 successful trials. Average: 2.260s, Min: 2.152s, Max: 2.426s

--- 3-way mergesort ---
Ran 10 successful trials. Average: 2.149s, Min: 2.031s, Max: 2.456s

3-way mergesort has a lower recursion depth, which may improve performance. It’s a better choice in external sorting scenarios, since it makes fewer passes over datasets typically stored on slow external media.
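
For instance, with $N=500{,}000$, as in the driver above, the recursion depths are $\lceil \log_2 N \rceil = 19$ versus $\lceil \log_3 N \rceil = 12$.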

🌟 Exercise 1.4

This exercise introduces a neat trick for indirectly solving recurrences. Often, we can develop a recurrence describing the difference between two consecutive terms, and employ it to solve or prove some property of the original formula.

Using the recurrence (1) from the book, we have

$$C_{N+1}-C_N=C_{\lceil (N+1)/2 \rceil}+C_{\lfloor (N+1)/2 \rfloor}+(N+1)-C_{\lceil N/2 \rceil}-C_{\lfloor N/2 \rfloor}-N.$$

Since $\lfloor (N+1)/2 \rfloor=\lceil N/2 \rceil$ and $\lceil (N+1)/2 \rceil=\lfloor N/2 \rfloor +1$, we can simplify the previous equation and get

$$C_{N+1}-C_N=C_{\lfloor N/2 \rfloor +1}-C_{\lfloor N/2 \rfloor}+1.$$

The quantity $\lfloor N/2 \rfloor$ is a positive integer, so we can recursively apply the above expansion by leveraging a useful property of floor and ceiling functions, namely $\lfloor\lfloor x\rfloor/2 \rfloor=\lfloor x/2 \rfloor$. We can carry out this process until the problem size drops to 1, recalling that $C_1=0$. Thus, we get

$$
\begin{align*}
C_{N+1}-C_N &= C_{\lfloor N/2 \rfloor +1}-C_{\lfloor N/2 \rfloor}+1 \\
&= C_{\lfloor N/4 \rfloor +1}-C_{\lfloor N/4 \rfloor}+1+1 \\
&\;\;\vdots \\
&= C_2-C_1+\underbrace{1+1+\dots+1}_{\lfloor \lg N \rfloor \text{ times}} \\
&= 2+\lfloor \lg N \rfloor.
\end{align*}
$$

Let’s express $C_N$ as a sum of differences, so we have

$$
\begin{align*}
C_N &= (C_N-C_{N-1})+(C_{N-1}-C_{N-2})+\dots+(C_2-C_1) \\
&= (2+\lfloor\lg (N-1)\rfloor)+(2+\lfloor\lg (N-2)\rfloor)+\dots+(2+\lfloor\lg 1\rfloor) \\
&= \sum_{1 \le k <N} (\lfloor\lg k\rfloor+2).
\end{align*}
$$
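
The closed form is easy to sanity-check in Python against recurrence (1), here assumed to be the mergesort recurrence $C_N=C_{\lceil N/2 \rceil}+C_{\lfloor N/2 \rfloor}+N$ with $C_1=0$:

def c_rec(n, memo={1: 0}):
    # Recurrence (1): C_N = C_ceil(N/2) + C_floor(N/2) + N, with C_1 = 0.
    if n not in memo:
        memo[n] = c_rec((n + 1) // 2) + c_rec(n // 2) + n
    return memo[n]

def c_sum(n):
    # Closed form above; k.bit_length() - 1 is exactly floor(lg k) for k >= 1.
    return sum((k.bit_length() - 1) + 2 for k in range(1, n))

assert all(c_rec(n) == c_sum(n) for n in range(1, 5_000))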

🌟 Exercise 1.5

This exercise showcases an advanced usage of mathematical induction and techniques to manipulate floors and ceilings.

As mentioned in the book, the theorem for a general $N$ follows from (1) by induction. Let’s use the equation $C_N=N \lceil \lg N \rceil+N-2^{\lceil \lg N \rceil}$ as the inductive hypothesis. The base case $C_1=0$ trivially holds, since $\lg 1=0$. For $N>1$ we have

$$
\begin{align*}
C_N&=\lceil N/2 \rceil \lceil \lg {\lceil N/2 \rceil} \rceil+\lceil N/2 \rceil-2^{\lceil \lg {\lceil N/2 \rceil} \rceil} \\
&\hspace{1cm} + \lfloor N/2 \rfloor \lceil \lg {\lfloor N/2 \rfloor} \rceil+\lfloor N/2 \rfloor-2^{\lceil \lg {\lfloor N/2 \rfloor} \rceil} && \text{(by the inductive hypothesis)} \\
&\hspace{1cm} +N \\
&= \lceil N/2 \rceil \lceil \lg {\lceil N/2 \rceil} \rceil+\lfloor N/2 \rfloor \lceil \lg {\lfloor N/2 \rfloor} \rceil+2N-2^{\lceil \lg {\lceil N/2 \rceil} \rceil}-2^{\lceil \lg {\lfloor N/2 \rfloor} \rceil} && \text{($N=\lceil N/2 \rceil+\lfloor N/2 \rfloor$)}.
\end{align*}
$$

If $N$ is even, then $\lceil N/2 \rceil=\lfloor N/2 \rfloor=N/2$, so the last line simplifies to

$$
\begin{align*}
C_N&=2 \cdot (N/2) \lceil \lg {(N/2)} \rceil+2N-2\cdot 2^{\lceil \lg {(N/2)} \rceil} \\
&=N \lceil \lg N \rceil-N+2N-2^{\lceil \lg N \rceil} && \text{($\lceil \lg {(N/2)} \rceil=\lceil \lg N-1 \rceil=\lceil \lg N \rceil-1$)} \\
&=N \lceil \lg N \rceil+N-2^{\lceil \lg N \rceil}. && \checkmark
\end{align*}
$$

If $N$ is odd, then $\lceil N/2 \rceil=(N+1)/2$ and $\lfloor N/2 \rfloor=(N-1)/2$, so we get

$$
\begin{align*}
C_N&=\frac{N+1}{2} \bigg\lceil \lg \frac{N+1}{2} \bigg\rceil+\frac{N-1}{2} \bigg\lceil \lg \frac{N-1}{2} \bigg\rceil+2N-2^{\big\lceil \lg \frac{N+1}{2} \big\rceil}-2^{\big\lceil \lg \frac{N-1}{2} \big\rceil} \\[0.4cm]
&=\frac{N+1}{2} \lceil \lg (N+1) \rceil + \frac{N-1}{2} \lceil \lg (N-1) \rceil-N+2N-2^{\lceil \lg (N+1) \rceil-1}-2^{\lceil \lg (N-1) \rceil-1} \\[0.4cm]
&=\frac{N+1}{2} \lceil \lg (N+1) \rceil + \frac{N-1}{2} \lceil \lg (N-1) \rceil+N-2^{\lceil \lg (N+1)\rceil-1}-2^{\lceil \lg (N-1) \rceil-1}.
\end{align*}
$$

We must figure out how to express $\lceil \lg (N+1) \rceil$ and $\lceil \lg (N-1) \rceil$. We know that $N \in (2^k,2^{k+1})$ for some positive integer $k$. Observe that $\lceil \lg (N+1) \rceil=\lceil \lg N\rceil=k+1$. If $N>2^k+1$ then $\lceil \lg (N-1) \rceil=\lceil \lg N \rceil=k+1$, otherwise $\lceil \lg (N-1) \rceil=\lceil \lg N \rceil-1=k$. Therefore, we must differentiate these two sub-cases.

When $N>2^k+1$ we immediately get

$$
\begin{align*}
C_N &=\frac{N+1}{2} \lceil \lg N \rceil + \frac{N-1}{2} \lceil \lg N \rceil+N-2^{\lceil \lg N \rceil-1}-2^{\lceil \lg N \rceil-1} \\
&=N \lceil \lg N \rceil +N-2 \cdot 2^{\lceil \lg N \rceil-1} \\
&= N \lceil \lg N \rceil +N-2^{\lceil \lg N \rceil}. && \checkmark
\end{align*}
$$

If $N=2^k+1$, observe that $(N-1)/2=(2^k+1-1)/2=2^{k-1}=2^{\lceil \lg N \rceil-2}$. Thus, we have

$$
\begin{align*}
C_N &=\frac{N+1}{2} \lceil \lg N \rceil + \frac{N-1}{2} \lceil \lg N \rceil-2^{\lceil \lg N \rceil-2}+N-2^{\lceil \lg N \rceil-1}-2^{\lceil \lg N \rceil-2} \\
&=N \lceil \lg N \rceil +N-2^{\lceil \lg N \rceil-1}-2 \cdot 2^{\lceil \lg N \rceil-2} \\
&=N \lceil \lg N \rceil +N-2 \cdot 2^{\lceil \lg N \rceil-1} \\
&=N \lceil \lg N \rceil +N-2^{\lceil \lg N \rceil}. && \checkmark
\end{align*}
$$

All in all, we’ve proved that the inductive hypothesis is true for all $N \ge 1$.
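
The closed form can be verified numerically as well, reusing c_rec from the Exercise 1.4 snippet; the bit trick below relies on $\lceil \lg N \rceil$ being the bit length of $N-1$ for $N \ge 1$:

def c_closed(n):
    ceil_lg = (n - 1).bit_length()  # exact ceil(lg n) for n >= 1
    return n * ceil_lg + n - 2 ** ceil_lg

# c_rec implements recurrence (1); see the Exercise 1.4 snippet.
assert all(c_rec(n) == c_closed(n) for n in range(1, 10_000))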

Exercise 1.6

We use the following three-way merge procedure in Java instead of the original from Exercise 1.3. It uses sentinels, so counting the number of comparisons inside the loop is straightforward.

static final int INFTY = Integer.MAX_VALUE; // sentinel; assumed defined in the enclosing class

static void merge(int[] arr, int left, int mid1, int mid2, int right) {
    // Sizes of three subarrays
    int size1 = mid1 - left + 1;
    int size2 = mid2 - mid1;
    int size3 = right - mid2;
        
    // Temporary arrays for three parts
    int[] leftArr = new int[size1 + 1];
    int[] midArr = new int[size2 + 1];
    int[] rightArr = new int[size3 + 1];
        
    // Copy data to temporary arrays
    for (int i = 0; i < size1; i++) {
        leftArr[i] = arr[left + i];
    }
    for (int i = 0; i < size2; i++) {
        midArr[i] = arr[mid1 + 1 + i];
    }
    for (int i = 0; i < size3; i++) {
        rightArr[i] = arr[mid2 + 1 + i];
    }
      
    // Add sentinels
    leftArr[size1] = INFTY;
    midArr[size2] = INFTY;
    rightArr[size3] = INFTY;
        
    // Merge three sorted subarrays
    for (int index = left, i = 0, j = 0, k = 0; index <= right; index++) {
        // Find the smallest among the three current elements
        // and place it into the merged array
        if (leftArr[i] <= midArr[j]) {
            if (leftArr[i] <= rightArr[k]) {
arr[index] = leftArr[i++];
            } else {
                arr[index] = rightArr[k++];
            }
        } else if (midArr[j] <= rightArr[k]) {
            arr[index] = midArr[j++];
        } else {
            arr[index] = rightArr[k++];
        }
    }
}

Each iteration of the main for loop performs 2 comparisons to determine the minimum among the 3 candidates. The recurrence is

$$C_N=3C_{N/3}+2N \quad \text{for $N \ge 2$, with $C_1=C_0=0$.}$$

If we consider the case when $N$ is a power of 3, and use the same analysis as in the book for the standard mergesort, then we get $C_N=2N\log_3 N$.
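
Explicitly, dividing the recurrence by $N$ telescopes nicely when $N=3^k$:

$$\frac{C_N}{N}=\frac{C_{N/3}}{N/3}+2=\frac{C_{N/9}}{N/9}+2+2=\dots=\frac{C_1}{1}+2\log_3 N=2\log_3 N.$$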

🌟 Exercise 1.7

Algorithm science is a methodology for developing machine-independent falsifiable hypotheses. This exercise is about the doubling hypothesis. According to Exercise 1.1, we know that $t_N=cN \lg N+dN \sim cN \lg N$. We can estimate $t_{2N}$, independent of the machine, as follows:

$$
\begin{align*}
t_{2N} &\approx 2cN\lg {2N} \\
&=2cN+2cN\lg N \\
&\approx 2cN+2t_N \\
&\approx 2t_N(1+1/\lg N) && \text{($cN\approx t_N/\lg N$)}.
\end{align*}
$$

The next Python script shows how well our estimated $t_{2N}$ reflects the actual $t_{2N}$. We can run different experiments by altering the machine-dependent constants $c$ and $d$.

from math import log

# Machine-dependent constants in the simulation.
c = 3.5
d = 7.1

def t(n):
    return c * n * log(n, 2) + d * n

def estimate_t_2n(t_n, n):
    return 2 * t_n * (1 + 1 / log(n, 2))

for n in [10, 100, 1000, 10000]:
    t_n = t(n)
    est_t_2n = estimate_t_2n(t_n, n)
    act_t_2n = t(2 * n)
    print(f"n: {n:6},\tEstimated: {est_t_2n:10.2f},\tActual: {act_t_2n:10.2f},\tRel. Error: {abs(est_t_2n - act_t_2n) / act_t_2n:.2%}")

The output shows that the relative error drops quickly as $N$ increases.

n:     10,	Estimated:     487.28,	Actual:     444.53,	Rel. Error: 9.62%
n:    100,	Estimated:    6984.43,	Actual:    6770.70,	Rel. Error: 3.16%
n:   1000,	Estimated:   92385.37,	Actual:   90960.49,	Rel. Error: 1.57%
n:  10000,	Estimated: 1152826.43,	Actual: 1142139.87,	Rel. Error: 0.94%

Exercise 1.8

Let’s use the Python implementation of mergesort from GeeksforGeeks and replace the driver code with the following snippet:

# Driver code
import time
from math import log
from random import randint

if __name__ == "__main__":
    for n in (1_000_000, 10_000_000):
        arr = [randint(0, 100) for _ in range(n)]

        start_time = time.time()
        mergeSort(arr, 0, len(arr) - 1)
        end_time = time.time()
        t_n = end_time - start_time

        if n == 1_000_000:
            print(f"Actual 1x time: {t_n:.2f} seconds")
            t10x_estimate = 10 * t_n * (1 + log(10, 2) / log(n, 2))
            print(f"Estimated 10x time: {t10x_estimate:.2f} seconds")
        else:
            print(f"Actual 10x time: {t_n:.2f} seconds")
            print(f"Relative Error: {abs(t_n - t10x_estimate) / t_n:.2%}")

Here’s an output on a 2019 MacBook Pro:

Actual 1x time: 4.05 seconds
Estimated 10x time: 47.20 seconds
Actual 10x time: 45.84 seconds
Relative Error: 2.97%

As expected, the estimated 10x time closely matches the actual 10x running time.

Exercise 1.9

The program Sort2distinct.java implements this algorithm in Java. Essentially, it runs the 3-way partitioning procedure.
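
For reference, here’s a minimal Python sketch of 3-way (Dutch national flag style) partitioning; it’s our illustration of the idea, not the code from Sort2distinct.java. With at most two distinct key values, a single pass around either value sorts the array:

def partition3(a, pivot):
    # One pass: keys < pivot end up left, == pivot in the middle, > pivot right.
    lt, i, gt = 0, 0, len(a) - 1
    while i <= gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivot:
            a[gt], a[i] = a[i], a[gt]
            gt -= 1
        else:
            i += 1

a = [1, 0, 1, 1, 0, 0, 1]
partition3(a, a[0])  # with two distinct values this sorts the array
print(a)             # [0, 0, 0, 1, 1, 1, 1]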

Exercise 1.10

Perform an initial linear scan to record one set of indices $i$, $j$ and $k$ for the three distinct values. Determine the index of the median and exchange it with the element at the beginning of the array. Finally, run the 3-way partitioning procedure from Exercise 1.9, as in the sketch below.
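
A minimal sketch of this recipe, reusing partition3 from the previous snippet and assuming the array holds exactly three distinct values:

def sort3distinct(a):
    # Linear scan: remember the first index of each distinct value.
    first = {}
    for idx, key in enumerate(a):
        first.setdefault(key, idx)
        if len(first) == 3:
            break
    median = sorted(first)[1]   # the median of the three distinct values
    m = first[median]
    a[0], a[m] = a[m], a[0]     # exchange the median with the first element
    partition3(a, a[0])         # 3-way partitioning from Exercise 1.9

a = [2, 7, 2, 5, 7, 5, 2, 5]
sort3distinct(a)
print(a)  # [2, 2, 2, 5, 5, 5, 7, 7]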

🌟 Exercise 1.11

The number of compares is a reliable estimator of the runtime performance of comparison-based sorting algorithms. The Python script below tests the hypothesis that the running time of mergesort divided by the number of compares that it uses approaches a constant as the problem size increases. The merge procedure uses sentinels to stay aligned with the version from the book.

def merge(a, left, mid, right):
    n1 = mid - left + 1
    n2 = right - mid

    # Create temp arrays
    b = [0] * n1
    c = [0] * n2

    # Copy data to temp arrays b[] and c[]
    for i in range(n1):
        b[i] = a[left + i]
    for j in range(n2):
        c[j] = a[mid + 1 + j]

    # Add sentinels
    b.append(float('inf'))
    c.append(float('inf'))

    # Merge the temp arrays back into a[left..right]
    i = j = 0
    for k in range(left, right + 1):
        if b[i] <= c[j]:
            a[k] = b[i]
            i += 1
        else:
            a[k] = c[j]
            j += 1

def merge_sort(a, left, right):
    if left < right:
        mid = (left + right) // 2
        merge_sort(a, left, mid)
        merge_sort(a, mid + 1, right)
        merge(a, left, mid, right)

# Reporting code
import time
from math import log
from random import randint

if __name__ == "__main__":
    print(f"{'n':>7} {'t(n)':>11} {'c(n)':>12} {'t(n)/c(n)':>10}")
    print("-" * 44)
    sz = 1
    for n in range(20):
        sz <<= 1
        a = [randint(0, 100) for _ in range(sz)]

        start_time = time.time()
        merge_sort(a, 0, len(a) - 1)
        end_time = time.time()
        t_n = (end_time - start_time) * 1_000_000
        c_n = sz * log(sz, 2)
        print(f"{sz:>7} {t_n:>11.2f} {c_n:>12.2f} {t_n / c_n:>10.2f}")

The running time is shown in microseconds. The ratio $t_N/c_N$ approaches a constant as the problem size increases; the size is always a power of 2, so the values of $c_N=N\lg N$ are exact. Interestingly, a similar ratio $\approx 0.2$ occurred both on a 2019 MacBook Pro and an HP laptop running Windows 11.

      n        t(n)         c(n)  t(n)/c(n)
--------------------------------------------
      2        7.87         2.00       3.93
      4       10.97         8.00       1.37
      8       14.07        24.00       0.59
     16       29.33        64.00       0.46
     32       60.08       160.00       0.38
     64      126.84       384.00       0.33
    128      268.22       896.00       0.30
    256      565.05      2048.00       0.28
    512     1359.70      4608.00       0.30
   1024     2971.89     10240.00       0.29
   2048     6124.02     22528.00       0.27
   4096    11742.83     49152.00       0.24
   8192    25410.18    106496.00       0.24
  16384    53179.98    229376.00       0.23
  32768   108177.90    491520.00       0.22
  65536   229867.22   1048576.00       0.22
 131072   469205.86   2228224.00       0.21
 262144   982743.98   4718592.00       0.21
 524288  2103214.74   9961472.00       0.21
1048576  4166778.09  20971520.00       0.20

Exercise 1.12

Let $T_N$ denote the cumulative (total) number of compares used by quicksort on all $N!$ permutations of $N$ elements. The average number of comparisons is $C_N=T_N/N!$, so the base cases are $T_2=6$ and $T_1=T_0=0$. We can derive $T_N$ based upon the simplified recurrence of $C_N$ from the book for $N>2$.

$$
\begin{align*}
NC_N&=(N+1)C_{N-1}+2N \\[0.2cm]
N\left(\frac{T_N}{N!}\right) &= (N+1)\left(\frac{T_{N-1}}{(N-1)!}\right)+2N \\[0.4cm]
\frac{T_N}{(N-1)!} &= (N+1)\frac{T_{N-1}}{(N-1)!}+2N \\[0.4cm]
\therefore T_N &= (N+1)T_{N-1}+2N(N-1)! \\
&= (N+1)T_{N-1}+2N!.
\end{align*}
$$
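
A quick Python consistency check of the derived recurrence against the book’s (a minimal sketch, assuming $C_2=3$, which matches $T_2=6$):

from math import factorial

def C(n, memo={2: 3.0}):
    # Simplified book recurrence: N*C_N = (N+1)*C_{N-1} + 2N, for N > 2.
    if n not in memo:
        memo[n] = ((n + 1) * C(n - 1) + 2 * n) / n
    return memo[n]

def T(n, memo={2: 6}):
    # Derived recurrence: T_N = (N+1)*T_{N-1} + 2*N!.
    if n not in memo:
        memo[n] = (n + 1) * T(n - 1) + 2 * factorial(n)
    return memo[n]

for n in range(3, 20):
    assert abs(T(n) / factorial(n) - C(n)) < 1e-9 * C(n)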

🌟 Exercise 1.13

The invariant that the subarrays left after partitioning a random permutation are themselves both random permutations is quintessential for analysis and performance. The proof below is adapted from Sedgewick’s PhD thesis.

Observe that Program 1.2 only moves the partitioning element (pivot) $v$ to its final position at the end. Leaving the pivot out of the partitioning process is crucial to ensure that the invariant remains true during the whole process.

Suppose that a permutation of $\{1,2,\dots,N\}$ is being partitioned, with all $N!$ permutations equally likely. Choose a pivot $1 \le v \le N$, so that after partitioning the size of the left subarray becomes $v-1$. Any arrangement of these $v-1$ elements, out of the possible $(v-1)!$, in the original permutation is mapped to a unique permutation in the left subarray. In other words, we can establish a bijection between these two objects. Since all the original permutations are equally likely, the same is true for the permutations in the left subarray. The analogous reasoning applies, independently, to the right subarray. This concludes the proof.

Moving the pivot during the partitioning process, by setting $j:=hi+1$, introduces bias into the subarrays (unless the pivot already happens to be in its final position). The probability of having a permutation where the pivot remains in its initial position is quite low. In our case, the first element from the left larger than the pivot will trigger it to move; it tends to fall near the beginning of the left subarray. Therefore, not all positions are equally likely to contain $v$.

🌟 Exercise 1.14

We follow the technique from the book to eliminate the summation on the RHS. We have

$$
\begin{align*}
A_N&=1+\frac{2}{N}\sum_{1\le j \le N}A_{j-1} && \text{(for $N>0$)} \\
NA_N&=N+2\sum_{1\le j \le N}A_{j-1} \\
NA_N-(N-1)A_{N-1}&=1+2A_{N-1} && \text{(for $N>1$, after eliminating the summation)} \\
NA_N&=(N+1)A_{N-1}+1 \\
\frac{A_N}{N+1}&=\frac{A_{N-1}}{N}+\frac{1}{(N+1)N} \\
\frac{A_N+1}{N+1}&=\frac{A_{N-1}+1}{N} && \text{$\left(\frac{1}{(N+1)N}=\frac{1}{N}-\frac{1}{N+1}\right)$} \\
\frac{A_N+1}{N+1}&=\frac{A_{N-1}+1}{N}=\dots=\frac{A_1+1}{2} && \text{(iterating)} \\
&\therefore \boxed{A_N=N}.
\end{align*}
$$
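
A minimal numeric check of $A_N=N$ straight from the original recurrence:

N = 500
a = [0.0] * (N + 1)   # a[0] = A_0 = 0
prefix = 0.0          # running sum A_0 + ... + A_{m-1}
for m in range(1, N + 1):
    a[m] = 1 + 2 * prefix / m
    prefix += a[m]
assert all(abs(a[m] - m) < 1e-9 for m in range(1, N + 1))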

🌟 Exercise 1.15

The proof is adapted from Sedgewick’s PhD thesis.

WLOG assume that the input array is a random permutation of the set $\{1,2,\dots,N\}$ to make the exposition leaner. Finding the average number of exchanges used during the first partitioning stage (before the pointers cross) is a harder task than deriving $C_N$ or $A_N$: we have dependencies on both the pivot and the arrangement of keys (values to sort). The base cases are $B_1=B_0=0$ and the recurrence is

$$
\begin{align*}
B_N=&\frac{1}{N}\sum_{1 \le v\le N} \big\{\text{average number of exchanges when the pivot is $v$}\big\} \\
&+\frac{2}{N}\sum_{1\le j \le N}B_{j-1} && \text{(for $N>1$)}.
\end{align*}
$$

Let’s analyze the first term of the sum by looking at Program 1.2 from the book. Quicksort is parsimonious regarding exchanges; only those elements are swapped that must be swapped. If the pivot is $v=a[N]$, then only those keys among $a[v],\dots,a[N-1]$ which are smaller than $v$ are exchanged (each swap moves such a key into the left subarray and a larger key into the right one). Furthermore, each element is exchanged at most once during the whole partitioning process. Let $X_v$ be the random variable denoting the number of exchanges when the pivot is $v$. The probability mass function of $X_v$ is given by

$$\Pr\{X_v=k\}=\frac{\dbinom{N-v}{N-v-k}\dbinom{v-1}{k}}{\dbinom{N-1}{N-v}}=\frac{\dbinom{N-v}{k}\dbinom{v-1}{v-1-k}}{\dbinom{N-1}{v-1}}.$$

The numerator expresses in how many ways we can select $k$ smaller elements from the left subarray and use them to form the right subarray. The denominator is just the total number of possible combinations to form the right subarray. As a side note, the right side of the equation follows from applying the identity $\binom{n}{k}=\binom{n}{n-k}$ and shows symmetry in the problem, i.e., it describes the case when $a[1]$ is chosen as the pivot. At any rate, $k \in [0,v-1]$.

The average number of exchanges is the expectation of $X_v$, defined as

$$
\begin{align*}
\mathbb{E}[X_v] &= \sum_{k=0}^{v-1} k\Pr\{X_v=k\} \\[0.5cm]
&= \sum_{k=1}^{v-1} k \cdot \frac{\dbinom{N-v}{k}\dbinom{v-1}{v-1-k}}{\dbinom{N-1}{v-1}} \\[0.8cm]
&= \frac{N-v}{\dbinom{N-1}{v-1}} \sum_{k=1}^{v-1} \binom{N-v-1}{k-1}\binom{v-1}{v-1-k} \\[0.8cm]
&= \frac{N-v}{\dbinom{N-1}{v-1}} \sum_{k=0}^{v-2} \binom{N-v-1}{k}\binom{v-1}{v-2-k} \\[0.8cm]
&= \frac{N-v}{\dbinom{N-1}{v-1}} \binom{N-2}{v-2} && \text{(by Vandermonde's convolution)} \\[0.8cm]
&= \frac{(N-v)(v-1)}{N-1}.
\end{align*}
$$
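
A quick Monte Carlo check of this expectation (a sketch: it fixes the pivot $v$ at the last position and counts the smaller keys that start out in the last $N-v$ non-pivot slots):

import random

def avg_exchanges(N, v, trials=100_000):
    # Keys 1..N with the pivot v at position N; one exchange per key smaller
    # than v that begins inside the region of the future right subarray.
    rest = [x for x in range(1, N + 1) if x != v]
    total = 0
    for _ in range(trials):
        random.shuffle(rest)
        total += sum(1 for x in rest[v - 1:] if x < v)
    return total / trials

N, v = 20, 8
print(avg_exchanges(N, v))          # ~ 4.42
print((N - v) * (v - 1) / (N - 1))  # 4.4210...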

The recurrence for the average number of exchanges is

$$
\begin{align*}
B_N&=\frac{1}{N}\sum_{1 \le v\le N} \mathbb{E}[X_v]+\frac{2}{N}\sum_{1\le j \le N}B_{j-1} \\
&=\frac{1}{N(N-1)}\sum_{1 \le v\le N} (N-v)(v-1)+\frac{2}{N}\sum_{1\le j \le N}B_{j-1} \\
&= \frac{1}{N(N-1)}\binom{N}{3}+\frac{2}{N}\sum_{1\le j \le N}B_{j-1} \\
&= \frac{N-2}{6}+\frac{2}{N}\sum_{1\le j \le N}B_{j-1}.
\end{align*}
$$

Exercise 1.16

Let $S_k(N)$ denote the number of subarrays of size $k>0$ encountered, on the average, when sorting a random array of size $N$ with quicksort. We have 3 cases:

  1. If $N<k$ then we can’t have a subarray of size $k$, so $S_k(N)=0$.

  2. If $N=k$ then $S_k(N)=1$, since only the initial array is of size $k$.

  3. If $N>k$ then the initial call is on a subarray whose size is larger than $k$, hence we don’t count it. We average the counts from the resulting subproblems. The recurrence is

$$S_k(N)=\frac{2}{N}\sum_{1\le j \le N}S_k(j-1).$$

Fast-forwarding through the exact same steps as in Exercise 1.14, we get

$$\frac{S_k(N)}{N+1}=\frac{S_k(N-1)}{N}=\dots=\frac{S_k(k+1)}{k+2} \quad \text{(for $N>k+1$)}.$$

Observe that we can’t iterate further than $S_k(k+1)$. Nonetheless, we can calculate it using the main recurrence to get $S_k(k+1)=2/(k+1)$; all base cases are zeros except $S_k(k)=1$. Therefore, we have

$$S_k(N)=\frac{2(N+1)}{(k+1)(k+2)} \quad \text{(for $N>k$)}.$$
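
The closed form is easy to validate against the recurrence numerically; a minimal sketch:

def s_table(k, N):
    # S_k(n) for n = 0..N via the recurrence, with S_k(k) = 1 as base case.
    s = [0.0] * (N + 1)
    prefix = 0.0
    for n in range(N + 1):
        if n == k:
            s[n] = 1.0
        elif n > k:
            s[n] = 2 * prefix / n
        prefix += s[n]
    return s

k, N = 5, 1000
s = s_table(k, N)
assert all(abs(s[n] - 2 * (n + 1) / ((k + 1) * (k + 2))) < 1e-9
           for n in range(k + 1, N + 1))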

🌟 Exercise 1.17

Fast-forwarding through the exact same steps as in the proof of Theorem 1.3, we get

$$
\begin{align*}
\frac{C_N}{N+1}&=\frac{C_{M+1}}{M+2}+2\sum_{M+3 \le k \le N+1} \frac{1}{k} && \text{(for $N>M+1$)} \\
&=\frac{C_{M+1}}{M+2}+2(H_{N+1}-H_{M+2}).
\end{align*}
$$

We can calculate $C_{M+1}$ using the main recurrence

$$
\begin{align*}
C_{M+1} &= (M+1)+1+\frac{2}{M+1}\sum_{1 \le j \le M+1} C_{j-1} \\
&= M+2+\frac{2}{M+1}\sum_{0 \le j \le M} \frac{j(j-1)}{4} && \text{(reindexing and substituting $C_j$)} \\
&= M+2+\frac{1}{2(M+1)}\sum_{1 \le j \le M} j(j-1) \\
&= M+2+\frac{1}{2(M+1)}\left(\frac{M(M+1)(2M+1)}{6}-\frac{M(M+1)}{2}\right) \\
&= M+2+\frac{M(M-1)}{6}.
\end{align*}
$$

The formula for the average number of compares in quicksort, when using insertion sort to handle small subarrays, is

$$C_N=(N+1)\left(1+\frac{M(M-1)}{6(M+2)}\right)+2(N+1)(H_{N+1}-H_{M+2}) \quad \text{(for $N>M$)}.$$
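
Again, a short numerical cross-check of the formula against the cutoff recurrence (a sketch, assuming the insertion-sort average $C_j=j(j-1)/4$ for $j \le M$):

from math import fsum

def c_exact(N, M):
    # Quicksort recurrence with an insertion-sort cutoff at size M.
    c = [0.0] * (N + 1)
    prefix = 0.0
    for n in range(N + 1):
        c[n] = n * (n - 1) / 4 if n <= M else n + 1 + 2 * prefix / n
        prefix += c[n]
    return c[N]

def H(n):
    return fsum(1 / k for k in range(1, n + 1))

def c_formula(N, M):
    return ((N + 1) * (1 + M * (M - 1) / (6 * (M + 2)))
            + 2 * (N + 1) * (H(N + 1) - H(M + 2)))

M = 9
assert all(abs(c_exact(N, M) - c_formula(N, M)) < 1e-9 * c_formula(N, M)
           for N in range(M + 1, 300))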

Exercise 1.18

After ignoring small terms (those significantly less than $N$) as well as constant multiples of $N$ (they don’t influence the minimization task), we can arrive at the following approximation

$$C_N\approx 2N\ln N +\underbrace{\left(\frac{M(M-1)}{6(M+2)}-2\ln {M}\right)}_{=f(M)}N.$$

We can use a suitable math tool, like WolframAlpha, to solve the minimization problem as well as plot the graph of $f(M)$. Issue plot m(m-1)/(6(m+2))-2ln(m) for 1<=m<60 to see its shape.

Plot of $f(M)$ that shows a global minimum around 12.

You can find the minimum by saying m(m-1)/(6(m+2))-2ln(m) minimize for m>1. It’ll report a global minimum at $m \approx 12.4$.
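
If you’d rather stay in Python, a brute-force scan over integer cutoffs agrees:

from math import log

def f(m):
    return m * (m - 1) / (6 * (m + 2)) - 2 * log(m)

best = min(range(2, 60), key=f)
print(best, f(best))  # 12 is the best integer cutoff under this model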

You may use a different approximation, like using $\ln (M+2)$ instead of $\ln M$ inside $f(M)$, but the outcomes would be similar. Also, you may want to try the more accurate expression for $C_N$ (see the remark in Exercise 2.17).

Exercise 1.19

According to the graph from Exercise 1.18, the threshold is $M=49$; it’s the largest $M$ for which $f(M)<0$, i.e., for which the cutoff still beats the plain $2N\ln N$ approximation.
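
A one-liner confirms where $f$ changes sign (same $f$ as in the previous snippet):

print(max(m for m in range(2, 100) if f(m) < 0))  # prints 49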

Exercise 1.20

Below is the Python script that reports how many keys in a file of $N=10^4$ random positive integers less than $M=10^6$ are likely to be equal to some other keys in the file.

import random
import math
from collections import Counter
from statistics import mean, pstdev

def count_colliding_keys(n, m):
    # Generate n random positive integers in [1, m-1].
    arr = [random.randrange(1, m) for _ in range(n)]
    freq = Counter(arr)
    # Count all occurrences of values that appear at least twice.
    return sum(cnt for cnt in freq.values() if cnt >= 2)

def simulate(n=10_000, m=1_000_000, trials=1000, seed=None):
    if seed is not None:
        random.seed(seed)
    results = [count_colliding_keys(n, m) for _ in range(trials)]
    mu = mean(results)
    sigma = pstdev(results)
    ci_half = 1.96 * sigma / math.sqrt(trials)
    analytic = n * (1 - (1 - 1/m) ** (n - 1))
    print(f"n={n}, m={m}, trials={trials}")
    print(f"simulated mean = {mu:.3f}, std = {sigma:.3f}")
    print(f"95% CI for mean = [{mu-ci_half:.3f}, {mu+ci_half:.3f}]")
    print(f"analytic estimate = {analytic:.3f}")

simulate(seed=1)

The output also shows the 95% confidence interval for the mean:

n=10000, m=1000000, trials=1000
simulated mean = 99.542, std = 13.895
95% CI for mean = [98.681, 100.403]
analytic estimate = 99.492

The analytic formula is the same as for the number of people with a shared birthday; replace the number of days $d$ with $M$.
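
For the record, the formula follows from linearity of expectation: each key equals some other key with probability $1-(1-1/M)^{N-1}$, so

$$\mathbb{E}[\text{colliding keys}]=N\left(1-\left(1-\frac{1}{M}\right)^{N-1}\right)\approx N\left(1-e^{-(N-1)/M}\right)\approx 99.5 \quad \text{for } N=10^4,\ M=10^6.$$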

Exercise 1.21

Try changing the relative magnitudes of $N$ and $M$ in Exercise 1.20 to see how the number of duplicates increases as $M$ approaches $N$, and especially when $M \ll N$. The random permutation model assumes that all elements are distinct and all possible $N!$ permutations are equally likely to occur. Nevertheless, even in the presence of many duplicates, Program 1.2 retains good performance. As emphasized in Sedgewick’s paper Quicksort with Equal Keys, the evidence in favor of stopping the scanning pointers on keys equal to the partitioning element in quicksort is overwhelming. This is exactly how partitioning works in Program 1.2.

There are also other schemes to efficiently handle duplicates. The Algorithms, 4th Edition by RS, KW book’s website has an extensive treatment of quicksort, including variants like fast 3-way and dual-pivot. There’s also background information about entropy-optimal sorting, which states the following:

Arrays with large numbers of duplicate sort keys arise frequently in applications. In such applications, there’s potential to reduce the time of the sort from linearithmic to linear.

Thus, this is one example where the random permutation model is inaccurate. Of course, the story can go the other way around when partitioning doesn’t account for duplicates. In this case, quicksort may show quadratic runtime.

Exercise 1.22

Essentially, the distribution of keys is "asymptotically" immaterial for mergesort, as it achieves $\Theta(N\lg N)$ runtime in all cases. The sorting numbers specify the worst-case number of comparisons made by mergesort (see also Exercise 1.5). The best case takes about half as many comparisons as the worst case. Of course, this last statement doesn’t apply to Program 1.1 (see Exercise 1.23). Thus, something like Table 1.1 isn’t useful to get insight into stochastic fluctuations due to dependence on probabilistic models.

On the other hand, Table 1.1 could validate a model, implementation and/or analysis. It could help in revealing machine-dependent constants in a model (see Exercise 1.11) or pinpoint awkward implementation-related issues, perhaps bugs, if the empirical evidence goes against mathematical predictions. Finally, it may assure us that the model for a new variant of mergesort aligns well with empirical data.

Exercise 1.23

The running time of mergesort as implemented in Program 1.1 depends only on the number of elements in the array being sorted, not on their arrangement. Therefore, the standard deviation in the number of compares is zero, since the input size fully determines the cost.
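
An empirical confirmation in Python (a sketch with a sentinel-based merge in the spirit of Program 1.1, not the program itself): the compare count is identical for sorted, reverse-sorted, and random inputs of the same size.

import random

def compare_count(a):
    # Sentinel-based mergesort: every merge of a[lo..hi] performs exactly
    # hi - lo + 1 compares, regardless of the order of the keys.
    count = 0
    def sort(lo, hi):
        nonlocal count
        if lo >= hi:
            return
        mid = (lo + hi) // 2
        sort(lo, mid)
        sort(mid + 1, hi)
        b = a[lo:mid + 1] + [float('inf')]
        c = a[mid + 1:hi + 1] + [float('inf')]
        i = j = 0
        for k in range(lo, hi + 1):
            count += 1
            if b[i] <= c[j]:
                a[k] = b[i]; i += 1
            else:
                a[k] = c[j]; j += 1
    sort(0, len(a) - 1)
    return count

n = 1024
inputs = [list(range(n)), list(range(n, 0, -1)), random.sample(range(n), n)]
print({compare_count(list(x)) for x in inputs})  # one distinct count: {10240}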
