# Chapter 9

## Exercise 9.1-1

**Available in the latest revision of the IM.**

## Exercise 9.1-2

**Available in the latest revision of the IM.**

## Exercise 9.1-3

The minimum number of races it takes to determine the fastest three horses out of 25 is 7. Here is the procedure.

{% stepper %}
{% step %}

### Determine the fastest horse in each group

Organize 5 races to know the fastest horse in each group.
{% endstep %}

{% step %}

### Determine the fastest horse overall

Organize the 6th race to see which horse is the fastest overall. This is one of the three we are looking for.
{% endstep %}

{% step %}

### Determine the other two horses

Using the result of the previous race we can select candidates for the final race. The forth and fifth placed horse, together with their groups, can be immediately eliminated. They cannot be candidates for the second and third place. Furthermore, the rest of the horses from a group of the third placed horse can be excluded, too. We need to take 2 horses from the winner's group and 1 horse from the group of the second placed horse. Now, we have 5 horses for the 7th race. The first and second horse from this race are the additional two we are looking for.
{% endstep %}
{% endstepper %}

## Exercise 9.1-4

Let $$A$$ represent the set of numbers which are potentially the minimum and let $$B$$ denote the set of numbers which are potentially the maximum. Initially, all $$n$$ numbers are potentially either the minimum or maximum, so $$|A|=|B|=n$$. There are two types of comparisons:

1. Between two numbers that are members of both sets.
2. Between two numbers from the same set.

The first type of comparison reduces the size of both $$A$$ and $$B$$ by one. The reason is that a smaller of the two numbers cannot anymore be a member of $$B$$ and the larger of the two numbers cannot anymore remain in $$A$$. The second type of comparison only reduces the matching set by one without having any impact on the other. So, to attain the lower bound we must leverage the first type of comparison as much as possible.

As with proving the upper bound, we process elements in pairs. We can form $$\lfloor n/2 \rfloor$$ pairs, so after the first stage $$|A|=|B|=\lceil n/2 \rceil$$. Observe that these sets can have at most one element in common, which happens when $$n$$ is odd. At any rate, the minimum must be searched among members of $$A$$ and the maximum among those from $$B$$. We already know the lower bound on the amount of comparisons needed, which is $$m-1$$ for a set of $$m$$ elements. This is where we can only use the second type of comparison (see above).

The total number of comparisons required is

$$
\lfloor n/2 \rfloor+2(\lceil n/2 \rceil-1)=n+\lceil n/2 \rceil-2=\lceil 3n/2 \rceil-2
$$

## Exercise 9.2-1

**Available in the latest revision of the IM.**

## Exercise 9.2-2

**Available in the latest revision of the IM.**

## Exercise 9.2-3

In each partitioning the maximum of the current subarray should be selected as a pivot. For example, the first iteration should choose 9, the second 8, and so on. The sequence stops when the last subarray becomes \[0].

## Exercise 9.2-4

**Available in the latest revision of the IM.**

## Exercise 9.3-1

**Available in the latest revision of the IM.**

## Exercise 9.3-2

Let $$n=r-p+1$$, thus $$g=\lfloor n/5 \rfloor \le n/5$$. The subarray $$A\[5g+1:r]$$ may contain at most 4 elements, that should be added to both sides of a partition. The recurrence becomes

$$
T(n) \le T(n/5)+T(7n/10 + 4)+\Theta(n)
$$

The driving function satisfies the polynomial-growth condition, so according to the Akra-Bazzi theorem, replacing $$T(7n/10)$$ with $$T(7n/10+4)$$, where $$4=O(n/\lg^{1+\epsilon}n)$$ for some constant $$\epsilon>0$$ and sufficiently large $$n$$, leaves the asymptotic solution unaffected. Therefore, we still have $$\Theta(n)$$ running time.

## Exercise 9.3-3

**Available in the latest revision of the IM.**

{% hint style="warning" %}
The solution in the IM is inaccurate, since the ith order statistic of a median is $$i=\lceil (r-p+1)/2 \rceil$$.
{% endhint %}

## Exercise 9.3-4

We apply the decision tree model to prove the statement from the book. In the decision tree model:

* Each internal node represents a comparison between two elements.&#x20;
* Each leaf corresponds to a conclusion reached by the algorithm after a sequence of comparisons.&#x20;
* For the algorithm to correctly identify the $$i$$th smallest element $$x$$, every input that leads to a leaf where $$x$$ is output must have $$x$$ as the $$i$$th smallest element.

Assume the algorithm reaches a leaf where it outputs $$x$$ as the $$i$$th smallest element. Suppose, for the sake of contradiction, that there exists an element $$y \neq x$$ such that the relation between $$x$$ and $$y$$ is not determined, either directly or via transitive inference, by the comparisons made along the path to that leaf. Then, there exist two inputs sharing the same path toward $$x$$:

* In one $$y\<x$$
* In the other $$y>x$$

But this is a contradiction. If $$y\<x$$, then the $$i$$th smallest element is $$y$$ given that $$x$$ is the $$i$$th smallest element when $$y>x$$. Hence, the relation between $$x$$ and every other element $$y$$ must be determined by the comparisons already made. No need for performing any additional comparisons.

## Exercise 9.3-5

You can read the solution [here](https://stackoverflow.com/a/481333/4931661).

## Exercise 9.3-6

**Available in the latest revision of the IM.**

## Exercise 9.3-7

**Available in the latest revision of the IM.**

{% hint style="info" %}
Here is a simple proof, without any mathematics as in the IM, that the median of the $$y$$-coordinates minimizes the total spur length. Draw a line of the main pipeline on this median $$y$$-coordinate. Observe that the $$x$$-coordinates have no significance on the result. Form pairs of wells by aligning them on their $$x$$-coordinates (move wells horizontally as needed) such that one is above or at the main pipeline, and the other one is below or at the main pipeline. This is possible, since the main pipeline goes through the median of the $$y$$-coordinates. The optimal total spur length is simply the sum of lengths of segments formed by previous pairs of wells. Moving the main pipeline above or below of any segment would increase the total length. Therefore, the main pipeline must intersect (touch) each segment. If $$n$$ is odd, then it will go through one single well with the median $$y$$-coordinate.
{% endhint %}

## Exercise 9.3-8

**Available in the latest revision of the IM.**

{% hint style="danger" %}
The solution in the IM contains errors regarding order statistics, that are fixed below. Notice that we using a generic expression for $$k$$ irrespective of it's parity.

Find the $$\left(\frac{\lfloor k/2 \rfloor}{k}n\right)$$th order statistic using the `SELECT` procedure. Then recursively find the $$\lfloor k/2 \rfloor$$ quantiles of the elements less than and the $$\lceil k/2 \rceil$$ quantiles of the elements greater than the $$\left(\frac{\lfloor k/2 \rfloor}{k}n\right)$$th order statistic.
{% endhint %}

## Exercise 9.3-9

**Available in the latest revision of the IM.**

## Exercise 9.3-10

**Available in the latest revision of the IM.**

### More Generic LeetCode Example

Below is the solution in C++ for the [problem](https://leetcode.com/problems/median-of-two-sorted-arrays/) associated with this exercise.

```cpp
class Solution {
public:
    double findMedianSortedArrays(vector<int>& nums1, vector<int>& nums2) {
        // The median must be either in nums1 or nums2.
        double median = findMedianSortedArrays(nums1, nums2, 0, nums1.size() - 1);
        if (isnan(median))
            median = findMedianSortedArrays(nums2, nums1, 0, nums2.size() - 1);
        return median;
    }

private:
    double findMedianSortedArrays(vector<int>& a, vector<int>& b, int low, int high) {
        const int m = a.size();
        const int n = b.size();
        const bool isEven = ((m + n) & 1) == 0;
        const int half = (m + n) >> 1;

        if (n == 0)
            return isEven ? (a[half - 1] + a[half]) / 2.0 : a[half];

        // Use binary search over A to find the median, if possible.
        while (low <= high) {        
            // a[k] is the so-called lower median.
            const int k = (low + high) >> 1;
            // Number of elements in A left of the median.
            int left = k + int(isEven);
            // Index into B immediately left of the median.
            const int q = half - left - 1;
                        
            if (left == half and a[k] <= b.front() or 
                    q >= 0 and q < n and b[q] <= a[k] and (q == n - 1 or a[k] <= b[q + 1]))
                if (isEven) {
                    const double neighborA = (k < m - 1) ? a[k + 1] : INFINITY;
                    const double neighborB = (q < n - 1) ? b[q + 1] : INFINITY;
                    return (a[k] + min(neighborA, neighborB)) / 2.0;
                } else
                    return a[k];
        
            if (q < 0 or q < n - 1 and a[k] > b[q + 1])
                high = k - 1;
            else
                low = k + 1;
        }

        return NAN;
    }
};
```

## Problem 9-1

**Available in the latest revision of the IM.**

## Problem 9-2

**Available in the latest revision of the IM.**

## Problem 9-3

**Available in the latest revision of the IM.**

{% hint style="warning" %}
There are several issues with the solution in the IM, as noted below.

**a.** The left side of the summation is always less than ½, as it should be according to the definition from the book. If $$n$$ is even, then we have $$\sum\_{x\_i\<x\_k}w\_i=\frac{(n/2)-1}{n}<1/2$$, since the lower median is at $$k=n/2$$. If $$n$$ is odd, then $$k=\lceil n/2 \rceil$$, so we have $$\lfloor n/2 \rfloor < n/2$$ elements less than the median. Again the sum is less than ½.

**b.** The reasoning about the stoppage criteria is imprecise. If $$\sum\_{x\_i\<x\_k}w\_i=1/2$$, then we would not have a proper condition for the left side, as being less than ½. Therefore, we must stop as soon as the accumulated sum is equal or greater than ½, rather than wait until it exceeds ½.

**c.** Both in the text and pseudocode the thresholds are imprecise. In the pseudocode, we should have: $$W\_G \le 1/2$$ and $$W\_L \ge 1/2$$. The text should be altered accordingly.
{% endhint %}

## Problem 9-4

**Available in the latest revision of the IM.**

{% hint style="info" %}
The idea in part (a) is essentially simple, as depicted in the following example with array `[5,2,7,6,1,9,3,8,4,0]`, $$n=10$$ and $$i=2$$. The mappings substitute the need to simultaneously apply reorderings in the section containing larger elements from pairwise comparisons (as instructed in the solution from the IM). Such cascading updates are difficult to implement.

```
Level 1: Recursive case
Splitting: 5 2 7 6 1 9 3 8 4 0 -> 9 3 8 6 1 | 5 2 7 4 0
Mappings: (5,9), (2,3), (7,8), (4,6), (0,1)

    Level 2: Recursive case
    Splitting: 5 2 7 4 0 -> 7 4 | 5 2 0
    Mappings: (5,7), (2,4), (0,0)
	
	Level 3: Base case
	Select({5,2,0}, i=2) -> {0,2}
	
    Select({0, 2, 4}, i=2) -> {0,2}
	
Select({0,2,3,1}, i=2) -> {0,1}
Return: 1
```

{% endhint %}

## Problem 9-5

**Available in the latest revision of the IM.**

## Problem 9-6

### a.

We follow the template from the book for $$k=5$$. The number $$g$$ of groups is at most $$n/k$$. There are at least $$\lceil k/2 \rceil(\lfloor g/2 \rfloor +1) \ge \lceil k/2 \rceil g/2$$ elements greater than or equal to the pivot, and at least $$\lceil k/2 \rceil \lceil g/2 \rceil \ge \lceil k/2 \rceil g/2$$ elements less than or equal to the pivot. That leaves at most

$$
kg-\lceil k/2 \rceil g/2 \le n-\frac{\lceil k/2 \rceil}{2k}=n \cdot \frac{2k-\lceil k/2 \rceil}{2k}
$$

elements in the recursive call. The recurrence becomes

$$
T(n) \le T(n/k) + T\left(n \cdot \frac{2k-\lceil k/2 \rceil}{2k}\right)+O(n)
$$

Since $$k$$ is odd, then $$\lceil k/2 \rceil=(k+1)/2$$. We have

$$
\begin{align\*}
T(n) &\le T(n/k) + T\left(n \cdot \frac{4k-(k+1)}{4k}\right)+O(n) \\
&= T(n/k) + T\left(n \cdot \frac{3k-1}{4k}\right)+O(n)
\end{align\*}
$$

We prove that $$T(n)\le cn$$ using the substitution method for some constant $$c>0$$ and $$n>1$$.&#x20;

$$
\begin{align\*}
T(n) &\le c(n/k) + cn \cdot \frac{3k-1}{4k}+O(n) \\
&=cn \cdot \frac{4+3k-1}{4k}+O(n) \\
&=cn \cdot \frac{3k+3}{4k}+O(n) \\
&=cn-cn\left(1- \frac{3k+3}{4k}\right)+O(n) && \text{($k>3 \implies \cfrac{3k+3}{4k} < 1$)} \\
&\le cn
\end{align\*}
$$

We should choose $$c$$ to dominate the constant hidden in the $$O(n)$$. Therefore, $$T(n)=O(n)$$.

### b.

When $$k=3$$ the recurrence becomes $$T(n) \le T(n/3)+T(2n/3)+O(n)$$, which evidently resolves to $$T(n)=O(n\lg n)$$ using the Akra-Bazzi method with $$p=1$$ and a trivial driving function $$f(n)=n$$.

### c.

Read the section 3 about the repeated step algorithm in the attached PDF document (download from [here](https://arxiv.org/abs/1409.3600) on October 2025). It contains both a visual representation of what is going on, explanation how it differs from grouping by 9, and contains an analysis of the running time (important for part (d)).

{% file src="<https://1997161-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FlBSM5MLFKTKipGIIw1fs%2Fuploads%2FBdCTCEkCp4LJWGfFWzgb%2F1409.3600v3.pdf?alt=media&token=9d3492b5-5fb0-4fe2-b8cc-5b86003e438e>" %}

{% hint style="info" %}
I highly recommend you to read the full paper. It contains many interesting ideas and variants of the `SELECT` algorithm not mentioned in the book.
{% endhint %}

### d.

See part (c).
