> For the complete documentation index, see [llms.txt](https://evarga.gitbook.io/sh-intro-to-algs/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://evarga.gitbook.io/sh-intro-to-algs/part-viii-appendix-mathematical-background/appendix-c.md).

# C. Counting and Probability

## Exercises

### C.1-1

We omit empty strings, so $$k>=1$$. An $$n$$-string has $$n-k+1$$ $$k$$-substrings. By the rule of sum, the total number of substrings is

$$
\begin{align\*}
\sum\_{k=1}^n(n-k+1)&=\sum\_{k=1}^n k && \text{(reindexing)} \\
&=\frac{n(n+1)}{2} && \text{(by equation (A.1))}.
\end{align\*}
$$

### C.1-2

We apply the rule of product, where we have $$2^m$$ choices for output per input. In general, the total number of $$n$$-input, $$m$$-output boolean functions is

$$
\underbrace{2^m\cdot2^m\cdot...\cdot2^m}\_{2^n \text{ times}}=\big(2^m\big)^{2^n}.
$$

There are $$2^{2^n}$$ $$n$$-input, 1-output boolean functions.

### C.1-3

If all seatings were unique, we would have $$n!$$ arrangements. A *rotation* can be regarded as an equivalence relation on the set of seatings splitting them into equivalence classes (members are some number of rotations apart from each other). Each class contains $$n$$ seatings and we have  $$\frac{n!}{n}=(n-1)!$$ such classes, which answers in how many ways can $$n$$ professors sit around a circular conference table.

### C.1-4

There are 49 even and 50 odd numbers in the set $${1,2,...,99}$$. A sum of three distinct numbers is even when all numbers are even or one of them is even whilst the others are odd. We can use equation (C.2) together with the rules of sum and product to calculate the answer.

$$
{49 \choose 3}+{49 \choose 1}{50 \choose 2}=78449.
$$

Just for fun, we can also execute the Python 3 script below to verify the result.

```python
from itertools import combinations

print(sum(1 for a, b, c in combinations(range(1, 100), 3) 
            if (a + b + c) % 2 == 0))
```

### C.1-5

$$
{n \choose k}=\frac{n!}{k!(n-k)!} = \frac{n}{k}\cdot\frac{(n-1)!}{(k-1)!(n-k)!}=\frac{n}{k}{n-1 \choose k-1}.
$$

### C.1-6

$$
{n \choose k}=\frac{n!}{k!(n-k)!} = \frac{n}{n-k}\cdot\frac{(n-1)!}{k!(n-k-1)!}=\frac{n}{n-k}{n-1 \choose k}.
$$

### C.1-7

To choose $$k$$ elements from $$n$$ elements, we can either decide to use a distinguished element and select the remaining $$k-1$$ items from $$n-1$$ items or skip it and select $$k$$ items from the remaining $$n-1$$ elements. This establishes the desired relationship.

### C.1-8

You can find [this](https://en.wikipedia.org/wiki/Pascal's_triangle) table on Wikipedia.

### C.1-9

We use equations (A.1) and (C.2) to prove the identity.

$$
\sum\_{i=1}^n i=\frac{n(n+1)}{2}=\frac{(n+1)!}{2!(n-1)!}={n+1 \choose 2}.
$$

### C.1-10

Assume $$n>0$$, otherwise the maximum is for $$k=0$$. We fix $$n$$ and treat $${n \choose k}$$ as a function of $$k$$. First. let us calculate the ratio between successive terms

$$
\begin{align\*}
\dfrac{\dbinom{n}{ k}}{\dbinom{n}{k-1}}&=\dfrac{\dfrac{n}{k}\dbinom{n-1 }{k-1}}{\dfrac{n}{n-k+1}\dbinom{n-1}{k-1}} && \text{(by identities from Exercises C.1-5 and C.1-6)} \\\\
&=\frac{n-k+1}{k}.
\end{align\*}
$$

As far as $$n-k+1 \ge k$$, or equivalently $$k \le \frac{n+1}{2}$$, the function $${n \choose k}$$ monotonically increases.&#x20;

If $$n$$ is odd, then we have an even number of terms in the expansion for $$k=0,1,...,n$$. $${n \choose k}$$ is symmetric, so there are 2 candidates (one on each side); $$k=(n+1)/2=\lceil n/2 \rceil$$ or $$k=\lfloor n/2 \rfloor$$.

If n is even, then there is only one maximum $$k=\lfloor(n+1)/2\rfloor=\lfloor n/2 \rfloor=\lceil n/2 \rceil=n/2$$.

### ★ C.1-11

$$
\begin{align\*}
(j+k)!&=j!(j+1)(j+2)\cdots(j+k) \\
&\ge j!(0+1)(0+2)\cdots (0+k) \\
&=j!k!.
\end{align\*}
$$

The inequality above turns into equality when $$j=0 \lor k=0$$.

$$
\begin{align\*}
\binom{n}{j+k}&=\dfrac {n!}{(j+k)!(n-j-k)!} \\
&\le \dfrac {n!}{j!k!(n-j-k)!} \\
&=\dfrac {n!}{j!(n-j)!}\cdot \dfrac{(n-j)!}{k!(n-j-k)!} \\
&=\binom {n}{j}\binom{n-j}{k}.
\end{align\*}
$$

The inequality expresses the number of ways of choosing $$j+k$$ items out of $$n$$ by the rule of product. Each selection of $$j$$ items out $$n$$ is multiplied by the number of ways of choosing additional $$k$$ items out the remaining $$n-j$$.  Nonetheless, this approach counts the same combination multiple times.&#x20;

For example, $$\binom{5}{2+1}=10 < \binom{5}{2}\binom{3}{1}=30$$. Suppose we have chosen $${a,b}$$ from  $${a,b,c,d,e}$$. Apparently, we have 3 options for the additional item. If we pick $$c$$, then we get $${a,b,c}$$. But if we start with another initial selection, like $${a,c}$$, then we don't have anymore 3 choices for the next item. If we pick $$b$$, then we will overshoot.

### ★ C.1-12

We prove the statement by induction on $$k$$. The base case $$k=0$$ is true, provided that for convenience we assume $$0^0=1$$, as mentioned in the book.

For $$k>0$$ we have

$$
\begin{align\*}
\binom{n}{k} &= \frac{n}{k}\binom{n-1}{k-1} && \text{(by Exercise C.1-5)} \\
&= \frac{n}{k}\cdot\frac{n-k+1}{n}\binom{n}{k-1} && \text{(by Exercise C.1-6)} \\
&\le \frac{n^n}{k(k-1)^{k-1}(n-k+1)^{n-k}} && \text{(by the inductive hypothesis)}.
\end{align\*}
$$

We prove that

$$
\frac{n^n}{k(k-1)^{k-1}(n-k+1)^{n-k}} \le \frac{n^n}{k^k(n-k)^{n-k}}
$$

by examining the ratio

$$
\begin{align\*}
\cfrac{\dfrac{n^n}{k(k-1)^{k-1}(n-k+1)^{n-k}}}{\dfrac{n^n}{k^k(n-k)^{n-k}}} &= \cfrac{k^{k-1}(n-k)^{n-k}}{(k-1)^{k-1}(n-k+1)^{n-k}} \\
&= \frac{\left(\dfrac{k}{k-1}\right)^{k-1}}{\left(\dfrac{n-k+1}{n-k}\right)^{n-k}} \\
&= \frac{\left(1+\dfrac{1}{k-1}\right)^{k-1}}{\left(1+\dfrac{1}{n-k}\right)^{n-k}}.
\end{align\*}
$$

The sequence $$a\_n={(1+1/n)}^n$$ is strictly increasing (see the proof [here](https://math.stackexchange.com/a/167869/753413)) and we need to find the threshold below which the ratio is less than or equal to one. This happens while $$k-1\le n-k$$, or equivalently for all integers $$k$$ such that $$k\le(n+1)/2$$. According to the principles of mathematical induction we can conclude that inequality (C.7) holds for all integers $$k$$ such that $$0 \le k \le n/2 < (n+1)/2$$.

When  $$n/2 \le k \le n$$ then $$0 \le n-k \le n/2$$, so we can apply equation (C.3) after substituting $$k$$ with $$n-k$$ in inequality (C.7). Therefore, we can conclude that inequality (C.7) holds for all integers $$k$$ such that $$0 \le k \le n$$.

### ★ C.1-13

{% hint style="danger" %}
This exercises is flawed, since $$\binom{2n}{n} \neq \cfrac {2^{2n}}{\sqrt{\pi n}}(1+O(1/n))$$. The reasons will become clear once we prove that $$\binom{2n}{n}=\cfrac {2^{2n}}{\sqrt{\pi n}}(1+O'(1/n))$$ (see [Problem 3-6 part (d)](/sh-intro-to-algs/part-i-foundations/3.-characterizing-running-times.md#d-2)). The asymptotic expansion of the central binomial coefficient is available [here](https://en.wikipedia.org/wiki/Central_binomial_coefficient#Asymptotic_growth).
{% endhint %}

$$
\begin{align\*}
\binom{2n}{n} &= \cfrac{(2n)!}{(n!)^2} \\
&= \cfrac{\sqrt{4\pi n},(2n/e)^{2n}e^{\alpha\_{2n}}}{2\pi n,(n/e)^{2n}e^{2\alpha\_n}} && \text{(by equation (3.29))}\\\[1mm]
&= \cfrac {2^{2n}}{\sqrt{\pi n}} \cdot e^{\alpha\_{2n}-2\alpha\_n}.
\end{align\*}
$$

Let $$x=\alpha\_{2n}-2\alpha\_n$$. As $$n$$ increases $$x \to 0$$, so $$e^x \sim 1+x$$. We can derive equation (3.25) from equation (3.29) by leveraging the same approximation. Equation (3.29) is exact and as $$n$$ increases the adjustment factor $$e^x \to 1\_-$$ instead of $$e^x \to 1^+$$ (you can experiment with small values of $$n$$ to gain intuition). $$x$$ is always negative, $$|x|=O(1/n)$$, hence $$1+x=1-O(1/n)=1+O'(1/n)$$, which concludes the proof.

#### How can things go wrong?

$$
\begin{align\*}
\binom{2n}{n} &= \cfrac{(2n)!}{(n!)^2} \\
&= \cfrac{\sqrt{4\pi n},(2n/e)^{2n}(1+\Theta(1/n))}{2\pi n,(n/e)^{2n}(1+\Theta(1/n))^2} && \text{(by equation (3.25))}.
\end{align\*}
$$

$$\Theta(1/n)$$ stands for a whole family of functions bounded by $$g(n)=1/n$$, so we cannot reduce the numerator and denominator in a usual manner. On the other hand, we can upper bound the fraction by bounding the numerator from above and the denominator from below. Let $$c$$ and $$d$$ represent constants hidden in $$O(1/n)$$ and $$\Omega(1/n)$$ of the numerator and denominator, respectively.

$$
\begin{align\*}
\cfrac{1+\Theta(1/n)}{(1+\Theta(1/n))^2} &\le \cfrac{1+c/n}{(1+d/n)^2} \\\[1mm]
&< \cfrac{1+c/n}{1+d/n} \\\[1mm]
&= \cfrac{n+c}{n+d} \\
&= 1+\cfrac{c-d}{n+d} \\\[1mm]
&< 1+\cfrac{|c-d|}{n}.
\end{align\*}
$$

So far everything seems correct, thus we may be tempted to jump to the following conclusion

$$
\cfrac{1+\Theta(1/n)}{(1+\Theta(1/n))^2}= 1+O(1/n)  \space\space\space\space\space\space\textcolor{red}{wrong!}
$$

Below is the citation from the book (see page 58) explaining how to treat asymptotic notations appearing on both sides of an equation:

> No matter how the anonymous functions are chosen on the left of the equal sign, there is a way to choose the anonymous functions on the right of the equal sign to make the equation valid.&#x20;

Pick $$f(n)=1/n$$ as a replacement for $$\Theta(1/n)$$ both in the numerator and denominator. Consequently, the left side becomes $$1/(1+(1/n))<1$$. There is no way to select an asymptotically nonnegative function on the ride side to make it less than 1. The book talks about "proper" abuses of asymptotic notations, but as this example illustrates, it is easy to become a victim of such a convenience.

### ★ C.1-14

We need to calculate the first and second derivative of $$H(\lambda)$$ using the base conversion rule for logarithms $$\lg x=\ln x / \ln 2$$.

$$
\begin{align\*}
H'(\lambda) &= \left(-\lg\lambda-\lambda\cdot\frac{1}{\lambda\ln2}\right)+\left(\lg(1-\lambda)+(1-\lambda)\cdot\frac{1}{(1-\lambda)\ln2}\right) \\\[1mm]
&= -\lg\lambda-\lg e+\lg(1-\lambda)+\lg e \\\[1mm]
&= \lg\frac{1-\lambda}{\lambda},\\\[3mm]
H''(\lambda) &= \frac{\lambda}{(1-\lambda)\ln2}\cdot\frac{-\lambda-(1-\lambda)}{\lambda^2} \\\[1mm]
&= -\frac{\lg e}{\lambda(1-\lambda)}.
\end{align\*}
$$

We have $$H'(1/2)=0 \land H''(1/2)<0$$, so $$H(\lambda)$$ achieves its maximum value at $$\lambda=1/2$$. $$H(1/2)=1$$.

### ★ C.1-15

The equation trivially holds when $$n=0$$. We solve this problem for $$n>0$$ by thinking a bit out-of-the-box, which also demonstrates a useful technique for solving seemingly complicated sums. First couple of preliminaries

$$
\begin{align\*}
\sum\_{k=0}^n\binom{n}{k}k&=\sum\_{k=1}^n\binom{n}{k}k \\
\sum\_{k=1}^n\binom{n}{k}&=2^n-1
\end{align\*}
$$

Draw an $$n \times n$$ table enlisting terms of the sum diagonally, taking into account the multiplicative factor $$k$$, as follows:

<table data-header-hidden data-full-width="false"><thead><tr><th width="55.906646728515625"></th><th width="53.749755859375"></th><th width="51.3927001953125"></th><th width="51.23223876953125"></th><th width="52.613616943359375"></th><th width="52.9315185546875"></th></tr></thead><tbody><tr><td><span class="math">\binom{n}{1}</span></td><td><span class="math">\binom{n}{2}</span></td><td><span class="math">\binom{n}{3}</span></td><td>...</td><td>...</td><td><span class="math">\binom{n}{n}</span></td></tr><tr><td><span class="math">\binom{n}{2}</span></td><td><span class="math">\binom{n}{3}</span></td><td>...</td><td>...</td><td><span class="math">\binom{n}{n}</span></td><td>...</td></tr><tr><td><span class="math">\binom{n}{3}</span></td><td>...</td><td>...</td><td><span class="math">\binom{n}{n}</span></td><td>...</td><td>...</td></tr><tr><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td><span class="math">\binom{n}{3}</span></td></tr><tr><td>...</td><td><span class="math">\binom{n}{n}</span></td><td>...</td><td>...</td><td><span class="math">\binom{n}{3}</span></td><td><span class="math">\binom{n}{2}</span></td></tr><tr><td><span class="math">\binom{n}{n}</span></td><td>...</td><td>...</td><td><span class="math">\binom{n}{3}</span></td><td><span class="math">\binom{n}{2}</span></td><td><span class="math">\binom{n}{1}</span></td></tr></tbody></table>

Each column sums to $$2^n-1$$, so the total per the whole table is $$n(2^n-1)$$. Observe that the sums of elements in the upper-left and bottom-right triangular sub-matrixes (they both include the diagonal entries) are the same. We have

$$
\begin{align\*}
2S\_\triangle-S\_D&=n(2^n-1) \\
&=n(2^n-1)+n && \text{(} S\_D=n\binom{n}{n}=n \text{)} \\
&=n2^n \\
&\therefore S\_\triangle=n2^{n-1}.
\end{align\*}
$$

We have shown that the equation (C.12) is true for any integer $$n \ge0$$.

### ★ C.1-16

We can easily verify the cases for $$n=0,1,2,3$$ or $$k=0,1$$, so assume $$n \ge 4 \land k>1$$. We have

$$
\begin{align\*}
\binom{n}{k} &= \frac{n(n-1)\cdots(n-(k-1))}{k!} \\
&= \frac{n^k}{k!}\left(1-\frac{1}{n}\right)\cdots\left(1-\frac{k-1}{n}\right) \\
&\ge \frac{n^k}{k!}\left(1-\frac{k-1}{n}\right)^{k-1} \\
&>  \frac{n^k}{k!}\left(1-\frac{\sqrt{n}}{n}\right)^{k-1} && \text{(} k-1<\sqrt{n} \text{)}\\
&>  \frac{n^k}{k!}\left(1-\frac{1}{\sqrt{n}}\right)^{\sqrt{n}} && \text{(} x^n \text{ strictly decreases when } 0\<x< 1 \text{)} \\
&\ge \frac{n^k}{4k!} && \text{(since } (1-1/\sqrt(n))^{\sqrt(n)} \ge 1/4\text{ for all } n \ge 4 \text{)}.
\end{align\*}
$$

### C.2-1

Let event A be that Professor Rosencrantz obtains strictly more heads than Professor Guildenstern. $$R\_*$$ and $$G\_*$$ denote various events about results for Professors Rosencrantz and Guildenstern, respectively. We have&#x20;

$$
\begin{align\*}
A&=(R\_{H\ge1} \cap G\_T) \cup (R\_{H=2} \cap G\_H) \\
\Pr{A}&=\Pr{(R\_{H\ge1} \cap G\_T) \cup (R\_{H=2} \cap G\_H)} \\
&=\Pr{R\_{H\ge1} \cap G\_T}+\Pr{R\_{H=2} \cap G\_H} && \text{(events are mutually exclusive)} \\
&=\Pr{R\_{H\ge1} }Pr{G\_T}+\Pr{R\_{H=2}}\Pr{G\_H} && \text{(events are independent)} \\
&=\frac{3}{4}\cdot\frac{1}{2}+\frac{1}{4}\cdot\frac{1}{2}=\frac{1}{2}.
\end{align\*}
$$

### C.2-2

You can read the proofs [here](https://en.wikipedia.org/wiki/Boole%27s_inequality#Proof_using_induction).

### C.2-3

Let a 3-tuple $$(i,j,k)$$ represent numbers on the selected cards in order of their removal from the deck.  For a fixed $$j$$ all triples having $$i\<j\<k$$ are in sorted (increasing) order; there are $$(j-1)(10-j)$$ such triples. Let $$A\_j$$ denote an event that a triple for a given $$j$$ is in sorted order, and $$A$$ be an event that a triple is in sorted order. We have

$$
\begin{align\*}
A&=\bigcup\_{j=2}^9 A\_j \\
\Pr{A}&=\Pr\Bigg{\bigcup\_{j=2}^9 A\_j \Bigg}\\
&=\sum\_{j=2}^9 \Pr{A\_j} && \text{(events }A\_j \text{ are mutually exclusive)} \\
&=\sum\_{j=2}^9 \frac{(j-1)(10-j)}{\binom{10}{3}3!} \\
&=\frac{120}{720}=\frac{1}{6} && \text{(by equations (A.1), (A.4) and (C.2))}.
\end{align\*}
$$

### C.2-4

$$
\begin{align\*}
\Pr{A\mid B}+\Pr{\overline{A}\mid B} &= \cfrac{\Pr{A\cap B}}{\Pr{B}}+\cfrac{\Pr{\overline{A}\cap B}}{\Pr{B}} && \text{(by equation (C.16))}\\

```
&= \cfrac{\Pr\{B\}}{\Pr\{B\}}  \\
&= 1.
```

\end{align\*}
$$

### C.2-5

We prove the statement by induction on $$n$$. The base case $$n=1$$ trivially holds. For $$n>1$$ we have

$$
\begin{align\*}
\Pr\Bigg{\bigcap\_{i=1}^nA\_i\Bigg} &= \Pr\Bigg{\bigcap\_{i=1}^{n-1}A\_i \cap A\_n\Bigg}  && \text{(by the associativity rule)} \\
&= \Pr\Bigg{\bigcap\_{i=1}^{n-1}A\_i\Bigg}\Pr\Bigg{A\_n\biggm|\bigcap\_{i=1}^{n-1}A\_i\Bigg} && \text{(by equation (C.16))} \\
&= \Pr{A\_1}\Pr{A\_2\mid A\_1}\cdots\Pr\Bigg{A\_n\biggm|\bigcap\_{i=1}^{n-1}A\_i\Bigg} && \text{(by the inductive hypothesis)}.
\end{align\*}
$$

According to the principles of mathematical induction we can conclude that the statement from the book is true for all $$n \ge1$$.

### ★ C.2-6

Create a 2D sample space $$S={s\_{ij}:1\le i,j \le n}$$ with the uniform probability distribution on it. Consequently, $$\Pr{s\_{ij}}=\cfrac{1}{|S|}=\cfrac{1}{n^2}$$.  The idea is to specify events $$A\_i$$ in such a way that $$|A\_i|=n \land |A\_i\cap A\_j|=1 \land |A\_i \cap A\_j \cap A\_k|=0$$ for any indices $$1 \le i\<j\<k \le n$$. Notice that if the probability of any 3-subset of events is zero, then it will be zero for any subset of $$k >3$$ of them.

An event $$A\_m$$ should only comprises outcomes that contain the index $$m$$. This ensures $$A\_i \cap A\_j={s\_{ij}}$$, thus no third event could be simultaneously satisfied. It turns out that $$A\_i={s\_{xi}:1 \le x \le i} \cup {s\_{ix}:i\<x\le n}$$ does the job. We have

$$
\begin{align\*}
\Pr{A\_i}&=\sum\_{s \in A\_i} \Pr{s}=\frac{n}{n^2}=\frac{1}{n} \\
\Pr{A\_i \cap A\_j}&=\Pr{s\_{ij}}=\frac{1}{n^2}=\Pr{A\_i}\Pr{A\_j} \\
\Pr{A\_i \cap A\_j \cap A\_k}&=0\neq\Pr{A\_i}\Pr{A\_j}\Pr{A\_k}.
\end{align\*}
$$

The last line implies that

$$
\Pr\Bigg{\bigcap\_{i=1}^{k\ge3}A\_i\Bigg}=0\neq \prod\_{i=1}^{k\ge3} \Pr{A\_i} =\frac{1}{n^k}.
$$

### ★ C.2-7

Suppose we have two light bulbs—one is operational, and the other is unreliable and works in 50% of the cases. We randomly pick one of the light bulbs and switch it on/off twice in succession. Assume that for a given light bulb, the outcomes of successive attempts to switch it on are independent. Let event A denote a successful first attempt to switch a light bulb on/off, and event B a successful second attempt. Let C be the event that the malfunctioning light bulb was picked. We have

$$
\begin{align\*}
\Pr{A} &= \Pr{B} \\
&= \Pr{B\mid \overline{C}}\Pr{\overline{C}} + \Pr{B\mid C}\Pr{C}\\
&= (1)\cdot(1/2)+(1/2)\cdot (1/2) \\
&= 3/4 \\
\Pr{C} &= 1/2.
\end{align\*}
$$

The events $$A$$ and $$B$$ are not independent, since $$\Pr{A\cap B} = (1)\cdot(1/2)+(1/2)\cdot (1/4)=5/8$$, while $$\Pr{A}\Pr{B}=9/16$$. But events $$A$$ and $$B$$ are conditionally independent given $$C$$.

$$
\begin{align\*}
\Pr{A\cap B\mid C} &= 1/4 \\
&= (1/2)\cdot(1/2) \\
&= \Pr{A\mid C}\Pr{B\mid C}.
\end{align\*}
$$

### ★ C.2-8

Let $$J$$, $$T$$ and $$C$$ be the events that Jeff, Tim and Carmine will pass the course, respectively. Without any additional information, they have the same probability of 1/3 (a.k.a. [prior](https://en.wikipedia.org/wiki/Prior_probability)). Let $$G\_R$$ be the event that Professor Gore will respond anything more specific about exam results. Let $$G\_C$$ be the event that Professor Gore **told Carmine** that Jeff is the one who will fail. Observe that this event contains the notion of whom the above information is revealed.

#### Assumptions

The choice of parameterization highlights a critical aspect of conditional probability problems: the necessity of explicitly defining the decision rules of the agent providing information (here, Professor Gore). The problem does not specify his behavior when both Jeff and Tim fail nor whether he is willing to tell anything more than what he has announced to everybody. Such presumptions affect the probability calculation. For example, Carmine's argument about symmetry, by his claim that Professor Gore *"won’t be revealing any information,"* is only an assumption. We cannot enforce that symmetry on behavior of another agent.

#### Calculation

We have several conditional probabilities regarding what will Professor Gore announce to Carmine based upon outcomes.

$$
\begin{align\*}
\Pr{G\_R \cap G\_C\mid J}&=0 && \text{(1)}\\
\Pr{G\_R \cap G\_C\mid T}&=p && \text{(2)}\\
\Pr{G\_R \cap G\_C\mid C}&=q  && \text{(3)}.
\end{align\*}
$$

* (1) is zero because if Jeff is supposed to pass the course, then Professor Gore will not lie Carmine (a fair assumption).
* (2) It is definitely Jeff who should be selected by Professor Gore, but he may decide to tell this to Carmine with probability $$p$$.
* (3) Professor Gore will pick Jeff with probability $$q$$.

We want to know what is Carmine's probability of passing the course now that he has found out that Jeff will fail. We apply equation (C.20) to get

$$
\begin{align\*}
\Pr{C\mid G\_R \cap G\_C} &= \frac{\Pr{C}\Pr{G\_R \cap G\_C\mid C}}{\Pr{G\_R \cap G\_C}} \\
&= \frac{\Pr{C}\Pr{G\_R \cap G\_C\mid C}}{\Pr{J}\Pr{G\_R \cap G\_C\mid J}+\Pr{T}\Pr{G\_R \cap G\_C\mid T}+\Pr{C}\Pr{G\_R \cap G\_C\mid C}} \\\[2mm]
&= \frac{(1/3)\cdot q}{(1/3)\cdot0+(1/3)\cdot p+(1/3)\cdot q} \\\[1mm]
&= \frac{q}{p+q}.
\end{align\*}
$$

Certainly $$p+q>0$$, since we already have an evidence that he has pointed on Jeff once.

#### Results

The tabs below showcase different results due to variations in the initial conditions. In both cases we fix $$p=1$$.

{% tabs %}
{% tab title="Professor chooses randomly" %}
Here $$q=1/2$$, so Carmine's chance of passing is still 1/3. In the context of similar logic puzzles (e.g., Monty Hall), the intended answer is 1/3, but strictly speaking, it depends on the unknown $$q$$.
{% endtab %}

{% tab title="Professor always names Jeff if he fails" %}
If we set $$q=1$$, then it turns out that Carmine's chance of passing becomes 1/2. In this biased scenario, Professor Gore is leaking out more information about outcomes.
{% endtab %}
{% endtabs %}

### C.3-1

Define the random variable $$X$$ to be the sum of the two values showing on the dice. We have

$$
\begin{align\*}
\mathbb{E}{\[X]} &= \sum\_{x=2}^{12} x\Pr{X=x}  \quad  \text{(by equation (C.23))}\\
&= (1/36)\cdot(2\cdot1+3\cdot2+4\cdot3+5\cdot4+6\cdot5+7\cdot6+8\cdot5+9\cdot4+10\cdot3+11\cdot2+12\cdot1) \\
&= 252/36 \\
&=7.
\end{align\*}
$$

Define the random variable $$X$$ to be the maximum of the two values showing on the dice. We have

$$
\begin{align\*}
\mathbb{E}{\[X]} &= \sum\_{x=1}^6 x\Pr{X=x} \quad \text{(by equation (C.23))}\\
&= (1/36)\cdot(1\cdot1+2\cdot3+3\cdot5+4\cdot7+5\cdot9+6\cdot11) \\
&= 161/36 \\
&\approx 4.47.
\end{align\*}
$$

### C.3-2

Define the random variable $$X$$ to be the index of the maximum/minimum element in the array $$A\[1:n]$$. In a randomly ordered array, containing distinct elements, the maximum/minimum element can be at any index with equal probability. In other words, $$\Pr{X=i}=1/n$$ for any valid index $$i$$. We have

$$
\begin{align\*}
\mathbb{E}{\[X]}  &= \sum\_{i=1}^ni\Pr{X=i} \\
&= \frac{1}{n}\sum\_{i=1}^ni \\\[1mm]
&= \frac{1}{n}\cdot\frac{n(n+1)}{2} \\\[1mm]
&= \frac{n+1}{2}.
\end{align\*}
$$

### C.3-3 🌟

{% hint style="success" %}
Provides a succinct case study (a simplified [chuck-a-luck](https://en.wikipedia.org/wiki/Chuck-a-luck) game) about modelling games of luck using probability theory. Also, shows the composition of random variables.
{% endhint %}

Define the random variable $$X$$ to be the number of times the player's number appears on the three dice. The probability that it will appear on some dice is 1/6. Hence, it is 5/6 that it will not. We assume that dice rolls are independent, meaning that we can multiply together individual probabilities to calculate the joint probability for all 3 dice.

Define the random variable $$Y$$ to be the player's gain (in dollars) from playing the carnival game once. Apparently, $$Y=f(X)$$ where the function $$f$$ maps the number of occurrences of the player's number to monetary value (loss/gain). We have

$$
\begin{align\*}
\mathbb{E}{\[Y]} &= (-1)\cdot\Pr{X=0}+1\cdot\Pr{X=1}+2\cdot\Pr{X=2}+3\cdot\Pr{X=3} \\
&= (1/6^3)\cdot((-1)\cdot\binom{3}{0}\cdot125+1\cdot\binom{3}{1}\cdot25+2\cdot\binom{3}{2}\cdot5+3\cdot\binom{3}{3}\cdot1) \\
&= -17/216 \\
&\approx -0.08.
\end{align\*}
$$

The player will lose about 8 cents on average per game. As a side note, the random variable $$X$$ follows the [binomial distribution](https://en.wikipedia.org/wiki/Binomial_distribution) with $$p=1/6$$.

### C.3-4

For any reals $$x$$ and $$y$$ it holds that $$\min(x,y)+\max(x,y)=x+y$$. Therefore, we have

$$
\begin{align\*}
\min⁡{{X,Y}}+\max⁡{{X,Y}}&=X+Y \\
\mathbb{E}{\[\min⁡{{X,Y}}+\max⁡{{X,Y}}]} &= \mathbb{E}{\[X+Y]} \\
\mathbb{E}{\[\min⁡{{X,Y}}]}+\mathbb{E}{\[\max⁡{{X,Y}}]}&=\mathbb{E}{\[X]}+\mathbb{E}{\[Y]} && \text{(by linearity of expectation)}\\
\mathbb{E}{\[\max⁡{{X,Y}}]} &\le \mathbb{E}{\[X]}+\mathbb{E}{\[Y]} && \text{(} \mathbb{E}{\[\min⁡{{X,Y}}]} \ge 0 \text{)}.
\end{align\*}
$$

### ★ C.3-5 🌟

{% hint style="success" %}
Proves that $$f(X)$$ and $$g(Y)$$ are independent for any choice of functions $$f$$ and $$g$$, for independent discrete random variables $$X$$ and $$Y$$.
{% endhint %}

Suppose $$f(x\_i)=f\_X$$ for all $$i=1,2,\dots,p$$ and $$g(y\_j)=g\_Y$$ for all $$j=1,2,\dots,q$$, where $$x\_i$$ and $$y\_j$$ are values that random variables $$X$$ and $$Y$$ can take, respectively. We have

$$
\begin{align\*}
\Pr{f(X)=f\_X;\text{and};g(Y)=g\_Y} &= \Pr\Bigg{\Bigg(\bigcup\_{i=1}^p X=x\_i\Bigg) \bigcap \Bigg(\bigcup\_{j=1}^q Y=y\_j\Bigg)\Bigg} \\
&= \Pr\Bigg{\bigcup\_{\substack{1 \le i \le p, \ 1 \le j \le q}} (X=x\_i;\text{and};Y=y\_j)\Bigg} \\
&= \sum\_{i=1}^p {\sum\_{j=1}^q {\Pr{X=x\_i;\text{and};Y=y\_j}}} && \text{(mutually exclusive events)} \\
&= \sum\_{i=1}^p {\sum\_{j=1}^q {\Pr{X=x\_i} \Pr{Y=y\_j}}} && \text{(independence of X and Y)} \\
&= \Bigg(\sum\_{i=1}^p {\Pr{X=x\_i}}\Bigg)\Bigg(\sum\_{j=1}^q {\Pr{Y=y\_j}}\Bigg) \\
&= \Pr{f(X)=f\_X}\Pr{g(Y)=g\_Y}.
\end{align\*}
$$

We have not assumed anything about $$f\_X$$ and $$g\_Y$$, so our conclusion about independence of $$f(X)$$ and $$g(Y)$$ generalizes to all cases.

### ★ C.3-6

You can find the proof [here](https://en.wikipedia.org/wiki/Markov%27s_inequality#Discrete_case).

### ★ C.3-7 🌟

{% hint style="success" %}
This exercise introduces the notion of [stochastic dominance](https://en.wikipedia.org/wiki/Stochastic_dominance). In our case, we say that $$X$$ is stochastically larger than $$X'$$, denoted as $$X \succcurlyeq X'$$.
{% endhint %}

We fix $$t$$ and define two sets $$A={s\in S:X(s)\ge t}$$ and $$A'={s\in S:X'(s)\ge t}$$. For all $$s \in S$$ it is true that $$s \in A' \implies s \in A$$ (because $$X(s) \ge X'(s)$$), thus $$A' \subseteq A$$. Using the definition of a discrete random variable, we get

$$
\begin{align\*}
\Pr{X\ge t} &= \sum\_{s\in A}\Pr{s} \\
&= \sum\_{s\in A'}\Pr{s}+\sum\_{s\in A-A'}\Pr{s} \\
&\ge \sum\_{s\in A'}\Pr{s} \\
&= \Pr{X'\ge t}.
\end{align\*}
$$

We have not assumed anything about $$t$$, so our conclusion generalizes to all cases.

### C.3-8

The short answer is $$\mathbb{E}{\[X^2]} \ge \mathbb{E}^2{\[X]}$$. We can prove this either by using Jensen's inequality (C.30), after showing that $$f(x)=x^2$$ is a convex function (see below), or by looking at the equation (C.32) and recalling that the variance of a random variable is always nonnegative.

$$
\begin{align\*}
f(\lambda x+(1-\lambda)y) &\le (\lambda f(x)+(1-\lambda)f(y)) \\
(\lambda x+(1-\lambda)y)^2 &\le (\lambda x^2+(1-\lambda)y^2) \\
0 &\le \lambda x^2+(1-\lambda)y^2 - \lambda^2x^2-2\lambda(1-\lambda)xy-(1-\lambda)^2y^2\\
&= \lambda(1-\lambda)x^2+\lambda(1-\lambda)y^2-2\lambda(1-\lambda)xy \\
&= x^2 + y^2 -2xy && \text{(assuming }  0 < \lambda < 1 \text{)}\\
&= (x-y)^2 \quad \checkmark
\end{align\*}
$$

When $$\lambda = 0 \lor \lambda = 1$$ then the inequality is trivially satisfied.

### C.3-9

For an indicator random variable $$X$$ it is true that $$X^2=X$$. Therefore, we have

$$
\begin{align\*}
\mathbb{Var}{\[X]} &= \mathbb{E}{\[X^2]}-\mathbb{E}^2{\[X]} && \text{(by equation (C.32))} \\
&= \mathbb{E}{\[X]}-\mathbb{E}^2{\[X]} \\
&= \mathbb{E}{\[X]}(1-\mathbb{E}{\[X]}) \\
&= \mathbb{E}{\[X]}\mathbb{E}{\[1-X]} && \text{(by linearity of expectation)}.
\end{align\*}
$$

### C.3-10

$$
\begin{align\*}
\mathbb{Var}{\[aX]} &= \mathbb{E}{\[a^2X^2]}-\mathbb{E}^2{\[aX]} \\
&= a^2\mathbb{E}{\[X^2]}-a^2\mathbb{E}^2{\[X]} && \text{(by equation (C.25))} \\
&= a^2\left(\mathbb{E}{\[X^2]}-\mathbb{E}^2{\[X]}\right) \\
&= a^2\mathbb{Var}{\[X]}.
\end{align\*}
$$

### C.4-1

$$
\begin{align\*}
\sum\_{k=1}^\infty\Pr{X=k} &= \sum\_{k=1}^\infty q^{k-1}p \\
&= p\sum\_{k=0}^\infty q^k \\
&=p \cdot\frac{1}{1-q} && \text{(by equation (A.7))} \\\[1mm]
&= 1 && \text{(} p=1-q \text{)}.
\end{align\*}
$$

### C.4-2

Define the random variable $$X$$ to be the number of trials needed to obtain a success. The probability of a success is $$p=b(3;6,1/2)=\binom{6}{3}(1/2)^3(1/2)^3=20/64=5/16$$. According to equation (C.36) $$\mathbb{E}{\[X]}=1/p=3.2$$. Therefore, on average, we need to ﬂip $$\approx 3$$ times six fair coins before obtaining three heads and three tails.

### C.4-3

$$
\begin{align\*}
\mathbb{E}{\[X^2]} &= \sum\_{k=1}^\infty k^2\Pr{X=k} \\
&= \sum\_{k=1}^\infty k^2q^{k-1}p \\
&= \frac{p}{q}\sum\_{k=1}^\infty k^2q^k \\\[1mm]
&= \frac{p}{q}\cdot\frac{q(1+q)}{(1-q)^3} && \text{(by Exercise A.1-6)} \\\[1mm]
&= (1+q)/p^2.
\end{align\*}
$$

$$
\begin{align\*}
\mathbb{Var}{\[X]} &= \mathbb{E}{\[X^2]}-\mathbb{E}^2{\[X]} && \text{(by equation (C.36))}\\
&= (1+q)/p^2-1/p^2 \\
&= q/p^2.
\end{align\*}
$$

### C.4-4

$$
\begin{align\*}
b(k;n,p) &= \binom{n}{k}p^kq^{n-k} \\
&= \binom{n}{n-k}q^{n-k}p^k && \text{(by equation (C.3) and } p=1-q\text{)} \\
&= b(n-k;n,q).
\end{align\*}
$$

### C.4-5

The binomial distribution $$b(k;n,p)$$ attains a maximum at integer $$k$$ that lies in the range $$np-q \le k \le (n+1)p$$. As we are looking for an approximation, let $$k=np$$ and assume it is an integer. The remaining steps are purely mechanical. We have

$$
\begin{align\*}
b(np;n,p) &= \binom{n}{np}p^{np}(1-p)^{n-np} \\
&= \frac{n!}{(np)!,(n-np)!},p^{np}(1-p)^{n-np} \\\[1mm]
&= \frac{n!}{(np)!,(nq)!},p^{np}q^{nq} \\
&\sim \frac{\sqrt{2\pi n}\left(\dfrac{n}{e}\right)^n}{\sqrt{2\pi np}\left(\dfrac{np}{e}\right)^{np}\sqrt{2\pi nq}\left(\dfrac{nq}{e}\right)^{nq}},p^{np}q^{nq} && \text{(using Stirling's approximation (3.25))}\\\[6mm]
&= \frac{\left(\dfrac{n}{e}\right)^n\left(\dfrac{e}{np}\right)^{np}\left(\dfrac{e}{nq}\right)^{nq}}{\sqrt{2\pi npq}},p^{np}q^{nq} \\\[2mm]
&=\cfrac{1}{\sqrt{2\pi npq}}.
\end{align\*}
$$

### ★ C.4-6

We need to find $$b(0;n,1/n)=\binom{n}{0}(1/n)^0(1-1/n)^n=(1-1/n)^n$$ when $$n \to \infin$$, so we have

$$
\begin{align\*}
\lim\_{n\to\infty}b(0;n,1/n) &= \lim\_{n\to\infty}(1-1/n)^n \\
&= 1/e && \text{(by equation (3.16))}.
\end{align\*}
$$

We need to find $$b(1;n,1/n)=\binom{n}{1}(1/n)(1-1/n)^{n-1}=(1-1/n)^n/(1-1/n)$$ when $$n \to \infin$$, so we have

$$
\begin{align\*}
\lim\_{n\to\infty}b(1;n,1/n) &= \lim\_{n\to\infty}\frac{(1-1/n)^n}{1-1/n} \\
&= \frac{1/e}{1} && \text{(by equation (3.16))} \\\[1mm]
&= 1/e.
\end{align\*}
$$

### ★ C.4-7

We prove the identity using the [double counting](https://en.wikipedia.org/wiki/Double_counting_\(proof_technique\)) proof technique. The idea is to use different counting approaches to express the same quantity, in this case the probability of professors getting the same number of heads, in two different ways. By equating those expressions we establish the desired relation.

#### Aggregating the coin flips

Consider each experiment as a sequence of $$2n$$ coin flips: the first $$n$$ by Professor Rosencrantz, the last $$n$$ by Professor Guildenstern. There are $$2^{2n}=4^n$$ possible sequences. Let "head" be a success for Rosencrantz and "tail" a success for Guildenstern, as hinted in the book. If Professor Rosencrantz has $$x$$ heads (successes), then he will have $$n-x$$ tails. Similarly, if Professor Guildenstern has this number of tails (successes), then he would have $$n - (n-x) = x$$ heads. Therefore, both get the same number of heads if there are exactly $$n$$ successes in total, which can occur in $$\binom{2n}{n}$$ ways. Thus, the desired probability is $$\binom{2n}{n}/4^n$$.

#### Using random variables

Define $$R$$ and $$G$$ to be random variables that represent the number of heads and tails for Professors Rosencrantz and Guildenstern, respectively. These variables are independent and follow the binomial distribution. We have

$$
\begin{align\*}
\sum\_{k=0}^n\Pr{R=k;\text{and};G=n-k} &= \sum\_{k=0}^n\Pr{R=k}\Pr{G=n-k} \\
&= \sum\_{k=0}^nb(k;n,1/2),b(n-k;n,1/2) \\
&= \sum\_{k=0}^n b(k;n,1/2)^2 && \text {(by Exercise C.4-4 when } p=q\text {)}\\
&= \sum\_{k=0}^n\binom{n}{k}^2\left(\frac{1}{2}\right)^{2k}\left(\frac{1}{2}\right)^{2(n-k)} \\
&= \frac{\sum\_{k=0}^n\binom{n}{k}^2}{4^n}.
\end{align\*}
$$

### ★ C.4-8

As explained in the book, for $$k=\lambda n$$, where $$0 \le \lambda \le 1$$, we can rewrite equation (C.7) as

$$
\binom{n}{\lambda n} \le 2^{nH(\lambda)}
$$

Letting $$\lambda=k/n$$ we obtain

$$
\begin{align\*}
b(k;n,1/2) &= \binom{n}{k}\left(\frac{1}{2}\right)^k\left(\frac{1}{2}\right)^{n-k} \\
&= \binom{n}{(k/n),n}\left(\frac{1}{2}\right)^n \\
&\le \frac{2^{nH(k/n)}}{2^n} \\\[1mm]
&= 2^{nH(k/n)-n}.
\end{align\*}
$$

### ★ C.4-9

{% hint style="info" %}
The solution depicted here is based on the next exercise, so you might want to do that first.
{% endhint %}

Define $$X'$$ to be the random variable denoting the total number of successes in $$n$$ Bernoulli trials, where all trials have the same probability $$p \ge p\_i$$ (for all $$i=1,2,\dots,n$$) of success. According to the next exercise, for $$0 \le k \le n$$, we have $$\Pr{X'\ge k} \ge \Pr{X\ge k}$$. Furthermore, $$X'$$ follows a binomial distribution $$b(k;n,p)$$. This gives

$$
\begin{align\*}
\Pr{X\<k} &= 1-\Pr{X\ge k} \\
&\ge 1-\Pr{X'\ge k} \\
&= \sum\_{i=0}^nb(i;n,p)-\sum\_{i=k}^nb(i;n,p) && \text{(by equation (C.40))} \\
&= \sum\_{i=0}^{k-1}b(i;n,p).
\end{align\*}
$$

### ★ C.4-10 🌟

{% hint style="success" %}
Introduces the concept of [coupling](https://en.wikipedia.org/wiki/Coupling_\(probability\)), a powerful proof technique in probability theory.
{% endhint %}

Define the sample space $$S$$ as sequences of $$n$$ Bernoulli trials, where the $$i$$th trial has a probability $$p\_i$$ of success. The set $$A$$ in this exercise corresponds to one such sequence (outcome in $$S$$). Define $$X\_i$$ to be an indicator random variable associated with the $$i$$th trial in $$A$$ ($$X\_i=1$$ when it is a success, otherwise 0), so the total number of successes in $$A$$ can be represented with the random variable $$X=\sum\_{i=1}^n X\_i$$. We devise an experiment involving trials in $$A$$ to obtain $$n$$ Bernoulli trials in $$A'$$ following the hint from the book. The point is to show that if each trial in one sequence has a probability of success at least as high as the corresponding trial in another sequence, the total successes will be at least as high as in the other sequence.

The hint from the book refers to the coupling technique. The fundamental idea is to introduce coupled random variables that could be used to compare the original random variables having different probability distributions. In our case, we cannot directly relate sets $$A$$ and $$A'$$, as they contain Bernoulli trials with different distributions. However, we can derive a new set of indicator random variables $$X'*i$$, conditionally coupled on $$X\_i$$, such that they faithfully reflect the probability distributions of random trials in $$A'$$. Therefore, the total number of successes in $$A'$$ can be represented with the random variable $$X'=\sum*{i=1}^n X'\_i$$ and we can compare $$X$$ and $$X'$$ using the same sample space.

Define

$$
X'\_i =\begin{cases}
1 & \text{if } X\_i=1, \\
Bernoulli(r\_i) & \text{if } X\_i=0,
\end{cases}
$$

where $$r\_i=\frac{p'\_i-p\_i}{1-p\_i}$$. Notice that $$p\_i<1$$ when $$X\_i=0$$, otherwise $$X\_i$$ would be 1. For the marginal distribution of $$X'\_i$$ we have

$$
\begin{align\*}
\Pr{X'\_i=1} &= \Pr{(X'\_i=1;\text{and};X\_i=1) \cup (X'\_i=1;\text{and};X\_i=0)} \\
&= \Pr{X'\_i=1;\text{and};X\_i=1}+\Pr{X'\_i=1;\text{and};X\_i=0} \\
&= p\_i+r\_i(1-p\_i) \\
&=p'\_i \quad \checkmark
\end{align\*}
$$

By construction, for all $$i$$, we have $$X'\_i \succcurlyeq X\_i$$, hence $$X' \succcurlyeq X$$. Furthermore, the coupling preserves the independence of the trials within each sequence because the construction for each $$i$$ depends only on $$X\_i$$ (and independent randomness for the conditional Bernoulli when $$X\_i=0$$).

According the [Exercise C.3-7](#c.3-7), we have $$\forall s \in S, (X'(s) \ge X(s) \implies \Pr{X' \ge k} \ge \Pr{X \ge k})$$ for $$0 \le k \le n$$. This concludes the proof of the statement from the book.

### ★ C.5-1

We expect to have a greater chance to obtain exactly $$n$$ heads in $$2n$$ flips of a fair coin, since, on average, we would expect equal number of heads and tails. We confirm this by calculating the probabilities. Define $$X\_m$$ to be a random variable denoting the number of heads in $$m > 0$$ flips of a fair coin. Apparently, it has a binomial distribution $$b(k;m,1/2)$$, thus for $$n>1$$ we have

$$
\begin{align\*}
\frac{\Pr{X\_{2n}=n}}{\Pr{X\_n=n}} &= \frac{\binom{2n}{n}\left(\frac{1}{2}\right)^{2n}}{\binom{n}{n}\left(\frac{1}{2}\right)^n} \\\[1mm]
&= \frac{\binom{2n}{n}}{2^n} \\\[1mm]
&= \frac{2^n}{\sqrt{\pi n}},(1+O'(1/n)) \\
&> 1 \quad \checkmark
\end{align\*}
$$

As a side note, when $$n=1$$ then the two probabilities are equal.

### ★ C.5-2

We prove Corollaries (C.6) and (C.7) following the advice from the book; by inverting the roles of successes and failures in equations for the left tail of a binomial distribution.

#### Proof of Corollary C.6

When $$np\<k\<n$$ then $$0\<n-k\<nq$$, hence we have

$$
\begin{align\*}
\Pr{X>k}&=\Pr{\overline{X}\<n-k} \\
&< \frac{(n-k)p}{nq-(n-k)}b(n-k;n,q) && \text{(by "inverted" Theorem C.4)} \\
&= \frac{(n-k)p}{n(1-p)-n+k}b(k;n,p) && \text{(by Exercise C.4-4)} \\
&= \frac{(n-k)p}{k-np}b(k;n,p).
\end{align\*}
$$

#### Proof of Corollary C.7

When $$(np+n)/2\<k\<n$$ then $$0\<n-k\<nq/2$$, so we get

$$
\begin{align\*}
\frac{(n-k)p}{nq-(n-k)} &< \frac{nqp/2}{nq-nq/2} && \text{(by "inverted" Corollary C.5)} \\
&= \frac{nqp/2}{nq/2} \\
&\le 1 && \text{(} p\le 1\text{)} \\
&\therefore \Pr{X>k}\<b(k;n,p).
\end{align\*}
$$

Thus we have

$$
\begin{align\*}
\frac{\Pr{X>k}}{\Pr{X>k-1}} &= \frac{\displaystyle\sum\_{i=k+1}^n b(i;n,p)}{\displaystyle\sum\_{i=k}^n b(i;n,p)} \\
&= \frac{\displaystyle\sum\_{i=k+1}^n b(i;n,p)}{b(k;n,p)+\displaystyle\sum\_{i=k+1}^n b(i;n,p)} \\
&< 1/2.
\end{align\*}
$$

since $$\sum\_{i=k+1}^n b(i;n,p) < b(k;n,p)$$.

### ★ C.5-3

We see that $$p=a/(a+1)$$ and $$q=1-p=1/(a+1)$$, so $$a=p/q$$. Notice also that $$0\<k\<np$$. Define $$X$$ to be a random variable with a binomial distribution $$b(k;n,p)$$. We have

$$
\begin{align\*}
\sum\_{i=0}^{k-1}\binom{n}{i}a^i &= \sum\_{i=0}^{k-1}\binom{n}{i}\left(\frac{p}{q}\right)^i \\\[1mm]
&= \frac{\displaystyle\sum\_{i=0}^{k-1}\binom{n}{i}p^iq^{n-i}}{q^n} \\\[1mm]
&= \frac{\displaystyle\sum\_{i=0}^{k-1}b(i;n,p)}{q^n} \\\[3mm]
&= \frac{\Pr{X\<k}}{q^n}.
\end{align\*}
$$

All conditions for applying Theorem C.4 are satisfied, thus

$$
\begin{align\*}
\sum\_{i=0}^{k-1}\binom{n}{i}a^i &< \frac{kq}{q^n(np-k)},b(k;n,p) \\
&= \frac{k/(a+1)}{(1/(a+1))^n(na/(a+1)-k)},b(k;n,p) \\\[1mm]
&= (a+1)^n,\frac{k}{na-k(a+1)},b(k;n,a/(a+1)).
\end{align\*}
$$

### ★ C.5-4

$$
\begin{align\*}
\sum\_{i=0}^{k-1}p^iq^{n-i} &\le \sum\_{i=0}^{k-1}\binom{n}{i}p^iq^{n-i}  && \Bigg(\binom{n}{i} \ge 1 \text{ for all } 0 \le i \le n \Bigg)\\
&= \sum\_{i=0}^{k-1}b(i;n,p) \\
&< \frac{kq}{np-k},b(k;n,p) && \text{(by Theorem C.4)} \\
&\le \frac{kq}{np-k}\left(\frac{np}{k}\right)^k\left(\frac{nq}{n-k}\right)^{n-k} && \text{(by Lemma C.1)}.
\end{align\*}
$$

### ★ C.5-5

We must fit into the form of Theorem C.8, so let us work backwards, starting from the known fact $$r>n-\mu=n-\mathbb{E}{\[X]}=\mathbb{E}{\[n-X]}=\mathbb{E}{\[X']}=\mu'$$. Thus, we have

$$
\begin{align\*}
\Pr{\mu-X\ge r} &= \Pr{(n-\mu')-X\ge r} \\
&= \Pr{X'-\mu'\ge r} \\
&\le \left(\frac{\mu' e}{r}\right)^r \\
&= \left(\frac{(n-\mu)e}{r}\right)^r.
\end{align\*}
$$

For the second part, just replace $$\mu=\mathbb{E}{\[X]}=np$$ in the formula above and simplify $$n-np=nq$$.

### ★ C.5-6

We follow the hint from the book and show that $$p\_ie^{\alpha q\_i}+q\_ie^{-\alpha p\_i} \le e^{\alpha^2!/2}$$ for all nonnegative reals $$p\_i$$, $$q\_i$$ and $$\alpha$$ such that $$p\_i+q\_i=1$$. The left hand side of the inequality is the [moment generating function](https://en.wikipedia.org/wiki/Moment-generating_function) of a real-valued random variable $$Y\_i=X\_i-p\_i$$, where $$X\_i$$ is an indicator random variable taking value of 1 with probability $$p\_i$$, as described in the book. Apparently, $$\mathbb{E}{\[Y\_i]}=0$$ and $$-p\_i \le Y\_i \le q\_i$$ , thus it is bounded. Therefore, we can apply the [Hoeffding's lemma](https://cs229.stanford.edu/extra-notes/hoeffding.pdf), which says

$$
\begin{align\*}
\mathbb{E}{\[e^{\alpha Y\_i}]} &= p\_ie^{\alpha q\_i}+q\_ie^{-\alpha p\_i} \\
&\le e^{\alpha^2(q\_i-(-p\_i))^2/8} \\
&=e^{\alpha^2(q\_i+p\_i)^2/8} \\
&=e^{\alpha^2/8} \\
&\le e^{\alpha^2/2} \quad \checkmark
\end{align\*}
$$

We follow the outline of the proof of Theorem C.8, using the previous inequality instead of (C.49). Thus,

$$
\begin{align\*}
\mathbb{E}{\[e^{\alpha(X-\mu)}]} &= \prod\_{i=1}^n\mathbb{E}{\[e^{\alpha(X\_i-p\_i)}]} \\

```
&= \prod_{i=1}^n(p_ie^{\alpha q_i}+q_ie^{-\alpha p_i}) \\
&\le \prod_{i=1}^ne^{\alpha^2\!/2}  \\
&= \exp(\alpha^2n/2).
```

\end{align\*}
$$

It follows that

$$
\begin{align\*}
\Pr{X-\mu\ge r} &= \Pr{e^{\alpha(X-\mu)}\ge e^{\alpha r}} && \text{(by equation (C.47))} \\
&\le \mathbb{E}{\[e^{\alpha(X-\mu)}]}e^{-\alpha r} && \text{(by inequality (C.48))} \\
&\le \exp(\alpha^2n/2-\alpha r).
\end{align\*}
$$

We have a standard quadratic polynomial in terms of $$\alpha$$ that achieves its minimum value at $$\alpha=r/n$$.  We finally have

$$
\begin{align\*}
\Pr{X-\mu\ge r} &\le \exp\left((r/n)^2n/2-(r/n)r\right) \\
&= e^{-r^2!/{2n}}.
\end{align\*}
$$

### ★ C.5-7

The function $$f(\alpha)=e^{g(\alpha)}$$ is monotonically increasing in terms of $$g(\alpha)=\mu e^\alpha-\alpha r$$, so minimizing $$g(\alpha)$$ also minimizes $$f(\alpha)$$. Therefore, we need to find the first and second derivatives of the the exponent in the right-hand side of inequality (C.51); we have

$$
\begin{align\*}
g'(\alpha) &= \mu e^\alpha-r=0 \implies \alpha\_{min} = \ln (r/\mu) \\
g''(\alpha\_{min}) &= \mu e^{\alpha\_{min}} > 0 \therefore \alpha\_{min} \text{ is minimum, assuming } \mu > 0.
\end{align\*}
$$

## Problems

### C-1 The Monty Hall problem

#### a.

See the solution [here](https://brainstellar.com/puzzles/14). The site contains lots of interesting puzzles worthwhile to study.

#### b.

The first three columns constitute the components of an outcome.

<table data-full-width="false"><thead><tr><th width="118.15194702148438">Initial Pick</th><th width="121.82025146484375">Has Offer?</th><th width="110.88922119140625">Switch?</th><th width="112.29388427734375">Winning?</th><th width="211.19186401367188">Probability</th></tr></thead><tbody><tr><td>Right</td><td>Yes</td><td>Yes</td><td>No</td><td><span class="math">p_{right}p_{switch}/3</span></td></tr><tr><td>Right</td><td>Yes</td><td>No</td><td>Yes</td><td><span class="math">p_{right}(1-p_{switch})/3</span></td></tr><tr><td>Wrong</td><td>Yes</td><td>Yes</td><td>Yes</td><td><span class="math">2p_{wrong}p_{switch}/3</span></td></tr><tr><td>Wrong</td><td>Yes</td><td>No</td><td>No</td><td><span class="math">2p_{wrong}(1-p_{switch})/3</span></td></tr><tr><td>Right</td><td>No</td><td>No</td><td>Yes</td><td><span class="math">(1-p_{right})/3</span></td></tr><tr><td>Wrong</td><td>No</td><td>No</td><td>No</td><td><span class="math">2(1-p_{wrong})/3</span></td></tr></tbody></table>

#### c.

Just sum up the probabilities of the *winning* outcomes.

#### d.

He should minimize the expression from part (c). Therefore, his best choices are $$p\_{right}=1$$ and $$p\_{wrong}=0$$.

#### e.

When $$p\_{switch}=0$$ then Monty cannot influence the player's probability of winning a car, so all possible strategies are optimal for him.

#### f.

We should maximize the expression from part (c). Therefore,

$$
p\_{switch} = \begin{cases}
1 & \text{if } 2p\_{wrong} \ge p\_{right}, \\
0 & \text{otherwise}.
\end{cases}
$$

#### g.

The combination $$p\_{right}=1$$ and $$p\_{wrong}=0$$ minimizes the probability of winning as a function of these variables. In this case, $$p\_{winning}=(1-p\_{switch})/3$$, so setting $$p\_{switch}=0$$ maximizes this probability.

#### h.

Let event $$P$$ designate that our initial pick is correct, event $$O$$ that Monty offers us an opportunity to switch, and event $$W$$ that we win a car. The total probability of getting an offer is

$$
\Pr{O}=\Pr{O|P}\Pr{P}+\Pr{O|\overline{P}}\Pr{\overline{P}}=(p\_{right}+2p\_{wrong})/3.
$$

We know that Monty gives an offer, so $$p\_{right}>0 \lor p\_{wrong} > 0$$, hence $$p\_{right}+2p\_{wrong}>0$$.

Define event $$S$$ to stand for our decision to switch. The joint event of winning a car and having an opportunity to switch is

$$
\begin{align\*}
W\cap O &= \left((P\cap\overline{S})\cup(\overline{P}\cap S)\right)\cap O \\
&= \left((P\cap O)\cap\overline{S}\right)\cup\left((\overline{P}\cap O)\cap S\right).
\end{align\*}
$$

As stated in the problem description, we have no knowledge of Monty's possible motivations or strategies. Consequently, $$S$$ is independent of the other events. Thus,

$$
\begin{align\*}
\Pr{W\mid O} &= \cfrac{\Pr{W\cap O}}{\Pr{O}} \\\[1mm]
&= \cfrac{\Pr{(P\cap O)\cap\overline{S}}+\Pr{(\overline{P}\cap O)\cap S}}{\Pr{O}}  && \text{(mutually exclusive events)} \\\[1mm]
&= \cfrac{\Pr{O\mid P}\Pr{P}\Pr{\overline{S}}+\Pr{O\mid\overline{P}}\Pr{\overline{P}}\Pr{S}}{\Pr{O}} \\\[1mm]
&= \cfrac{p\_{right}(1-p\_{switch})/3+2p\_{wrong}p\_{switch}/3}{(p\_{right}+2p\_{wrong})/3} \\\[1mm]
&= \cfrac{p\_{right}-p\_{right}p\_{switch}+2p\_{wrong}p\_{switch}}{p\_{right}+2p\_{wrong}}.
\end{align\*}
$$

#### i.

Let’s denote the expression (C.52) by $$E$$. When $$p\_{switch}=1/2$$ then $$E=1/2$$. If $$p\_{switch}\neq1/2$$, we want to find values for Monty to make $$E-1/2<0$$. We have

$$
E - 1/2 = \frac{(p\_{right} - p\_{wrong})(1/2 - p\_{switch})}{p\_{right} + 2p\_{wrong}}.
$$

From part (j) we know that the denominator is a positive value. If $$p\_{switch}<1/2$$ then Monty should choose $$p\_{right} < p\_{wrong}$$, else if $$p\_{switch}>1/2$$ then the Monty should select $$p\_{right} > p\_{wrong}$$.

#### j.

In the absence of information regarding the structure of the problem—particularly with respect to Monty’s motivations and strategy in offering the option to switch—the most defensible approach in a binary decision-making context is to rely on a random mechanism, such as a coin toss. A Bernoulli trial of this kind may serve as an effective means of neutralizing more sophisticated strategies employed by other participants to diminish our probability of success.

This problem has illustrated how subtle differences in underlying assumptions can substantially affect the outcome. Moreover, it has shown that probability theory often yields results that diverge from everyday intuition. More broadly, it suggests that individuals are generally not well equipped to reason about uncertainty and random processes.

### C-2 Balls and bins

#### a.

There are $$b$$ choices for each ball resulting in $$b^n$$ ways of putting them into bins.

#### b.

We follow the hint from the book. There are $$(b+n-1)!$$ possible permutations of $$n$$ distinct balls and $$b-1$$ indistinguishable sticks. Sticks essentially partition the balls into bins, possibly allowing empty bins, too. For example, 21||3 says that bin 2 is empty, bin 1 contains the balls 2,1 (in this specific order) and ball 3 is in the last bin. Since the sticks are indistinguishable, we must divide the total number of arrangements with $$(b-1)!$$ to get the result from the book.

#### c.

We again follow the hint from the book. Of the arrangements in part (b), $$n!$$ are repeated if the balls are made identical. Therefore, we have $$\dfrac{(b+n-1)!}{n!(b-1)!}=\binom{b+n-1}{n}$$.

#### d.

Out of $$b$$ bins we select $$n$$ of them that will contain a ball. This is a classical example of counting combinations, so the answer is $$\binom{b}{n}$$.

#### e.

We reuse the idea from part (b) with $$b-1$$ indistinguishable sticks. The only difference is that we now limit the placement (positions) of those sticks to be always between some number of balls. There are $$n-1$$ allowable positions (between any neighboring numbers) for sticks. From these we select $$b-1$$ of them, thus we get $$\binom{n-1}{b-1}$$.


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://evarga.gitbook.io/sh-intro-to-algs/part-viii-appendix-mathematical-background/appendix-c.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
