Appendix C

Exercise C.1-1

We omit empty strings, so $k \ge 1$. An $n$-string has $n-k+1$ $k$-substrings. By the rule of sum, the total number of substrings is

$$\begin{align*} \sum_{k=1}^n(n-k+1)&=\sum_{k=1}^n k && \text{(reindexing)} \\ &=\frac{n(n+1)}{2} && \text{(by equation (A.1))} \end{align*}$$

Exercise C.1-2

We apply the rule of product, where we have $2^m$ choices of output per input. In general, the total number of $n$-input, $m$-output boolean functions is

$$\underbrace{2^m\cdot2^m\cdots2^m}_{2^n \text{ times}}=\big(2^m\big)^{2^n}$$

There are $2^{2^n}$ $n$-input, 1-output boolean functions.

Exercise C.1-3

If all seatings were considered distinct, we would have $n!$ arrangements. Rotation can be regarded as inducing an equivalence relation on the set of seatings, splitting them into equivalence classes (members of a class are some number of rotations apart from each other). Each class contains $n$ seatings, and we have $\frac{n!}{n}=(n-1)!$ such classes, which is the number of ways $n$ professors can sit around a circular conference table.
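Just for fun, we can brute-force this count for small $n$: treat two seatings as equal when one is a rotation of the other and check that the number of equivalence classes is $(n-1)!$. This is a minimal sketch; the helper name `circular_arrangements` is ours, not from the book.

```python
from itertools import permutations
from math import factorial

def circular_arrangements(n):
    # Count seatings of professors 0..n-1 around a round table, where seatings
    # that differ only by a rotation are considered the same.
    seen = set()
    for p in permutations(range(n)):
        rotations = [p[i:] + p[:i] for i in range(n)]
        seen.add(min(rotations))        # canonical representative of the class
    return len(seen)

for n in range(1, 7):
    assert circular_arrangements(n) == factorial(n - 1)
print("matches (n-1)! for n = 1..6")
```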

Exercise C.1-4

There are 49 even and 50 odd numbers in the set $\{1,2,\dots,99\}$. A sum of three distinct numbers is even when all of the numbers are even or when exactly one of them is even and the other two are odd. We can use equation (C.2) together with the rules of sum and product to calculate the answer.

$${49 \choose 3}+{49 \choose 1}{50 \choose 2}=78449$$

Just for fun, we can also execute the Python 3 script below to verify the result.

```python
from itertools import combinations

print(sum(1 for a, b, c in combinations(range(1, 100), 3)
          if (a + b + c) % 2 == 0))
```

Exercise C.1-5

$${n \choose k}=\frac{n!}{k!(n-k)!} = \frac{n}{k}\cdot\frac{(n-1)!}{(k-1)!(n-k)!}=\frac{n}{k}{n-1 \choose k-1}$$

Exercise C.1-6

$${n \choose k}=\frac{n!}{k!(n-k)!} = \frac{n}{n-k}\cdot\frac{(n-1)!}{k!(n-k-1)!}=\frac{n}{n-k}{n-1 \choose k}$$

Exercise C.1-7

To choose $k$ elements from $n$ elements, we can either include a distinguished element and select the remaining $k-1$ items from the other $n-1$ items, or exclude it and select all $k$ items from the remaining $n-1$ elements. This establishes the desired relationship $\binom{n}{k}=\binom{n-1}{k-1}+\binom{n-1}{k}$.

Exercise C.1-8

You can find this table on Wikipedia.
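For completeness, here is a small sketch that builds the first rows of Pascal's triangle with the recurrence from Exercise C.1-7 and cross-checks them against Python's `math.comb`:

```python
from math import comb

rows = [[1]]
for n in range(1, 7):
    prev = rows[-1]
    # Pascal's rule: C(n, k) = C(n-1, k-1) + C(n-1, k), with 1s on the borders.
    rows.append([1] + [prev[k - 1] + prev[k] for k in range(1, n)] + [1])

for n, row in enumerate(rows):
    assert row == [comb(n, k) for k in range(n + 1)]
    print(row)
```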

Exercise C.1-9

We use equations (A.1) and (C.2) to prove the identity.

$$\sum_{i=1}^n i=\frac{n(n+1)}{2}=\frac{(n+1)!}{2!(n-1)!}={n+1 \choose 2}$$

Exercise C.1-10

Assume $n>0$; otherwise the maximum is at $k=0$. We fix $n$ and treat $\binom{n}{k}$ as a function of $k$. First, let us calculate the ratio between successive terms

$$\begin{align*} \dfrac{\dbinom{n}{k}}{\dbinom{n}{k-1}}&=\dfrac{\dfrac{n}{k}\dbinom{n-1}{k-1}}{\dfrac{n}{n-k+1}\dbinom{n-1}{k-1}} && \text{(by identities from Exercises C.1-5 and C.1-6)} \\ &=\frac{n-k+1}{k} \end{align*}$$

As long as $n-k+1 \ge k$, or equivalently $k \le \frac{n+1}{2}$, the function $\binom{n}{k}$ monotonically increases.

If $n$ is odd, then we have an even number of terms in the expansion for $k=0,1,\dots,n$. $\binom{n}{k}$ is symmetric, so there are 2 candidates (one on each side): $k=(n+1)/2=\lceil n/2 \rceil$ or $k=\lfloor n/2 \rfloor$.

If $n$ is even, then there is only one maximum: $k=\lfloor(n+1)/2\rfloor=\lfloor n/2 \rfloor=\lceil n/2 \rceil=n/2$.
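A quick sanity check of where the maximum lands, using `math.comb` (a small verification script, not part of the original exercise):

```python
from math import comb

for n in range(1, 20):
    best = max(range(n + 1), key=lambda k: comb(n, k))
    # The maximum value is attained at floor(n/2) and, equivalently, at ceil(n/2).
    assert comb(n, best) == comb(n, n // 2) == comb(n, (n + 1) // 2)
print("maximum attained at k = n//2 and k = (n+1)//2 for n = 1..19")
```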

Exercise C.1-11

$$\begin{align*} (j+k)!&=j!(j+1)(j+2)\cdots(j+k) \\ &\ge j!(0+1)(0+2)\cdots(0+k) \\ &=j!\,k! \end{align*}$$

The inequality above turns into equality when $j=0 \lor k=0$.

$$\begin{align*} \binom{n}{j+k}&=\dfrac{n!}{(j+k)!(n-j-k)!} \\ &\le \dfrac{n!}{j!\,k!\,(n-j-k)!} \\ &=\dfrac{n!}{j!(n-j)!}\cdot \dfrac{(n-j)!}{k!(n-j-k)!} \\ &=\binom{n}{j}\binom{n-j}{k} \end{align*}$$

The right-hand side counts the ways of choosing $j+k$ items out of $n$ by the rule of product: each selection of $j$ items out of $n$ is multiplied by the number of ways of choosing $k$ additional items out of the remaining $n-j$. This approach overcounts, as it counts the same combination multiple times, which is why we only get an inequality.

For example, $\binom{5}{2+1}=10 < \binom{5}{2}\binom{3}{1}=30$. Suppose we have chosen $\{a,b\}$ from $\{a,b,c,d,e\}$; we then have 3 options for the additional item, and picking $c$ gives $\{a,b,c\}$. But the same set is produced again if we start with another initial selection, like $\{a,c\}$, and then pick $b$, so each combination is counted more than once.

Exercise C.1-12

We prove the statement by induction on $k$. The base case $k=0$ is true, provided that for convenience we assume $0^0=1$, as mentioned in the book.

For $k>0$ we have

$$\begin{align*} \binom{n}{k} &= \frac{n}{k}\binom{n-1}{k-1} && \text{(by Exercise C.1-5)} \\ &= \frac{n}{k}\cdot\frac{n-k+1}{n}\binom{n}{k-1} && \text{(by Exercise C.1-6)} \\ &\le \frac{n^n}{k(k-1)^{k-1}(n-k+1)^{n-k}} && \text{(by the inductive hypothesis)} \end{align*}$$

We prove that

$$\frac{n^n}{k(k-1)^{k-1}(n-k+1)^{n-k}} \le \frac{n^n}{k^k(n-k)^{n-k}}$$

by examining the ratio

$$\begin{align*} \cfrac{\dfrac{n^n}{k(k-1)^{k-1}(n-k+1)^{n-k}}}{\dfrac{n^n}{k^k(n-k)^{n-k}}} &= \cfrac{k^{k-1}(n-k)^{n-k}}{(k-1)^{k-1}(n-k+1)^{n-k}} \\ &= \frac{\left(\dfrac{k}{k-1}\right)^{k-1}}{\left(\dfrac{n-k+1}{n-k}\right)^{n-k}} \\ &= \frac{\left(1+\dfrac{1}{k-1}\right)^{k-1}}{\left(1+\dfrac{1}{n-k}\right)^{n-k}} \end{align*}$$

The sequence $a_n=(1+1/n)^n$ is strictly increasing (a standard fact), and we need to find the threshold below which the ratio is less than or equal to one. This happens while $k-1\le n-k$, or equivalently for all integers $k$ such that $k\le(n+1)/2$. According to the principles of mathematical induction, we can conclude that inequality (C.7) holds for all integers $k$ such that $0 \le k \le n/2 < (n+1)/2$.

When $n/2 \le k \le n$ then $0 \le n-k \le n/2$, so we can apply equation (C.3) after substituting $k$ with $n-k$ in inequality (C.7). Therefore, we can conclude that inequality (C.7) holds for all integers $k$ such that $0 \le k \le n$.

Exercise C.1-13

This exercise is flawed, since $\binom{2n}{n} \neq \cfrac{2^{2n}}{\sqrt{\pi n}}(1+O(1/n))$. The reasons will become clear once we prove that $\binom{2n}{n}=\cfrac{2^{2n}}{\sqrt{\pi n}}(1+O'(1/n))$ (see Problem 3-6 part (d)).

$$\begin{align*} \binom{2n}{n} &= \cfrac{(2n)!}{(n!)^2} \\ &= \cfrac{\sqrt{4\pi n}\,(2n/e)^{2n}e^{\alpha_{2n}}}{2\pi n\,(n/e)^{2n}e^{2\alpha_n}} && \text{(by equation (3.29))}\\ &= \cfrac{2^{2n}}{\sqrt{\pi n}} \cdot e^{\alpha_{2n}-2\alpha_n} \end{align*}$$

Let $x=\alpha_{2n}-2\alpha_n$. As $n$ increases, $x \to 0$, so $e^x \approx 1+x$. We can derive equation (3.25) from equation (3.29) by leveraging the same approximation. Equation (3.29) is exact, and as $n$ increases the adjustment factor $e^x \to 1^-$ instead of $e^x \to 1^+$ (you can experiment with small values of $n$ to gain intuition). Since $x$ is always negative and $|x|=O(1/n)$, we have $1+x=1+O'(1/n)$, which concludes the proof.

How can things go wrong?

$$\begin{align*} \binom{2n}{n} &= \cfrac{(2n)!}{(n!)^2} \\ &= \cfrac{\sqrt{4\pi n}\,(2n/e)^{2n}(1+\Theta(1/n))}{2\pi n\,(n/e)^{2n}(1+\Theta(1/n))^2} && \text{(by equation (3.25))} \end{align*}$$

$\Theta(1/n)$ stands for a whole family of functions bounded by $g(n)=1/n$, so we cannot cancel the numerator and denominator in the usual manner. On the other hand, we can upper bound the fraction by bounding the numerator from above and the denominator from below. Let $c$ and $d$ represent the constants hidden in the $O(1/n)$ and $\Omega(1/n)$ of the numerator and denominator, respectively.

$$\begin{align*} \cfrac{1+\Theta(1/n)}{(1+\Theta(1/n))^2} &\le \cfrac{1+c/n}{(1+d/n)^2} \\ &< \cfrac{1+c/n}{1+d/n} \\ &= \cfrac{n+c}{n+d} \\ &= 1+\cfrac{c-d}{n+d} \\ &< 1+\cfrac{|c-d|}{n} \end{align*}$$

So far everything seems correct, thus we may be tempted to jump to the following conclusion

$$\cfrac{1+\Theta(1/n)}{(1+\Theta(1/n))^2}= 1+O(1/n) \qquad\text{WRONG!}$$

Below is the quotation from the book (see page 58) explaining how to treat asymptotic notation appearing on both sides of an equation:

No matter how the anonymous functions are chosen on the left of the equal sign, there is a way to choose the anonymous functions on the right of the equal sign to make the equation valid.

Pick $f(n)=1/n$ as a replacement for $\Theta(1/n)$ both in the numerator and the denominator. Consequently, the left side becomes $1/(1+(1/n))<1$. There is no way to select an asymptotically nonnegative function on the right side to make it less than 1.

The book talks about "proper" abuses of asymptotic notations, but as this example illustrates, it is easy to become a victim of such convenience.

Exercise C.1-14

We need to calculate the first and second derivatives of $H(\lambda)$ using the base conversion rule for logarithms, $\lg x=\ln x / \ln 2$.

$$\begin{align*} H'(\lambda) &= \left(-\lg\lambda-\lambda\cdot\frac{1}{\lambda\ln2}\right)+\left(\lg(1-\lambda)+(1-\lambda)\cdot\frac{1}{(1-\lambda)\ln2}\right) \\ &= -\lg\lambda-\lg e+\lg(1-\lambda)+\lg e \\ &= \lg\frac{1-\lambda}{\lambda} \\[2mm] H''(\lambda) &= \frac{\lambda}{(1-\lambda)\ln2}\cdot\frac{-\lambda-(1-\lambda)}{\lambda^2} \\ &= -\frac{\lg e}{\lambda(1-\lambda)} \end{align*}$$

We have $H'(1/2)=0 \land H''(1/2)<0$, so $H(\lambda)$ achieves its maximum value at $\lambda=1/2$, where $H(1/2)=1$.

Exercise C.1-15

The equation trivially holds when $n=0$. We solve this problem for $n>0$ by thinking a bit out of the box, which also demonstrates a useful technique for tackling seemingly complicated sums. First, a couple of preliminaries:

$$\begin{align*} \sum_{k=0}^n\binom{n}{k}k&=\sum_{k=1}^n\binom{n}{k}k \\ \sum_{k=1}^n\binom{n}{k}&=2^n-1 \end{align*}$$

Draw an $n \times n$ table listing the terms of the sum along diagonals, taking into account the multiplicative factor $k$ (so that $\binom{n}{k}$ appears exactly $k$ times in the upper-left triangle), as follows:

$$\begin{array}{cccccc}
\binom{n}{1} & \binom{n}{2} & \binom{n}{3} & \cdots & \cdots & \binom{n}{n} \\
\binom{n}{2} & \binom{n}{3} & \cdots & \cdots & \binom{n}{n} & \cdots \\
\binom{n}{3} & \cdots & \cdots & \binom{n}{n} & \cdots & \cdots \\
\cdots & \cdots & \cdots & \cdots & \cdots & \binom{n}{3} \\
\cdots & \binom{n}{n} & \cdots & \cdots & \binom{n}{3} & \binom{n}{2} \\
\binom{n}{n} & \cdots & \cdots & \binom{n}{3} & \binom{n}{2} & \binom{n}{1}
\end{array}$$

Each column sums to $2^n-1$, so the total over the whole table is $n(2^n-1)$. Observe that the sums of the elements in the upper-left and bottom-right triangular sub-matrices (both including the diagonal of $\binom{n}{n}$ entries) are the same. Denoting the triangle sum by $S_\triangle$ and the diagonal sum by $S_D$, we have

$$\begin{align*} 2S_\triangle-S_D&=n(2^n-1) \\ 2S_\triangle&=n(2^n-1)+n && \text{(}S_D=n\tbinom{n}{n}=n\text{)} \\ &=n2^n \\ \therefore\ S_\triangle&=n2^{n-1} \end{align*}$$

We have shown that equation (C.12) is true for any integer $n \ge 0$.
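As with Exercise C.1-4, we can verify equation (C.12) numerically for small $n$ (the case $n=0$ is trivial):

```python
from math import comb

for n in range(1, 16):
    # sum_{k=0}^{n} k * C(n, k) should equal n * 2^(n-1)
    assert sum(k * comb(n, k) for k in range(n + 1)) == n * 2 ** (n - 1)
print("equation (C.12) holds for n = 1..15")
```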

Exercise C.1-16

We can easily verify the cases for $n=0,1,2,3$ or $k=0,1$, so assume $n \ge 4 \land k>1$. We have

$$\begin{align*} \binom{n}{k} &= \frac{n(n-1)\cdots(n-(k-1))}{k!} \\ &= \frac{n^k}{k!}\left(1-\frac{1}{n}\right)\cdots\left(1-\frac{k-1}{n}\right) \\ &\ge \frac{n^k}{k!}\left(1-\frac{k-1}{n}\right)^{k-1} \\ &> \frac{n^k}{k!}\left(1-\frac{\sqrt{n}}{n}\right)^{k-1} && \text{(}k-1<\sqrt{n}\text{)}\\ &> \frac{n^k}{k!}\left(1-\frac{1}{\sqrt{n}}\right)^{\sqrt{n}} && \text{(}x^n \text{ strictly decreases when } 0<x< 1\text{)} \\ &\ge \frac{n^k}{4k!} && \text{(since } (1-1/\sqrt{n})^{\sqrt{n}} \ge 1/4\text{ for all } n \ge 4\text{)} \end{align*}$$

Exercise C.2-1

Let event $A$ be that Professor Rosencrantz obtains strictly more heads than Professor Guildenstern. $R_*$ and $G_*$ denote various events about the results for Professors Rosencrantz and Guildenstern, respectively. We have

$$\begin{align*} A&=(R_{H\ge1} \cap G_T) \cup (R_{H=2} \cap G_H) \\ \Pr\{A\}&=\Pr\{(R_{H\ge1} \cap G_T) \cup (R_{H=2} \cap G_H)\} \\ &=\Pr\{R_{H\ge1} \cap G_T\}+\Pr\{R_{H=2} \cap G_H\} && \text{(events are mutually exclusive)} \\ &=\Pr\{R_{H\ge1}\}\Pr\{G_T\}+\Pr\{R_{H=2}\}\Pr\{G_H\} && \text{(events are independent)} \\ &=\frac{3}{4}\cdot\frac{1}{2}+\frac{1}{4}\cdot\frac{1}{2}=\frac{1}{2} \end{align*}$$

Exercise C.2-2

Define a new sequence $B$ inductively as follows.

$$B_n=\begin{dcases} A_1 &\text{if } n=1, \\ A_n-\bigcup_{i=1}^{n-1} B_i &\text{if } n>1. \end{dcases}$$

This new sequence has the following desired properties:

  • By definition, for each $n$, $B_n \subseteq A_n$.

  • All members of the sequence $B$ are pairwise mutually exclusive. For any distinct indices $1 \le i < j \le n$ it is true that $B_j=A_j-(B_1 \cup \dots \cup B_i\cup\dots\cup B_{j-1})$, hence $B_i \cap B_j=\emptyset$.

  • For each $n$, $\bigcup_{i=1}^nB_i=\bigcup_{i=1}^nA_i$, which we prove by induction on $n$. The base cases are $n=1 \lor n=2$, and both hold because of how the union of sets works: for example, $A \cup B=A \cup (B-A)$, as duplicates are not allowed in a set. For $n>2$ we have

$$\begin{align*} \bigcup_{i=1}^nB_i &= \bigcup_{i=1}^{n-1}B_i \cup B_n && \text{(by the associativity rule)} \\ &= \bigcup_{i=1}^{n-1}A_i \cup B_n && \text{(by the inductive hypothesis)} \\ &= \bigcup_{i=1}^{n-1}A_i \cup \left(A_n-\bigcup_{i=1}^{n-1}A_i\right) && \text{(by the inductive hypothesis)} \\ &= \bigcup_{i=1}^nA_i && \text{(by the base case)} \end{align*}$$

The last fact implies that $\bigcup_{i=1}^{\infty}B_i=\bigcup_{i=1}^{\infty}A_i$. Using the generalized 3rd axiom of probability (see the book), we can prove Boole's inequality for any countably infinite sequence of events (simply replace $\infty$ with $n$ for the finite case).

$$\begin{align*} \Pr\bigg\lbrace\bigcup_{i=1}^{\infty}A_i\bigg\rbrace&=\Pr\bigg\lbrace\bigcup_{i=1}^{\infty}B_i\bigg\rbrace \\ &=\sum_{i=1}^{\infty}\Pr\{B_i\} && \text{(all events are pairwise disjoint)} \\ &\le \sum_{i=1}^{\infty}\Pr\{A_i\} && \text{(}\Pr\{B_i\} \le \Pr\{A_i\} \text{ for all } i\text{)} \end{align*}$$

Exercise C.2-3

Let a 3-tuple $(i,j,k)$ represent the numbers on the selected cards in order of their removal from the deck. For a fixed $j$, all triples having $i<j<k$ are in sorted (increasing) order; there are $(j-1)(10-j)$ such triples. Let $A_j$ denote the event that a triple for a given $j$ is in sorted order, and $A$ the event that a triple is in sorted order. We have

$$\begin{align*} A&=\bigcup_{j=2}^9 A_j \\ \Pr\{A\}&=\Pr\Bigg\{\bigcup_{j=2}^9 A_j \Bigg\}\\ &=\sum_{j=2}^9 \Pr\{A_j\} && \text{(events }A_j \text{ are mutually exclusive)} \\ &=\sum_{j=2}^9 \frac{(j-1)(10-j)}{\binom{10}{3}3!} \\ &=\frac{120}{720}=\frac{1}{6} && \text{(by equations (A.1), (A.4) and (C.2))} \end{align*}$$
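The result is easy to confirm by enumerating all ordered draws of three distinct cards (a brute-force check along the lines of the script in Exercise C.1-4):

```python
from fractions import Fraction
from itertools import permutations

triples = list(permutations(range(1, 11), 3))            # ordered draws of 3 cards out of 10
sorted_count = sum(1 for i, j, k in triples if i < j < k)
print(Fraction(sorted_count, len(triples)))              # -> 1/6
```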

Exercise C.2-4

$$\begin{align*} \Pr\{A\mid B\}+\Pr\{\overline{A}\mid B\} &= \cfrac{\Pr\{A\cap B\}}{\Pr\{B\}}+\cfrac{\Pr\{\overline{A}\cap B\}}{\Pr\{B\}} && \text{(by equation (C.16))}\\ &= \cfrac{\Pr\{(A\cap B)\cup(\overline{A}\cap B)\}}{\Pr\{B\}} && \text{(events in the numerator are mutually disjoint)} \\ &= \cfrac{\Pr\{B\}}{\Pr\{B\}} \\ &= 1 \end{align*}$$

Exercise C.2-5

We prove the statement by induction on $n$. The base case $n=1$ trivially holds. For $n>1$ we have

$$\begin{align*} \Pr\Bigg\{\bigcap_{i=1}^nA_i\Bigg\} &= \Pr\Bigg\{\bigcap_{i=1}^{n-1}A_i \cap A_n\Bigg\} && \text{(by the associativity rule)} \\ &= \Pr\Bigg\{\bigcap_{i=1}^{n-1}A_i\Bigg\}\Pr\Bigg\{A_n\biggm|\bigcap_{i=1}^{n-1}A_i\Bigg\} && \text{(by equation (C.16))} \\ &= \Pr\{A_1\}\Pr\{A_2\mid A_1\}\cdots\Pr\Bigg\{A_n\biggm|\bigcap_{i=1}^{n-1}A_i\Bigg\} && \text{(by the inductive hypothesis)} \end{align*}$$

According to the principles of mathematical induction, we can conclude that the statement from the book is true for all $n \ge 1$.

Exercise C.2-6

Create a 2D sample space $S=\{s_{ij}:1\le i,j \le n\}$ with the uniform probability distribution on it. Consequently, $\Pr\{s_{ij}\}=\cfrac{1}{|S|}=\cfrac{1}{n^2}$. The idea is to specify events $A_i$ in such a way that $|A_i|=n \land |A_i\cap A_j|=1 \land |A_i \cap A_j \cap A_k|=0$ for any indices $1 \le i<j<k \le n$. Notice that if the probability of the intersection of any 3 of the events is zero, then it is also zero for any subset of $k>3$ of them.

An event $A_m$ should consist only of outcomes whose indices include $m$. This ensures $A_i \cap A_j=\{s_{ij}\}$, so no third event can contain that outcome. It turns out that $A_i=\{s_{xi}:1 \le x \le i\} \cup \{s_{ix}:i<x\le n\}$ does the job. We have

$$\begin{align*} \Pr\{A_i\}&=\sum_{s \in A_i} \Pr\{s\}=\frac{n}{n^2}=\frac{1}{n} \\ \Pr\{A_i \cap A_j\}&=\Pr\{s_{ij}\}=\frac{1}{n^2}=\Pr\{A_i\}\Pr\{A_j\} \\ \Pr\{A_i \cap A_j \cap A_k\}&=0\neq\Pr\{A_i\}\Pr\{A_j\}\Pr\{A_k\} \end{align*}$$

The last line implies that

$$\Pr\Bigg\{\bigcap_{i=1}^{k\ge3}A_i\Bigg\}=0\neq \prod_{i=1}^{k\ge3} \Pr\{A_i\} =\frac{1}{n^k}$$
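The construction is concrete enough to check mechanically. Below is a small sketch (with $n=5$, an arbitrary choice) that builds the events $A_i$ as sets of outcomes and verifies pairwise independence together with the empty triple intersections:

```python
from fractions import Fraction
from itertools import combinations

n = 5
pr = Fraction(1, n * n)                       # uniform probability of each outcome s_ij

# A_i consists exactly of the outcomes whose index pair includes i.
A = {i: {(x, i) for x in range(1, i + 1)} | {(i, x) for x in range(i + 1, n + 1)}
     for i in range(1, n + 1)}

for i in A:
    assert len(A[i]) * pr == Fraction(1, n)                 # Pr{A_i} = 1/n
for i, j in combinations(A, 2):
    assert len(A[i] & A[j]) * pr == Fraction(1, n) ** 2     # pairwise independent
for i, j, k in combinations(A, 3):
    assert len(A[i] & A[j] & A[k]) == 0                     # never all three together
print("pairwise independent, but no three events are mutually independent")
```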

Exercise C.2-7

Suppose we have two light bulbs—one is operational, and the other is unreliable and works in 50% of the cases. We randomly pick one of the light bulbs and switch it on/off twice in succession. Let event A denote a successful first attempt to switch a light bulb on/off, and event B a successful second attempt. Let C be the event that the malfunctioning light bulb was picked. We have

$$\begin{align*} \Pr\{A\} &= \Pr\{B\} \\ &= \Pr\{B\mid \overline{C}\}\Pr\{\overline{C}\} + \Pr\{B\mid C\}\Pr\{C\}\\ &= (1)\cdot(1/2)+(1/2)\cdot (1/2) \\ &= 3/4 \\ \Pr\{C\} &= 1/2 \end{align*}$$

The events $A$ and $B$ are not independent, since $\Pr\{A\cap B\} = (1)\cdot(1/2)+(1/2)\cdot (1/4)=5/8$, while $\Pr\{A\}\Pr\{B\}=9/16$. But events $A$ and $B$ are conditionally independent given $C$.

$$\begin{align*} \Pr\{A\cap B\mid C\} &= 1/4 \\ &= (1/2)\cdot(1/2) \\ &= \Pr\{A\mid C\}\Pr\{B\mid C\} \end{align*}$$

Exercise C.2-8

Let $J$, $T$ and $C$ be the events that Jeff, Tim and Carmine will pass the course, respectively. Without any additional information, each has the same probability of 1/3 (the prior). Let $G_R$ be the event that Professor Gore responds with anything more specific about the exam results. Let $G_C$ be the event that Professor Gore tells Carmine that Jeff is the one who will fail. Observe that this event also captures to whom the information is revealed.

Assumptions

The choice of parameterization highlights a critical aspect of conditional probability problems: the necessity of explicitly defining the decision rules of the agent providing information (here, Professor Gore). The problem specifies neither his behavior when both Jeff and Tim fail nor whether he is willing to tell anything more than what he has announced to everybody. Such presumptions affect the probability calculation. For example, Carmine's argument about symmetry, backed by his claim that Professor Gore "won't be revealing any information," is only an assumption. We cannot enforce that symmetry on the behavior of another agent.

Calculation

We have several conditional probabilities describing what Professor Gore will announce to Carmine, depending on the outcome.

$$\begin{align} \Pr\{G_R \cap G_C\mid J\}&=0 \\ \Pr\{G_R \cap G_C\mid T\}&=p \\ \Pr\{G_R \cap G_C\mid C\}&=q \end{align}$$
  • (1) is zero because if Jeff is supposed to pass the course, then Professor Gore will not lie to Carmine (a fair assumption).

  • (2) If Tim passes, then Jeff is definitely the one Professor Gore should name, but he may decide not to tell this to Carmine; $p$ is the probability that he does name Jeff.

  • (3) If Carmine passes, Professor Gore will pick Jeff (rather than Tim) with probability $q$.

We want to know Carmine's probability of passing the course now that he has found out that Jeff will fail. We apply equation (C.20) to get

$$\begin{align*} \Pr\{C\mid G_R \cap G_C\} &= \frac{\Pr\{C\}\Pr\{G_R \cap G_C\mid C\}}{\Pr\{G_R \cap G_C\}} \\ &= \frac{\Pr\{C\}\Pr\{G_R \cap G_C\mid C\}}{\Pr\{J\}\Pr\{G_R \cap G_C\mid J\}+\Pr\{T\}\Pr\{G_R \cap G_C\mid T\}+\Pr\{C\}\Pr\{G_R \cap G_C\mid C\}} \\[2mm] &= \frac{(1/3)\cdot q}{(1/3)\cdot0+(1/3)\cdot p+(1/3)\cdot q} \\[1mm] &= \frac{q}{p+q} \end{align*}$$

Certainly $p+q>0$, since we already have evidence that he has pointed at Jeff once.

Results

The results vary with the initial conditions; in what follows we fix $p=1$.

With $q=1/2$, Carmine's chance of passing is still 1/3. In the context of similar logic puzzles (e.g., Monty Hall), the intended answer is 1/3, but strictly speaking, it depends on the unknown $q$ (see Problem C-1).
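A Monte Carlo sketch of the scenario makes the dependence on $q$ tangible. The decision rules coded below (Gore never falsely names Jeff, names him with probability $p$ when Tim passes and with probability $q$ when Carmine passes) are exactly the assumptions made above:

```python
import random
from collections import Counter

def trial(p, q):
    passer = random.choice(["Jeff", "Tim", "Carmine"])
    if passer == "Jeff":
        told = False                    # Gore never falsely names Jeff
    elif passer == "Tim":
        told = random.random() < p      # Jeff is the only name Gore can give Carmine
    else:
        told = random.random() < q      # Gore picks Jeff over Tim with probability q
    return passer, told

p, q, runs = 1.0, 0.5, 200_000
counts = Counter(passer for passer, told in (trial(p, q) for _ in range(runs)) if told)
print(counts["Carmine"] / sum(counts.values()))   # close to q/(p+q) = 1/3
```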

Exercise C.3-1

Define the random variable $X$ to be the sum of the two values showing on the dice. We have

$$\begin{align*} \mathbb{E}{[X]} &= \sum_{x=2}^{12} x\Pr\{X=x\} && \text{(by equation (C.23))}\\ &= (1/36)\cdot(2\cdot1+3\cdot2+4\cdot3+5\cdot4+6\cdot5+7\cdot6+8\cdot5+9\cdot4+10\cdot3+11\cdot2+12\cdot1) \\ &= 252/36 \\ &=7 \end{align*}$$

Define the random variable $X$ to be the maximum of the two values showing on the dice. We have

$$\begin{align*} \mathbb{E}{[X]} &= \sum_{x=1}^6 x\Pr\{X=x\} && \text{(by equation (C.23))}\\ &= (1/36)\cdot(1\cdot1+2\cdot3+3\cdot5+4\cdot7+5\cdot9+6\cdot11) \\ &= 161/36 \\ &\approx 4.47 \end{align*}$$
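Both expectations can be confirmed by enumerating the 36 equally likely outcomes:

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))            # all 36 outcomes of two dice
print(sum(Fraction(a + b, 36) for a, b in rolls))       # -> 7
print(sum(Fraction(max(a, b), 36) for a, b in rolls))   # -> 161/36
```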

Exercise C.3-2

Define the random variable $X$ to be the index of the maximum/minimum element in the array $A[1:n]$. In a randomly ordered array containing distinct elements, the maximum/minimum element can be at any index with equal probability. In other words, $\Pr\{X=i\}=1/n$ for any valid index $i$. We have

$$\begin{align*} \mathbb{E}{[X]} &= \sum_{i=1}^ni\Pr\{X=i\} \\ &= \frac{1}{n}\sum_{i=1}^ni \\[1mm] &= \frac{1}{n}\cdot\frac{n(n+1)}{2} \\[1mm] &= \frac{n+1}{2} \end{align*}$$

Exercise C.3-3

Define the random variable $X$ to be the number of times the player's number appears on the three dice. The probability that it appears on a given die is 1/6; hence, the probability that it does not is 5/6. We assume that the dice rolls are independent, meaning that we can multiply the individual probabilities together to calculate the joint probability for all 3 dice.

Define the random variable $Y$ to be the player's gain (in dollars) from playing the carnival game once. Apparently, $Y=f(X)$, where the function $f$ maps the number of occurrences of the player's number to a monetary value (loss/gain). We have

$$\begin{align*} \mathbb{E}{[Y]} &= (-1)\cdot\Pr\{X=0\}+1\cdot\Pr\{X=1\}+2\cdot\Pr\{X=2\}+3\cdot\Pr\{X=3\} \\ &= (1/6^3)\cdot\left((-1)\cdot\binom{3}{0}\cdot125+1\cdot\binom{3}{1}\cdot25+2\cdot\binom{3}{2}\cdot5+3\cdot\binom{3}{3}\cdot1\right) \\ &= -17/216 \\ &\approx -0.08 \end{align*}$$

The player will lose about 8 cents on average per game. As a side note, the random variable $X$ follows the binomial distribution with $p=1/6$.
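The exact expectation is a one-liner with the binomial probabilities (a small check, not required by the exercise):

```python
from fractions import Fraction
from math import comb

p = Fraction(1, 6)
payoff = {0: -1, 1: 1, 2: 2, 3: 3}            # dollars lost/won per number of matches
expected = sum(payoff[k] * comb(3, k) * p**k * (1 - p)**(3 - k) for k in range(4))
print(expected, float(expected))              # -> -17/216, about -0.079
```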

Exercise C.3-4

For any reals $x$ and $y$ it holds that $\min(x,y)+\max(x,y)=x+y$. Therefore, we have

$$\begin{align*} \min{\{X,Y\}}+\max{\{X,Y\}}&=X+Y \\ \mathbb{E}{[\min{\{X,Y\}}+\max{\{X,Y\}}]} &= \mathbb{E}{[X+Y]} \\ \mathbb{E}{[\min{\{X,Y\}}]}+\mathbb{E}{[\max{\{X,Y\}}]}&=\mathbb{E}{[X]}+\mathbb{E}{[Y]} && \text{(by linearity of expectation)}\\ \mathbb{E}{[\max{\{X,Y\}}]} &\le \mathbb{E}{[X]}+\mathbb{E}{[Y]} && \text{(}\mathbb{E}{[\min{\{X,Y\}}]} \ge 0\text{)} \end{align*}$$

Exercise C.3-5

Fix values $f_X$ and $g_Y$, and suppose $f(x_i)=f_X$ for all $i=1,2,\dots,p$ and $g(y_j)=g_Y$ for all $j=1,2,\dots,q$, where $x_1,\dots,x_p$ and $y_1,\dots,y_q$ are exactly the values of the random variables $X$ and $Y$ that map to $f_X$ and $g_Y$, respectively. We have

$$\begin{align*} \Pr\{f(X)=f_X \text{ and } g(Y)=g_Y\} &= \Pr\Bigg\{\Bigg(\bigcup_{i=1}^p X=x_i\Bigg) \text{ and } \Bigg(\bigcup_{j=1}^q Y=y_j\Bigg)\Bigg\} \\ &= \Pr\Bigg\{\bigcup_{\substack{1 \le i \le p, \\ 1 \le j \le q}} (X=x_i \text{ and } Y=y_j)\Bigg\} \\ &= \sum_{i=1}^p {\sum_{j=1}^q {\Pr\{X=x_i \text{ and } Y=y_j\}}} && \text{(mutually disjoint events)} \\ &= \sum_{i=1}^p {\sum_{j=1}^q {\Pr\{X=x_i\} \Pr\{Y=y_j\}}} && \text{(independence of } X \text{ and } Y\text{)} \\ &= \Bigg(\sum_{i=1}^p {\Pr\{X=x_i\}}\Bigg)\Bigg(\sum_{j=1}^q {\Pr\{Y=y_j\}}\Bigg) \\ &= \Pr\{f(X)=f_X\}\Pr\{g(Y)=g_Y\} \end{align*}$$

We have not assumed anything about $f_X$ and $g_Y$, so our conclusion about the independence of $f(X)$ and $g(Y)$ generalizes to all cases.

Exercise C.3-6

Wikipedia has the proof here.

Exercise C.3-7

We fix $t$ and define two sets $A=\{s\in S:X(s)\ge t\}$ and $A'=\{s\in S:X'(s)\ge t\}$. For all $s \in S$ it is true that $s \in A' \implies s \in A$ (because $X(s) \ge X'(s)$), thus $A' \subseteq A$. Using the definition of a random variable we get

$$\begin{align*} \Pr\{X\ge t\} &= \sum_{s\in A}\Pr\{s\} \\ &= \sum_{s\in A'}\Pr\{s\}+\sum_{s\in A-A'}\Pr\{s\} \\ &\ge \sum_{s\in A'}\Pr\{s\} \\ &= \Pr\{X'\ge t\} \end{align*}$$

We have not assumed anything about $t$, so our conclusion generalizes to all cases.

This exercise introduces the notion of stochastic dominance. In our case, we say that $X$ is stochastically larger than $X'$, denoted as $X \succcurlyeq X'$.

Exercise C.3-8

The short answer is $\mathbb{E}[X^2] \ge \mathbb{E}^2[X]$. We can prove this either by using Jensen's inequality (C.30), after showing that $x^2$ is a convex function (see below), or by looking at equation (C.32) and recalling that the variance of a random variable is always nonnegative.

$$\begin{align*} f(\lambda x+(1-\lambda)y) &\le \lambda f(x)+(1-\lambda)f(y) \\ (\lambda x+(1-\lambda)y)^2 &\le \lambda x^2+(1-\lambda)y^2 \\ 0 &\le \lambda x^2+(1-\lambda)y^2 - \lambda^2x^2-2\lambda(1-\lambda)xy-(1-\lambda)^2y^2\\ &= \lambda(1-\lambda)x^2+\lambda(1-\lambda)y^2-2\lambda(1-\lambda)xy \\ &= \lambda(1-\lambda)(x-y)^2 \quad \checkmark \end{align*}$$

When $\lambda = 0 \lor \lambda = 1$, the inequality is trivially satisfied.

Exercise C.3-9

For an indicator random variable $X$ it is true that $X^2=X$. Therefore, we have

$$\begin{align*} \mathbb{Var}{[X]} &= \mathbb{E}{[X^2]}-\mathbb{E}^2{[X]} && \text{(by equation (C.32))} \\ &= \mathbb{E}{[X]}-\mathbb{E}^2{[X]} \\ &= \mathbb{E}{[X]}(1-\mathbb{E}{[X]}) \\ &= \mathbb{E}{[X]}\mathbb{E}{[1-X]} && \text{(by linearity of expectation)} \end{align*}$$

Exercise C.3-10

$$\begin{align*} \mathbb{Var}{[aX]} &= \mathbb{E}{[a^2X^2]}-\mathbb{E}^2{[aX]} \\ &= a^2\mathbb{E}{[X^2]}-a^2\mathbb{E}^2{[X]} && \text{(by equation (C.25))} \\ &= a^2\left(\mathbb{E}{[X^2]}-\mathbb{E}^2{[X]}\right) \\ &= a^2\mathbb{Var}{[X]} \end{align*}$$

Exercise C.4-1

$$\begin{align*} \sum_{k=1}^\infty\Pr\{X=k\} &= \sum_{k=1}^\infty q^{k-1}p \\ &= p\sum_{k=0}^\infty q^k \\ &=p \cdot\frac{1}{1-q} && \text{(by equation (A.7))} \\[1mm] &= 1 \quad \checkmark && \text{(}p=1-q\text{)} \end{align*}$$

Exercise C.4-2

Define the random variable $X$ to be the number of trials needed to obtain a success. The probability of a success is $p=b(3;6,1/2)=\binom{6}{3}(1/2)^3(1/2)^3=20/64=5/16$. According to equation (C.36), $\mathbb{E}[X]=1/p=3.2$. Therefore, on average we need about 3.2 rounds of flipping six fair coins before obtaining three heads and three tails.
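A short check of the numbers, computing the exact success probability and the resulting expectation of the geometric distribution:

```python
from fractions import Fraction
from math import comb

p = Fraction(comb(6, 3), 2**6)      # probability of exactly 3 heads in 6 flips
print(p, float(1 / p))              # -> 5/16, 3.2
```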

Exercise C.4-3

$$\begin{align*} \mathbb{E}{[X^2]} &= \sum_{k=1}^\infty k^2\Pr\{X=k\} \\ &= \sum_{k=1}^\infty k^2q^{k-1}p \\ &= \frac{p}{q}\sum_{k=1}^\infty k^2q^k \\[1mm] &= \frac{p}{q}\cdot\frac{q(1+q)}{(1-q)^3} && \text{(by Exercise A.1-6)} \\[1mm] &= (1+q)/p^2 \end{align*}$$

$$\begin{align*} \mathbb{Var}{[X]} &= \mathbb{E}{[X^2]}-\mathbb{E}^2{[X]} && \text{(by equation (C.36))}\\ &= (1+q)/p^2-1/p^2 \\ &= q/p^2 \end{align*}$$

Exercise C.4-4

$$\begin{align*} b(k;n,p) &= \binom{n}{k}p^kq^{n-k} \\ &= \binom{n}{n-k}q^{n-k}p^k && \text{(by equation (C.3) and } p=1-q\text{)} \\ &= b(n-k;n,q) \end{align*}$$

Exercise C.4-5

The binomial distribution $b(k;n,p)$ attains a maximum at an integer $k$ that lies in the range $np-q \le k \le (n+1)p$. As we are looking for an approximation, let $k=np$ and assume it is an integer. The remaining steps are purely mechanical. We have

$$\begin{align*} b(np;n,p) &= \binom{n}{np}p^{np}(1-p)^{n-np} \\ &= \frac{n!}{(np)!\,(n-np)!}\,p^{np}(1-p)^{n-np} \\[1mm] &= \frac{n!}{(np)!\,(nq)!}\,p^{np}q^{nq} \\ &\approx \frac{\sqrt{2\pi n}\left(\dfrac{n}{e}\right)^n}{\sqrt{2\pi np}\left(\dfrac{np}{e}\right)^{np}\sqrt{2\pi nq}\left(\dfrac{nq}{e}\right)^{nq}}\,p^{np}q^{nq} && \text{(using Stirling's approximation (3.25))}\\[6mm] &= \frac{\left(\dfrac{n}{e}\right)^n\left(\dfrac{e}{np}\right)^{np}\left(\dfrac{e}{nq}\right)^{nq}}{\sqrt{2\pi npq}}\,p^{np}q^{nq} \\[2mm] &= \cfrac{1}{p^{np}q^{nq}\sqrt{2\pi npq}}\,p^{np}q^{nq} \\ &=\cfrac{1}{\sqrt{2\pi npq}} \end{align*}$$
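To see how good the approximation is, we can compare the exact value of $b(np;n,p)$ with $1/\sqrt{2\pi npq}$ for growing $n$ (the values of $p$ and $n$ below are arbitrary, chosen so that $np$ is an integer):

```python
from fractions import Fraction
from math import comb, sqrt, pi

p, q = Fraction(1, 4), Fraction(3, 4)
for n in (20, 80, 320, 1280):
    k = n // 4                                 # k = np is an integer for these n
    exact = comb(n, k) * p**k * q**(n - k)     # exact rational value of b(np; n, p)
    approx = 1 / sqrt(2 * pi * n * 3 / 16)     # 1/sqrt(2*pi*n*p*q), since p*q = 3/16
    print(f"n={n:5d}  exact={float(exact):.6f}  approx={approx:.6f}")
```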

Exercise C.4-6

We need to find $b(0;n,1/n)=\binom{n}{0}(1/n)^0(1-1/n)^n=(1-1/n)^n$ as $n \to \infty$, so we have

$$\begin{align*} \lim_{n\to\infty}b(0;n,1/n) &= \lim_{n\to\infty}(1-1/n)^n \\ &= 1/e && \text{(by equation (3.16))} \end{align*}$$

We need to find $b(1;n,1/n)=\binom{n}{1}(1/n)(1-1/n)^{n-1}=(1-1/n)^n/(1-1/n)$ as $n \to \infty$, so we have

$$\begin{align*} \lim_{n\to\infty}b(1;n,1/n) &= \lim_{n\to\infty}\frac{(1-1/n)^n}{1-1/n} \\ &= \frac{1/e}{1} && \text{(by equation (3.16))} \\[1mm] &= 1/e \end{align*}$$

Exercise C.4-7

We prove the identity using the double counting proof technique. The idea is to use different counting approaches to express the same quantity, in this case the probability of professors getting the same number of heads, in two different ways. By equating those expressions we establish the desired relation.

Aggregating the coin flips

Consider each experiment as a sequence of $2n$ coin flips: the first $n$ by Professor Rosencrantz, the last $n$ by Professor Guildenstern. There are $2^{2n}=4^n$ possible sequences. Let "head" be a success for Rosencrantz and "tail" a success for Guildenstern, as hinted in the book. If Professor Rosencrantz has $x$ heads (successes), then he has $n-x$ tails. If Professor Guildenstern has $n-x$ tails (successes), then he has $n-(n-x)=x$ heads. Therefore, both get the same number of heads exactly when there are $n$ successes in total, which can occur in $\binom{2n}{n}$ ways. Thus, the desired probability is $\binom{2n}{n}/4^n$.

Using random variables

Define $R$ and $G$ to be random variables that represent the number of heads and tails for Professors Rosencrantz and Guildenstern, respectively. These variables are independent and follow the binomial distribution. We have

$$\begin{align*} \sum_{k=0}^n\Pr\{R=k\;\text{and}\;G=n-k\} &= \sum_{k=0}^n\Pr\{R=k\}\Pr\{G=n-k\} \\ &= \sum_{k=0}^nb(k;n,1/2)\,b(n-k;n,1/2) \\ &= \sum_{k=0}^n b(k;n,1/2)^2 && \text{(by Exercise C.4-4 when } p=q\text{)}\\ &= \sum_{k=0}^n\binom{n}{k}^2\left(\frac{1}{2}\right)^{2k}\left(\frac{1}{2}\right)^{2(n-k)} \\ &= \frac{\sum_{k=0}^n\binom{n}{k}^2}{4^n} \end{align*}$$
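Both counting approaches can be cross-checked numerically, confirming the identity $\sum_{k=0}^n \binom{n}{k}^2 = \binom{2n}{n}$ for small $n$:

```python
from math import comb

for n in range(16):
    assert sum(comb(n, k) ** 2 for k in range(n + 1)) == comb(2 * n, n)
print("identity verified for n = 0..15")
```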

Exercise C.4-8

As explained in the book, for $k=\lambda n$, where $0 \le \lambda \le 1$, we can rewrite equation (C.7) as

$$\binom{n}{\lambda n} \le 2^{nH(\lambda)}$$

Letting $\lambda=k/n$ we obtain

$$\begin{align*} b(k;n,1/2) &= \binom{n}{k}\left(\frac{1}{2}\right)^k\left(\frac{1}{2}\right)^{n-k} \\ &= \binom{n}{(k/n)\,n}\left(\frac{1}{2}\right)^n \\ &\le \frac{2^{nH(k/n)}}{2^n} \\[1mm] &= 2^{nH(k/n)-n} \end{align*}$$

Exercise C.4-9

The solution presented here is based on Exercise C.4-10, so you might want to do that one first. Define $X'$ to be the random variable denoting the total number of successes in $n$ Bernoulli trials, where the $i$th trial has probability $p \ge p_i$ of success. According to Exercise C.4-10, for $0 \le k \le n$ we have $\Pr\{X'\ge k\} \ge \Pr\{X\ge k\}$. Furthermore, $X'$ follows a binomial distribution $b(k;n,p)$ and $\Pr\{X'\ge k\}=\sum_{i=k}^nb(i;n,p)$.

$$\begin{align*} \Pr\{X<k\} &= 1-\Pr\{X\ge k\} \\ &\ge 1-\Pr\{X'\ge k\} \\ &= \sum_{i=0}^nb(i;n,p)-\sum_{i=k}^nb(i;n,p) && \text{(by equation (C.40))} \\ &= \sum_{i=0}^{k-1}b(i;n,p). \end{align*}$$

Exercise C.4-10

Define the sample space $S$ to consist of outcomes that are sequences of $n$ Bernoulli trials, where the $i$th trial has probability $p_i$ of success. The set $A$ in this exercise corresponds to one such sequence. Define $X_i$ to be the indicator random variable associated with the $i$th trial in $A$ ($X_i=1$ when it is a success, otherwise 0), so the total number of successes in $A$ can be represented by the random variable $X=\sum_{i=1}^n X_i$. Following the hint from the book, we devise an experiment involving the trials in $A$ to obtain the Bernoulli trials in $A'$. The point is to show that if each trial in one sequence has a probability of success at least as high as the corresponding trial in the other, then its total number of successes is at least that of the other sequence.

The previously mentioned hint from the book refers to coupling, a powerful proof technique in probability theory. The fundamental idea is to introduce coupled random variables that can be used to compare the original random variables, which have different probability distributions. In our case, we cannot directly relate the sets $A$ and $A'$, as they contain Bernoulli trials with different distributions. However, we can derive a new set of indicator random variables $X'_i$, conditionally coupled on $X_i$, such that they faithfully reflect the probability distributions of the random trials in $A'$. Therefore, the total number of successes in $A'$ can be represented by the random variable $X'=\sum_{i=1}^n X'_i$, and we can compare $X$ and $X'$ on the same sample space.

Define

$$X'_i =\begin{cases} 1 & \text{if } X_i=1, \\ \mathrm{Bernoulli}(r_i) & \text{if } X_i=0, \end{cases}$$

where $r_i=\frac{p'_i-p_i}{1-p_i}$. Notice that $p_i<1$ when $X_i=0$; otherwise $X_i$ would be 1. For the marginal distribution of $X'_i$ we have

$$\begin{align*} \Pr\{X'_i=1\} &= \Pr\{(X'_i \cap X_i) \cup (X'_i \cap \overline{X_i})\} \\ &= \Pr\{X'_i\cap X_i\}+\Pr\{X'_i\cap\overline{X_i}\} \\ &= p_i+(1-p_i)\cdot r_i \\ &=p'_i \quad \checkmark \end{align*}$$

By construction, for all $i$ we have $X'_i \ge X_i$, hence $X' \ge X$. Furthermore, the coupling preserves the independence of the trials within each sequence, because the construction for each $i$ depends only on $X_i$ (and independent randomness for the conditional Bernoulli when $X_i=0$).

Since $X'(s) \ge X(s)$ for all $s \in S$, Exercise C.3-7 gives $\Pr\{X' \ge k\} \ge \Pr\{X \ge k\}$ for $0 \le k \le n$. This concludes the proof of the statement from the book.

Exercise C.5-1

We expect to have a greater chance of obtaining exactly $n$ heads in $2n$ flips of a fair coin, since, on average, we would expect an equal number of heads and tails. We confirm this by calculating the probabilities. Define $X_m$ to be a random variable denoting the number of heads in $m > 0$ flips of a fair coin. Apparently, it has a binomial distribution $b(k;m,1/2)$, thus for $n>1$ we have

$$\begin{align*} \frac{\Pr\{X_{2n}=n\}}{\Pr\{X_n=n\}} &= \frac{\binom{2n}{n}\left(\frac{1}{2}\right)^{2n}}{\binom{n}{n}\left(\frac{1}{2}\right)^n} \\[1mm] &= \frac{\binom{2n}{n}}{2^n} \\[1mm] &= \frac{2^n}{\sqrt{\pi n}}\,(1+O(1/n)) && \text{(by equation (C.11))} \\ &> 1 \quad \checkmark \end{align*}$$

As a side note, when $n=1$ the two probabilities are equal.

Exercise C.5-2

We prove Corollaries (C.6) and (C.7) following the advice from the book: by inverting the roles of successes and failures in the equations for the left tail of the binomial distribution.

Proof of Corollary C.6

When $np<k<n$ then $0<n-k<nq$, hence we have

$$\begin{align*} \Pr\{X>k\}&=\Pr\{\overline{X}<n-k\} \\ &< \frac{(n-k)p}{nq-(n-k)}b(n-k;n,q) && \text{(by "inverted" Theorem C.4)} \\ &= \frac{(n-k)p}{n(1-p)-n+k}b(k;n,p) && \text{(by Exercise C.4-4)} \\ &= \frac{(n-k)p}{k-np}b(k;n,p) \end{align*}$$

Proof of Corollary C.7

When $(np+n)/2<k<n$ then $0<n-k<nq/2$, so we get

$$\begin{align*} \frac{(n-k)p}{nq-(n-k)} &< \frac{nqp/2}{nq-nq/2} && \text{(by "inverted" Corollary C.5)} \\ &= \frac{nqp/2}{nq/2} \\ &\le 1 && \text{(}p\le 1\text{)} \\ &\therefore \Pr\{X>k\}<b(k;n,p) \end{align*}$$

Thus we have

$$\begin{align*} \frac{\Pr\{X>k\}}{\Pr\{X>k-1\}} &= \frac{\displaystyle\sum_{i=k+1}^n b(i;n,p)}{\displaystyle\sum_{i=k}^n b(i;n,p)} \\ &= \frac{\displaystyle\sum_{i=k+1}^n b(i;n,p)}{b(k;n,p)+\displaystyle\sum_{i=k+1}^n b(i;n,p)} \\ &< 1/2 \end{align*}$$

since $\sum_{i=k+1}^n b(i;n,p) < b(k;n,p)$.

Exercise C.5-3

We see that $p=a/(a+1)$ and $q=1-p=1/(a+1)$, so $a=p/q$. Notice also that $0<k<np$. Define $X$ to be a random variable with binomial distribution $b(k;n,p)$. We have

$$\begin{align*} \sum_{i=0}^{k-1}\binom{n}{i}a^i &= \sum_{i=0}^{k-1}\binom{n}{i}\left(\frac{p}{q}\right)^i \\[1mm] &= \frac{\displaystyle\sum_{i=0}^{k-1}\binom{n}{i}p^iq^{n-i}}{q^n} \\[1mm] &= \frac{\displaystyle\sum_{i=0}^{k-1}b(i;n,p)}{q^n} \\[3mm] &= \frac{\Pr\{X<k\}}{q^n} \end{align*}$$

All conditions for applying Theorem C.4 are satisfied, thus we get

$$\begin{align*} \sum_{i=0}^{k-1}\binom{n}{i}a^i &< \frac{kq}{q^n(np-k)}\,b(k;n,p) \\ &= \frac{k/(a+1)}{(1/(a+1))^n(na/(a+1)-k)}\,b(k;n,p) \\[1mm] &= (a+1)^n\,\frac{k}{na-k(a+1)}\,b(k;n,a/(a+1)) \end{align*}$$

Exercise C.5-4

$$\begin{align*} \sum_{i=0}^{k-1}p^iq^{n-i} &\le \sum_{i=0}^{k-1}\binom{n}{i}p^iq^{n-i} && \Bigg(\binom{n}{i} \ge 1 \text{ for all } 0 \le i \le n \Bigg)\\ &= \sum_{i=0}^{k-1}b(i;n,p) \\ &< \frac{kq}{np-k}\,b(k;n,p) && \text{(by Theorem C.4)} \\ &\le \frac{kq}{np-k}\left(\frac{np}{k}\right)^k\left(\frac{nq}{n-k}\right)^{n-k} && \text{(by Lemma C.1)} \end{align*}$$

Exercise C.5-5

We must fit into the form dictated by Theorem C.8, so let us work backwards, starting from the known fact $r>n-\mu=n-\mathbb{E}[X]=\mathbb{E}[n-X]=\mathbb{E}[X']=\mu'$. Thus, we have

$$\begin{align*} \Pr\{\mu-X\ge r\} &= \Pr\{(n-\mu')-X\ge r\} \\ &= \Pr\{X'-\mu'\ge r\} \\ &\le \left(\frac{\mu' e}{r}\right)^r \\ &= \left(\frac{(n-\mu)e}{r}\right)^r \end{align*}$$

For the second part, just replace $\mu=\mathbb{E}[X]=np$ in the formula above and simplify $n-np=nq$.

Exercise C.5-6

We follow the hint from the book and show that $p_ie^{\alpha q_i}+q_ie^{-\alpha p_i} \le e^{\alpha^2/2}$ for all nonnegative reals $p_i$, $q_i$ and $\alpha$ such that $p_i+q_i=1$. The left-hand side of the inequality is the moment generating function of the real-valued random variable $Y_i=X_i-p_i$, where $X_i$ is an indicator random variable taking the value 1 with probability $p_i$, as described in the book. Apparently, $\mathbb{E}[Y_i]=0$ and $-p_i \le Y_i \le q_i$, thus it is bounded. Therefore, we can apply Hoeffding's lemma, which says

$$\begin{align*} \mathbb{E}{[e^{\alpha Y_i}]} &= p_ie^{\alpha q_i}+q_ie^{-\alpha p_i} \\ &\le e^{\alpha^2(q_i-(-p_i))^2/8} \\ &=e^{\alpha^2(q_i+p_i)^2/8} \\ &=e^{\alpha^2/8} \\ &\le e^{\alpha^2/2} \quad \checkmark \end{align*}$$

We follow the outline of the proof of Theorem C.8, using the previous inequality instead of (C.49). Thus, we have

$$\begin{align*} \mathbb{E}{[e^{\alpha(X-\mu)}]} &= \prod_{i=1}^n\mathbb{E}{[e^{\alpha(X_i-p_i)}]} \\ &= \prod_{i=1}^n(p_ie^{\alpha q_i}+q_ie^{-\alpha p_i}) \\ &\le \prod_{i=1}^ne^{\alpha^2\!/2} \\ &= \exp(\alpha^2n/2) \end{align*}$$

It follows that

$$\begin{align*} \Pr\{X-\mu\ge r\} &= \Pr\{e^{\alpha(X-\mu)}\ge e^{\alpha r}\} && \text{(by equation (C.47))} \\ &\le \mathbb{E}{[e^{\alpha(X-\mu)}]}e^{-\alpha r} && \text{(by inequality (C.48))} \\ &\le \exp(\alpha^2n/2-\alpha r) \end{align*}$$

The exponent is a standard quadratic polynomial in $\alpha$ that achieves its minimum value at $\alpha=r/n$. We finally have

$$\begin{align*} \Pr\{X-\mu\ge r\} &\le \exp\left((r/n)^2n/2-(r/n)r\right) \\ &= e^{-r^2\!/{2n}} \end{align*}$$

Exercise C.5-7

The function $f(x)=e^{g(x)}$ is monotonically increasing in terms of $g(x)$, so minimizing $g(x)$ also minimizes $f(x)$. Therefore, we need to find the first and second derivatives of the exponent $g(\alpha)$ in the right-hand side of inequality (C.51); we have

$$\begin{align*} g'(\alpha) &= \mu e^\alpha-r=0 \implies \alpha_{\min} = \ln (r/\mu) \\ g''(\alpha_{\min}) &= \mu e^{\alpha_{\min}} > 0 \quad\therefore \alpha_{\min} \text{ is a minimum, assuming } \mu > 0 \end{align*}$$

Problem C-1

a.

See the solution here. The site contains lots of interesting puzzles worthwhile to study.

b.

The first three columns constitute the components of an outcome.

| Initial Pick | Has Offer? | Switch? | Winning? | Probability |
| --- | --- | --- | --- | --- |
| Right | Yes | Yes | No | $p_{right}\,p_{switch}/3$ |
| Right | Yes | No | Yes | $p_{right}(1-p_{switch})/3$ |
| Wrong | Yes | Yes | Yes | $2p_{wrong}\,p_{switch}/3$ |
| Wrong | Yes | No | No | $2p_{wrong}(1-p_{switch})/3$ |
| Right | No | No | Yes | $(1-p_{right})/3$ |
| Wrong | No | No | No | $2(1-p_{wrong})/3$ |

c.

Just sum up the probabilities of the winning outcomes from the table: $\Pr\{\text{win}\} = \big(p_{right}(1-p_{switch}) + 2p_{wrong}\,p_{switch} + (1-p_{right})\big)/3$.

d.

He should minimize the expression from part (c). Therefore, his best choices are $p_{right}=1$ and $p_{wrong}=0$.

e.

When $p_{switch}=0$, Monty cannot influence the player's probability of winning a car, so all possible strategies are optimal for him.

f.

We should maximize the expression from part (c). Therefore, we have

$$p_{switch} = \begin{cases} 1 & \text{if } 2p_{wrong} \ge p_{right}, \\ 0 & \text{otherwise}. \end{cases}$$

g.

The combination $p_{right}=1$ and $p_{wrong}=0$ minimizes the probability of winning as a function of these variables. In this case, $p_{winning}=(1-p_{switch})/3$, so setting $p_{switch}=0$ maximizes this probability.

h.

Let event $P$ designate that our initial pick is correct, event $O$ that Monty offers us an opportunity to switch, and event $W$ that we win a car. The total probability of getting an offer is

$$\Pr\{O\}=\Pr\{O\mid P\}\Pr\{P\}+\Pr\{O\mid\overline{P}\}\Pr\{\overline{P}\}=(p_{right}+2p_{wrong})/3$$

We know that Monty gives an offer, so $p_{right}>0 \lor p_{wrong} > 0$, hence $p_{right}+2p_{wrong}>0$.

Define event $S$ to stand for our decision to switch. The joint event of winning a car and having an opportunity to switch is

$$\begin{align*} W\cap O &= \left((P\cap\overline{S})\cup(\overline{P}\cap S)\right)\cap O \\ &= \left((P\cap O)\cap\overline{S}\right)\cup\left((\overline{P}\cap O)\cap S\right) \end{align*}$$

As stated in the problem description, we have no knowledge of Monty's possible motivations or strategies. Consequently, SS is independent of the other events. Thus, we have

$$\begin{align*} \Pr\{W\mid O\} &= \cfrac{\Pr\{W\cap O\}}{\Pr\{O\}} \\[1mm] &= \cfrac{\Pr\{(P\cap O)\cap\overline{S}\}+\Pr\{(\overline{P}\cap O)\cap S\}}{\Pr\{O\}} && \text{(mutually disjoint events)} \\[1mm] &= \cfrac{\Pr\{O\mid P\}\Pr\{P\}\Pr\{\overline{S}\}+\Pr\{O\mid\overline{P}\}\Pr\{\overline{P}\}\Pr\{S\}}{\Pr\{O\}} \\[1mm] &= \cfrac{p_{right}(1-p_{switch})/3+2p_{wrong}p_{switch}/3}{(p_{right}+2p_{wrong})/3} \\[1mm] &= \cfrac{p_{right}-p_{right}p_{switch}+2p_{wrong}p_{switch}}{p_{right}+2p_{wrong}} \end{align*}$$

i.

When $p_{switch}=1/2$, the value of the expression (C.52) is 1/2. When $p_{switch}<1/2$, Monty can choose $p_{right}=0$ and $p_{wrong}=1$; otherwise, if $p_{switch}>1/2$, he can select $p_{right}=1$ and $p_{wrong}=0$. In both cases he can reduce the probability below 1/2.

j.

Not knowing anything about a problem's setup, as in this case pertaining to Monty's motivations and strategies around offering opportunities to switch, our best tactic in yes/no decision-making situations is simply to flip a coin. Such a Bernoulli-type random experiment has great potential to neutralize the sophisticated methods other participants might use to reduce our chances of winning.

This problem has highlighted how subtle variations in assumptions may significantly influence the result. Furthermore, we have seen that probability theory is full of surprises compared to our everyday intuition. As a matter of fact, it has demonstrated that we are, in general, inept at handling uncertainties and reasoning about random processes.

Problem C-2

a.

There are $b$ choices for each ball, resulting in $b^n$ ways of putting them into bins.

b.

We follow the hint from the book. There are $(b+n-1)!$ possible arrangements (permutations) of $n$ distinct balls and $b-1$ indistinguishable sticks. The sticks essentially partition the balls into bins, possibly allowing empty bins, too. For example, |21||3 says that bins 1 and 3 are empty, bin 2 contains the balls 2, 1 (in this specific order), and ball 3 is in the last bin. Since the sticks are indistinguishable, we must divide the total number of arrangements by $(b-1)!$ to get the result from the book.

c.

We again follow the hint from the book. Among the arrangements from part (b), each outcome is repeated $n!$ times if the balls are made identical. Therefore, we have $\dfrac{(b+n-1)!}{n!(b-1)!}=\binom{b+n-1}{n}$.

d.

Out of $b$ bins we select the $n$ of them that will contain a ball. This is a classical example of counting combinations, so the answer is $\binom{b}{n}$.

e.

We reuse the idea from part (b) with $b-1$ indistinguishable sticks. The only difference is that we now require every stick to be placed between two balls, with at most one stick per gap. There are $n-1$ allowable positions (the gaps between neighboring balls) for the sticks. From these we select $b-1$, thus we get $\binom{n-1}{b-1}$.
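The counts from parts (c) and (e) are small enough to confirm by brute force; the helper below (our own naming, not from the book) enumerates all ways to distribute identical balls over distinct bins:

```python
from itertools import product
from math import comb

def count_distributions(n, b, allow_empty):
    # Number of ways to put n identical balls into b distinct bins.
    return sum(1 for counts in product(range(n + 1), repeat=b)
               if sum(counts) == n and (allow_empty or min(counts) >= 1))

n, b = 7, 3
assert count_distributions(n, b, allow_empty=True) == comb(b + n - 1, n)    # part (c)
assert count_distributions(n, b, allow_empty=False) == comb(n - 1, b - 1)   # part (e)
print("parts (c) and (e) verified for n = 7, b = 3")
```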
