Appendix C

Exercise C.1-1

We omit empty strings, so $k \ge 1$. An $n$-string has $n-k+1$ $k$-substrings. By the rule of sum, the total number of substrings is

$$\begin{align*} \sum_{k=1}^n(n-k+1)&=\sum_{k=1}^n k && \text{(reindexing)} \\ &=\frac{n(n+1)}{2} && \text{(by equation (A.1))} \end{align*}$$

Exercise C.1-2

We apply the rule of product, where we have $2^m$ choices of output per input. In general, the total number of $n$-input, $m$-output boolean functions is

$$\underbrace{2^m\cdot2^m\cdots2^m}_{2^n \text{ times}}=\big(2^m\big)^{2^n}$$

There are $2^{2^n}$ $n$-input, 1-output boolean functions.

Exercise C.1-3

If all seatings were considered distinct, we would have $n!$ arrangements. Rotation can be regarded as inducing an equivalence relation on the set of seatings, splitting them into equivalence classes (members of a class are some number of rotations apart from each other). Each class contains $n$ seatings, and we have $\frac{n!}{n}=(n-1)!$ such classes, which is the number of ways $n$ professors can sit around a circular conference table.
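Just for fun, we can brute-force this count for small $n$: treat two seatings as equal when one is a rotation of the other and check that the number of equivalence classes is $(n-1)!$. This is a minimal sketch; the helper name `circular_arrangements` is ours, not from the book.

```python
from itertools import permutations
from math import factorial

def circular_arrangements(n):
    # Count seatings of professors 0..n-1 around a round table, where seatings
    # that differ only by a rotation are considered the same.
    seen = set()
    for p in permutations(range(n)):
        rotations = [p[i:] + p[:i] for i in range(n)]
        seen.add(min(rotations))        # canonical representative of the class
    return len(seen)

for n in range(1, 7):
    assert circular_arrangements(n) == factorial(n - 1)
print("matches (n-1)! for n = 1..6")
```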

Exercise C.1-4

There are 49 even and 50 odd numbers in the set $\{1,2,\dots,99\}$. A sum of three distinct numbers is even when all of the numbers are even or when exactly one of them is even and the other two are odd. We can use equation (C.2) together with the rules of sum and product to calculate the answer.

$${49 \choose 3}+{49 \choose 1}{50 \choose 2}=78449$$

Just for fun, we can also execute the Python 3 script below to verify the result.

```python
from itertools import combinations

print(sum(1 for a, b, c in combinations(range(1, 100), 3)
          if (a + b + c) % 2 == 0))
```

Exercise C.1-5

$${n \choose k}=\frac{n!}{k!(n-k)!} = \frac{n}{k}\cdot\frac{(n-1)!}{(k-1)!(n-k)!}=\frac{n}{k}{n-1 \choose k-1}$$

Exercise C.1-6

$${n \choose k}=\frac{n!}{k!(n-k)!} = \frac{n}{n-k}\cdot\frac{(n-1)!}{k!(n-k-1)!}=\frac{n}{n-k}{n-1 \choose k}$$

Exercise C.1-7

To choose $k$ elements from $n$ elements, we can either include a distinguished element and select the remaining $k-1$ items from the other $n-1$ items, or exclude it and select all $k$ items from the remaining $n-1$ elements. This establishes the desired relationship $\binom{n}{k}=\binom{n-1}{k-1}+\binom{n-1}{k}$.

Exercise C.1-8

You can find this table on Wikipedia.
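For completeness, here is a small sketch that builds the first rows of Pascal's triangle with the recurrence from Exercise C.1-7 and cross-checks them against Python's `math.comb`:

```python
from math import comb

rows = [[1]]
for n in range(1, 7):
    prev = rows[-1]
    # Pascal's rule: C(n, k) = C(n-1, k-1) + C(n-1, k), with 1s on the borders.
    rows.append([1] + [prev[k - 1] + prev[k] for k in range(1, n)] + [1])

for n, row in enumerate(rows):
    assert row == [comb(n, k) for k in range(n + 1)]
    print(row)
```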

Exercise C.1-9

We use equations (A.1) and (C.2) to prove the identity.

$$\sum_{i=1}^n i=\frac{n(n+1)}{2}=\frac{(n+1)!}{2!(n-1)!}={n+1 \choose 2}$$

Exercise C.1-10

Assume $n>0$; otherwise the maximum is at $k=0$. We fix $n$ and treat $\binom{n}{k}$ as a function of $k$. First, let us calculate the ratio between successive terms

$$\begin{align*} \dfrac{\dbinom{n}{k}}{\dbinom{n}{k-1}}&=\dfrac{\dfrac{n}{k}\dbinom{n-1}{k-1}}{\dfrac{n}{n-k+1}\dbinom{n-1}{k-1}} && \text{(by identities from Exercises C.1-5 and C.1-6)} \\ &=\frac{n-k+1}{k} \end{align*}$$

As long as $n-k+1 \ge k$, or equivalently $k \le \frac{n+1}{2}$, the function $\binom{n}{k}$ monotonically increases.

If $n$ is odd, then we have an even number of terms in the expansion for $k=0,1,\dots,n$. $\binom{n}{k}$ is symmetric, so there are 2 candidates (one on each side): $k=(n+1)/2=\lceil n/2 \rceil$ or $k=\lfloor n/2 \rfloor$.

If $n$ is even, then there is only one maximum: $k=\lfloor(n+1)/2\rfloor=\lfloor n/2 \rfloor=\lceil n/2 \rceil=n/2$.
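A quick sanity check of where the maximum lands, using `math.comb` (a small verification script, not part of the original exercise):

```python
from math import comb

for n in range(1, 20):
    best = max(range(n + 1), key=lambda k: comb(n, k))
    # The maximum value is attained at floor(n/2) and, equivalently, at ceil(n/2).
    assert comb(n, best) == comb(n, n // 2) == comb(n, (n + 1) // 2)
print("maximum attained at k = n//2 and k = (n+1)//2 for n = 1..19")
```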

Exercise C.1-11

$$\begin{align*} (j+k)!&=j!(j+1)(j+2)\cdots(j+k) \\ &\ge j!(0+1)(0+2)\cdots(0+k) \\ &=j!\,k! \end{align*}$$

The inequality above turns into equality when $j=0 \lor k=0$.

$$\begin{align*} \binom{n}{j+k}&=\dfrac{n!}{(j+k)!(n-j-k)!} \\ &\le \dfrac{n!}{j!\,k!\,(n-j-k)!} \\ &=\dfrac{n!}{j!(n-j)!}\cdot \dfrac{(n-j)!}{k!(n-j-k)!} \\ &=\binom{n}{j}\binom{n-j}{k} \end{align*}$$

The right-hand side counts the ways of choosing $j+k$ items out of $n$ by the rule of product: each selection of $j$ items out of $n$ is multiplied by the number of ways of choosing $k$ additional items out of the remaining $n-j$. This approach overcounts, as it counts the same combination multiple times, which is why we only get an inequality.

For example, $\binom{5}{2+1}=10 < \binom{5}{2}\binom{3}{1}=30$. Suppose we have chosen $\{a,b\}$ from $\{a,b,c,d,e\}$; we then have 3 options for the additional item, and picking $c$ gives $\{a,b,c\}$. But the same set is produced again if we start with another initial selection, like $\{a,c\}$, and then pick $b$, so each combination is counted more than once.

Exercise C.1-12

We prove the statement by induction on $k$. The base case $k=0$ is true, provided that for convenience we assume $0^0=1$, as mentioned in the book.

For $k>0$ we have

$$\begin{align*} \binom{n}{k} &= \frac{n}{k}\binom{n-1}{k-1} && \text{(by Exercise C.1-5)} \\ &= \frac{n}{k}\cdot\frac{n-k+1}{n}\binom{n}{k-1} && \text{(by Exercise C.1-6)} \\ &\le \frac{n^n}{k(k-1)^{k-1}(n-k+1)^{n-k}} && \text{(by the inductive hypothesis)} \end{align*}$$

We prove that

$$\frac{n^n}{k(k-1)^{k-1}(n-k+1)^{n-k}} \le \frac{n^n}{k^k(n-k)^{n-k}}$$

by examining the ratio

$$\begin{align*} \cfrac{\dfrac{n^n}{k(k-1)^{k-1}(n-k+1)^{n-k}}}{\dfrac{n^n}{k^k(n-k)^{n-k}}} &= \cfrac{k^{k-1}(n-k)^{n-k}}{(k-1)^{k-1}(n-k+1)^{n-k}} \\ &= \frac{\left(\dfrac{k}{k-1}\right)^{k-1}}{\left(\dfrac{n-k+1}{n-k}\right)^{n-k}} \\ &= \frac{\left(1+\dfrac{1}{k-1}\right)^{k-1}}{\left(1+\dfrac{1}{n-k}\right)^{n-k}} \end{align*}$$

The sequence $a_n=(1+1/n)^n$ is strictly increasing (a standard fact), and we need to find the threshold below which the ratio is less than or equal to one. This happens while $k-1\le n-k$, or equivalently for all integers $k$ such that $k\le(n+1)/2$. According to the principles of mathematical induction, we can conclude that inequality (C.7) holds for all integers $k$ such that $0 \le k \le n/2 < (n+1)/2$.

When $n/2 \le k \le n$ then $0 \le n-k \le n/2$, so we can apply equation (C.3) after substituting $k$ with $n-k$ in inequality (C.7). Therefore, we can conclude that inequality (C.7) holds for all integers $k$ such that $0 \le k \le n$.

Exercise C.1-13

This exercise is flawed, since $\binom{2n}{n} \neq \cfrac{2^{2n}}{\sqrt{\pi n}}(1+O(1/n))$. The reasons will become clear once we prove that $\binom{2n}{n}=\cfrac{2^{2n}}{\sqrt{\pi n}}(1+O'(1/n))$ (see Problem 3-6 part (d)).

$$\begin{align*} \binom{2n}{n} &= \cfrac{(2n)!}{(n!)^2} \\ &= \cfrac{\sqrt{4\pi n}\,(2n/e)^{2n}e^{\alpha_{2n}}}{2\pi n\,(n/e)^{2n}e^{2\alpha_n}} && \text{(by equation (3.29))}\\ &= \cfrac{2^{2n}}{\sqrt{\pi n}} \cdot e^{\alpha_{2n}-2\alpha_n} \end{align*}$$

Let $x=\alpha_{2n}-2\alpha_n$. As $n$ increases, $x \to 0$, so $e^x \approx 1+x$. We can derive equation (3.25) from equation (3.29) by leveraging the same approximation. Equation (3.29) is exact, and as $n$ increases the adjustment factor $e^x \to 1^-$ instead of $e^x \to 1^+$ (you can experiment with small values of $n$ to gain intuition). Since $x$ is always negative and $|x|=O(1/n)$, we have $1+x=1+O'(1/n)$, which concludes the proof.

How can things go wrong?

$$\begin{align*} \binom{2n}{n} &= \cfrac{(2n)!}{(n!)^2} \\ &= \cfrac{\sqrt{4\pi n}\,(2n/e)^{2n}(1+\Theta(1/n))}{2\pi n\,(n/e)^{2n}(1+\Theta(1/n))^2} && \text{(by equation (3.25))} \end{align*}$$

$\Theta(1/n)$ stands for a whole family of functions bounded by $g(n)=1/n$, so we cannot cancel the numerator and denominator in the usual manner. On the other hand, we can upper bound the fraction by bounding the numerator from above and the denominator from below. Let $c$ and $d$ represent the constants hidden in the $O(1/n)$ and $\Omega(1/n)$ of the numerator and denominator, respectively.

$$\begin{align*} \cfrac{1+\Theta(1/n)}{(1+\Theta(1/n))^2} &\le \cfrac{1+c/n}{(1+d/n)^2} \\ &< \cfrac{1+c/n}{1+d/n} \\ &= \cfrac{n+c}{n+d} \\ &= 1+\cfrac{c-d}{n+d} \\ &< 1+\cfrac{|c-d|}{n} \end{align*}$$

So far everything seems correct, thus we may be tempted to jump to the following conclusion

$$\cfrac{1+\Theta(1/n)}{(1+\Theta(1/n))^2}= 1+O(1/n) \qquad\text{WRONG!}$$

Below is the quotation from the book (see page 58) explaining how to treat asymptotic notation appearing on both sides of an equation:

No matter how the anonymous functions are chosen on the left of the equal sign, there is a way to choose the anonymous functions on the right of the equal sign to make the equation valid.

Pick $f(n)=1/n$ as a replacement for $\Theta(1/n)$ both in the numerator and the denominator. Consequently, the left side becomes $1/(1+(1/n))<1$. There is no way to select an asymptotically nonnegative function on the right side to make it less than 1.

The book talks about "proper" abuses of asymptotic notations, but as this example illustrates, it is easy to become a victim of such convenience.

Exercise C.1-14

We need to calculate the first and second derivatives of $H(\lambda)$ using the base conversion rule for logarithms, $\lg x=\ln x / \ln 2$.

$$\begin{align*} H'(\lambda) &= \left(-\lg\lambda-\lambda\cdot\frac{1}{\lambda\ln2}\right)+\left(\lg(1-\lambda)+(1-\lambda)\cdot\frac{1}{(1-\lambda)\ln2}\right) \\ &= -\lg\lambda-\lg e+\lg(1-\lambda)+\lg e \\ &= \lg\frac{1-\lambda}{\lambda} \\[2mm] H''(\lambda) &= \frac{\lambda}{(1-\lambda)\ln2}\cdot\frac{-\lambda-(1-\lambda)}{\lambda^2} \\ &= -\frac{\lg e}{\lambda(1-\lambda)} \end{align*}$$

We have $H'(1/2)=0 \land H''(1/2)<0$, so $H(\lambda)$ achieves its maximum value at $\lambda=1/2$, where $H(1/2)=1$.

Exercise C.1-15

The equation trivially holds when $n=0$. We solve this problem for $n>0$ by thinking a bit out of the box, which also demonstrates a useful technique for tackling seemingly complicated sums. First, a couple of preliminaries:

$$\begin{align*} \sum_{k=0}^n\binom{n}{k}k&=\sum_{k=1}^n\binom{n}{k}k \\ \sum_{k=1}^n\binom{n}{k}&=2^n-1 \end{align*}$$

Draw an $n \times n$ table listing the terms of the sum along diagonals, taking into account the multiplicative factor $k$ (so that $\binom{n}{k}$ appears exactly $k$ times in the upper-left triangle), as follows:

$$\begin{array}{cccccc}
\binom{n}{1} & \binom{n}{2} & \binom{n}{3} & \cdots & \cdots & \binom{n}{n} \\
\binom{n}{2} & \binom{n}{3} & \cdots & \cdots & \binom{n}{n} & \cdots \\
\binom{n}{3} & \cdots & \cdots & \binom{n}{n} & \cdots & \cdots \\
\cdots & \cdots & \cdots & \cdots & \cdots & \binom{n}{3} \\
\cdots & \binom{n}{n} & \cdots & \cdots & \binom{n}{3} & \binom{n}{2} \\
\binom{n}{n} & \cdots & \cdots & \binom{n}{3} & \binom{n}{2} & \binom{n}{1}
\end{array}$$

Each column sums to $2^n-1$, so the total over the whole table is $n(2^n-1)$. Observe that the sums of the elements in the upper-left and bottom-right triangular sub-matrices (both including the diagonal of $\binom{n}{n}$ entries) are the same. Denoting the triangle sum by $S_\triangle$ and the diagonal sum by $S_D$, we have

$$\begin{align*} 2S_\triangle-S_D&=n(2^n-1) \\ 2S_\triangle&=n(2^n-1)+n && \text{(}S_D=n\tbinom{n}{n}=n\text{)} \\ &=n2^n \\ \therefore\ S_\triangle&=n2^{n-1} \end{align*}$$

We have shown that equation (C.12) is true for any integer $n \ge 0$.
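As with Exercise C.1-4, we can verify equation (C.12) numerically for small $n$ (the case $n=0$ is trivial):

```python
from math import comb

for n in range(1, 16):
    # sum_{k=0}^{n} k * C(n, k) should equal n * 2^(n-1)
    assert sum(k * comb(n, k) for k in range(n + 1)) == n * 2 ** (n - 1)
print("equation (C.12) holds for n = 1..15")
```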

Exercise C.1-16

We can easily verify the cases for $n=0,1,2,3$ or $k=0,1$, so assume $n \ge 4 \land k>1$. We have

$$\begin{align*} \binom{n}{k} &= \frac{n(n-1)\cdots(n-(k-1))}{k!} \\ &= \frac{n^k}{k!}\left(1-\frac{1}{n}\right)\cdots\left(1-\frac{k-1}{n}\right) \\ &\ge \frac{n^k}{k!}\left(1-\frac{k-1}{n}\right)^{k-1} \\ &> \frac{n^k}{k!}\left(1-\frac{\sqrt{n}}{n}\right)^{k-1} && \text{(}k-1<\sqrt{n}\text{)}\\ &> \frac{n^k}{k!}\left(1-\frac{1}{\sqrt{n}}\right)^{\sqrt{n}} && \text{(}x^n \text{ strictly decreases when } 0<x< 1\text{)} \\ &\ge \frac{n^k}{4k!} && \text{(since } (1-1/\sqrt{n})^{\sqrt{n}} \ge 1/4\text{ for all } n \ge 4\text{)} \end{align*}$$

Exercise C.2-1

Let event $A$ be that Professor Rosencrantz obtains strictly more heads than Professor Guildenstern. $R_*$ and $G_*$ denote various events about the results for Professors Rosencrantz and Guildenstern, respectively. We have

$$\begin{align*} A&=(R_{H\ge1} \cap G_T) \cup (R_{H=2} \cap G_H) \\ \Pr\{A\}&=\Pr\{(R_{H\ge1} \cap G_T) \cup (R_{H=2} \cap G_H)\} \\ &=\Pr\{R_{H\ge1} \cap G_T\}+\Pr\{R_{H=2} \cap G_H\} && \text{(events are mutually exclusive)} \\ &=\Pr\{R_{H\ge1}\}\Pr\{G_T\}+\Pr\{R_{H=2}\}\Pr\{G_H\} && \text{(events are independent)} \\ &=\frac{3}{4}\cdot\frac{1}{2}+\frac{1}{4}\cdot\frac{1}{2}=\frac{1}{2} \end{align*}$$

Exercise C.2-2

Define a new sequence $B$ inductively as follows.

$$B_n=\begin{dcases} A_1 &\text{if } n=1, \\ A_n-\bigcup_{i=1}^{n-1} B_i &\text{if } n>1. \end{dcases}$$

This new sequence has the following desired properties:

  • By definition, for each $n$, $B_n \subseteq A_n$.

  • All members of the sequence $B$ are pairwise mutually exclusive. For any distinct indices $1 \le i < j \le n$ it is true that $B_j=A_j-(B_1 \cup \dots \cup B_i\cup\dots\cup B_{j-1})$, hence $B_i \cap B_j=\emptyset$.

  • For each $n$, $\bigcup_{i=1}^nB_i=\bigcup_{i=1}^nA_i$, which we prove by induction on $n$. The base cases are $n=1 \lor n=2$, and both hold because of how the union of sets works: for example, $A \cup B=A \cup (B-A)$, as duplicates are not allowed in a set. For $n>2$ we have

$$\begin{align*} \bigcup_{i=1}^nB_i &= \bigcup_{i=1}^{n-1}B_i \cup B_n && \text{(by the associativity rule)} \\ &= \bigcup_{i=1}^{n-1}A_i \cup B_n && \text{(by the inductive hypothesis)} \\ &= \bigcup_{i=1}^{n-1}A_i \cup \left(A_n-\bigcup_{i=1}^{n-1}A_i\right) && \text{(by the inductive hypothesis)} \\ &= \bigcup_{i=1}^nA_i && \text{(by the base case)} \end{align*}$$

The last fact implies that $\bigcup_{i=1}^{\infty}B_i=\bigcup_{i=1}^{\infty}A_i$. Using the generalized 3rd axiom of probability (see the book), we can prove Boole's inequality for any countably infinite sequence of events (simply replace $\infty$ with $n$ for the finite case).

$$\begin{align*} \Pr\bigg\lbrace\bigcup_{i=1}^{\infty}A_i\bigg\rbrace&=\Pr\bigg\lbrace\bigcup_{i=1}^{\infty}B_i\bigg\rbrace \\ &=\sum_{i=1}^{\infty}\Pr\{B_i\} && \text{(all events are pairwise disjoint)} \\ &\le \sum_{i=1}^{\infty}\Pr\{A_i\} && \text{(}\Pr\{B_i\} \le \Pr\{A_i\} \text{ for all } i\text{)} \end{align*}$$

Exercise C.2-3

Let a 3-tuple $(i,j,k)$ represent the numbers on the selected cards in order of their removal from the deck. For a fixed $j$, all triples having $i<j<k$ are in sorted (increasing) order; there are $(j-1)(10-j)$ such triples. Let $A_j$ denote the event that a triple for a given $j$ is in sorted order, and $A$ the event that a triple is in sorted order. We have

$$\begin{align*} A&=\bigcup_{j=2}^9 A_j \\ \Pr\{A\}&=\Pr\Bigg\{\bigcup_{j=2}^9 A_j \Bigg\}\\ &=\sum_{j=2}^9 \Pr\{A_j\} && \text{(events }A_j \text{ are mutually exclusive)} \\ &=\sum_{j=2}^9 \frac{(j-1)(10-j)}{\binom{10}{3}3!} \\ &=\frac{120}{720}=\frac{1}{6} && \text{(by equations (A.1), (A.4) and (C.2))} \end{align*}$$
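The result is easy to confirm by enumerating all ordered draws of three distinct cards (a brute-force check along the lines of the script in Exercise C.1-4):

```python
from fractions import Fraction
from itertools import permutations

triples = list(permutations(range(1, 11), 3))            # ordered draws of 3 cards out of 10
sorted_count = sum(1 for i, j, k in triples if i < j < k)
print(Fraction(sorted_count, len(triples)))              # -> 1/6
```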

Exercise C.2-4

$$\begin{align*} \Pr\{A\mid B\}+\Pr\{\overline{A}\mid B\} &= \cfrac{\Pr\{A\cap B\}}{\Pr\{B\}}+\cfrac{\Pr\{\overline{A}\cap B\}}{\Pr\{B\}} && \text{(by equation (C.16))}\\ &= \cfrac{\Pr\{(A\cap B)\cup(\overline{A}\cap B)\}}{\Pr\{B\}} && \text{(events in the numerator are mutually disjoint)} \\ &= \cfrac{\Pr\{B\}}{\Pr\{B\}} \\ &= 1 \end{align*}$$

Exercise C.2-5

We prove the statement by induction on $n$. The base case $n=1$ trivially holds. For $n>1$ we have

$$\begin{align*} \Pr\Bigg\{\bigcap_{i=1}^nA_i\Bigg\} &= \Pr\Bigg\{\bigcap_{i=1}^{n-1}A_i \cap A_n\Bigg\} && \text{(by the associativity rule)} \\ &= \Pr\Bigg\{\bigcap_{i=1}^{n-1}A_i\Bigg\}\Pr\Bigg\{A_n\biggm|\bigcap_{i=1}^{n-1}A_i\Bigg\} && \text{(by equation (C.16))} \\ &= \Pr\{A_1\}\Pr\{A_2\mid A_1\}\cdots\Pr\Bigg\{A_n\biggm|\bigcap_{i=1}^{n-1}A_i\Bigg\} && \text{(by the inductive hypothesis)} \end{align*}$$

According to the principles of mathematical induction, we can conclude that the statement from the book is true for all $n \ge 1$.

Exercise C.2-6

Create a 2D sample space $S=\{s_{ij}:1\le i,j \le n\}$ with the uniform probability distribution on it. Consequently, $\Pr\{s_{ij}\}=\cfrac{1}{|S|}=\cfrac{1}{n^2}$. The idea is to specify events $A_i$ in such a way that $|A_i|=n \land |A_i\cap A_j|=1 \land |A_i \cap A_j \cap A_k|=0$ for any indices $1 \le i<j<k \le n$. Notice that if the probability of the intersection of any 3 of the events is zero, then it is also zero for any subset of $k>3$ of them.

An event $A_m$ should consist only of outcomes whose indices include $m$. This ensures $A_i \cap A_j=\{s_{ij}\}$, so no third event can contain that outcome. It turns out that $A_i=\{s_{xi}:1 \le x \le i\} \cup \{s_{ix}:i<x\le n\}$ does the job. We have

$$\begin{align*} \Pr\{A_i\}&=\sum_{s \in A_i} \Pr\{s\}=\frac{n}{n^2}=\frac{1}{n} \\ \Pr\{A_i \cap A_j\}&=\Pr\{s_{ij}\}=\frac{1}{n^2}=\Pr\{A_i\}\Pr\{A_j\} \\ \Pr\{A_i \cap A_j \cap A_k\}&=0\neq\Pr\{A_i\}\Pr\{A_j\}\Pr\{A_k\} \end{align*}$$

The last line implies that

$$\Pr\Bigg\{\bigcap_{i=1}^{k\ge3}A_i\Bigg\}=0\neq \prod_{i=1}^{k\ge3} \Pr\{A_i\} =\frac{1}{n^k}$$
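The construction is concrete enough to check mechanically. Below is a small sketch (with $n=5$, an arbitrary choice) that builds the events $A_i$ as sets of outcomes and verifies pairwise independence together with the empty triple intersections:

```python
from fractions import Fraction
from itertools import combinations

n = 5
pr = Fraction(1, n * n)                       # uniform probability of each outcome s_ij

# A_i consists exactly of the outcomes whose index pair includes i.
A = {i: {(x, i) for x in range(1, i + 1)} | {(i, x) for x in range(i + 1, n + 1)}
     for i in range(1, n + 1)}

for i in A:
    assert len(A[i]) * pr == Fraction(1, n)                 # Pr{A_i} = 1/n
for i, j in combinations(A, 2):
    assert len(A[i] & A[j]) * pr == Fraction(1, n) ** 2     # pairwise independent
for i, j, k in combinations(A, 3):
    assert len(A[i] & A[j] & A[k]) == 0                     # never all three together
print("pairwise independent, but no three events are mutually independent")
```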

Exercise C.2-7

Suppose we have two light bulbs—one is operational, and the other is unreliable and works in 50% of the cases. We randomly pick one of the light bulbs and switch it on/off twice in succession. Let event A denote a successful first attempt to switch a light bulb on/off, and event B a successful second attempt. Let C be the event that the malfunctioning light bulb was picked. We have

$$\begin{align*} \Pr\{A\} &= \Pr\{B\} \\ &= \Pr\{B\mid \overline{C}\}\Pr\{\overline{C}\} + \Pr\{B\mid C\}\Pr\{C\}\\ &= (1)\cdot(1/2)+(1/2)\cdot (1/2) \\ &= 3/4 \\ \Pr\{C\} &= 1/2 \end{align*}$$

The events $A$ and $B$ are not independent, since $\Pr\{A\cap B\} = (1)\cdot(1/2)+(1/2)\cdot (1/4)=5/8$, while $\Pr\{A\}\Pr\{B\}=9/16$. But events $A$ and $B$ are conditionally independent given $C$.

$$\begin{align*} \Pr\{A\cap B\mid C\} &= 1/4 \\ &= (1/2)\cdot(1/2) \\ &= \Pr\{A\mid C\}\Pr\{B\mid C\} \end{align*}$$

Exercise C.2-8

Let $J$, $T$ and $C$ be the events that Jeff, Tim and Carmine will pass the course, respectively. Without any additional information, each has the same probability of 1/3 (the prior). Let $G_R$ be the event that Professor Gore responds with anything more specific about the exam results. Let $G_C$ be the event that Professor Gore tells Carmine that Jeff is the one who will fail. Observe that this event also captures to whom the information is revealed.

Assumptions

The choice of parameterization highlights a critical aspect of conditional probability problems: the necessity of explicitly defining the decision rules of the agent providing information (here, Professor Gore). The problem specifies neither his behavior when both Jeff and Tim fail nor whether he is willing to tell anything more than what he has announced to everybody. Such presumptions affect the probability calculation. For example, Carmine's argument about symmetry, backed by his claim that Professor Gore "won't be revealing any information," is only an assumption. We cannot enforce that symmetry on the behavior of another agent.

Calculation

We have several conditional probabilities describing what Professor Gore will announce to Carmine, depending on the outcome.

$$\begin{align} \Pr\{G_R \cap G_C\mid J\}&=0 \\ \Pr\{G_R \cap G_C\mid T\}&=p \\ \Pr\{G_R \cap G_C\mid C\}&=q \end{align}$$
  • (1) is zero because if Jeff is supposed to pass the course, then Professor Gore will not lie to Carmine (a fair assumption).

  • (2) If Tim passes, then Jeff is definitely the one Professor Gore should name, but he may decide not to tell this to Carmine; $p$ is the probability that he does name Jeff.

  • (3) If Carmine passes, Professor Gore will pick Jeff (rather than Tim) with probability $q$.

We want to know Carmine's probability of passing the course now that he has found out that Jeff will fail. We apply equation (C.20) to get

$$\begin{align*} \Pr\{C\mid G_R \cap G_C\} &= \frac{\Pr\{C\}\Pr\{G_R \cap G_C\mid C\}}{\Pr\{G_R \cap G_C\}} \\ &= \frac{\Pr\{C\}\Pr\{G_R \cap G_C\mid C\}}{\Pr\{J\}\Pr\{G_R \cap G_C\mid J\}+\Pr\{T\}\Pr\{G_R \cap G_C\mid T\}+\Pr\{C\}\Pr\{G_R \cap G_C\mid C\}} \\[2mm] &= \frac{(1/3)\cdot q}{(1/3)\cdot0+(1/3)\cdot p+(1/3)\cdot q} \\[1mm] &= \frac{q}{p+q} \end{align*}$$

Certainly $p+q>0$, since we already have evidence that he has pointed at Jeff once.

Results

The results vary with the initial conditions; in what follows we fix $p=1$.

With $q=1/2$, Carmine's chance of passing is still 1/3. In the context of similar logic puzzles (e.g., Monty Hall), the intended answer is 1/3, but strictly speaking, it depends on the unknown $q$ (see Problem C-1).
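A Monte Carlo sketch of the scenario makes the dependence on $q$ tangible. The decision rules coded below (Gore never falsely names Jeff, names him with probability $p$ when Tim passes and with probability $q$ when Carmine passes) are exactly the assumptions made above:

```python
import random
from collections import Counter

def trial(p, q):
    passer = random.choice(["Jeff", "Tim", "Carmine"])
    if passer == "Jeff":
        told = False                    # Gore never falsely names Jeff
    elif passer == "Tim":
        told = random.random() < p      # Jeff is the only name Gore can give Carmine
    else:
        told = random.random() < q      # Gore picks Jeff over Tim with probability q
    return passer, told

p, q, runs = 1.0, 0.5, 200_000
counts = Counter(passer for passer, told in (trial(p, q) for _ in range(runs)) if told)
print(counts["Carmine"] / sum(counts.values()))   # close to q/(p+q) = 1/3
```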

Exercise C.3-1

Define the random variable $X$ to be the sum of the two values showing on the dice. We have

$$\begin{align*} \mathbb{E}{[X]} &= \sum_{x=2}^{12} x\Pr\{X=x\} && \text{(by equation (C.23))}\\ &= (1/36)\cdot(2\cdot1+3\cdot2+4\cdot3+5\cdot4+6\cdot5+7\cdot6+8\cdot5+9\cdot4+10\cdot3+11\cdot2+12\cdot1) \\ &= 252/36 \\ &=7 \end{align*}$$

Define the random variable $X$ to be the maximum of the two values showing on the dice. We have

$$\begin{align*} \mathbb{E}{[X]} &= \sum_{x=1}^6 x\Pr\{X=x\} && \text{(by equation (C.23))}\\ &= (1/36)\cdot(1\cdot1+2\cdot3+3\cdot5+4\cdot7+5\cdot9+6\cdot11) \\ &= 161/36 \\ &\approx 4.47 \end{align*}$$
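Both expectations can be confirmed by enumerating the 36 equally likely outcomes:

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))            # all 36 outcomes of two dice
print(sum(Fraction(a + b, 36) for a, b in rolls))       # -> 7
print(sum(Fraction(max(a, b), 36) for a, b in rolls))   # -> 161/36
```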

Exercise C.3-2

Define the random variable $X$ to be the index of the maximum/minimum element in the array $A[1:n]$. In a randomly ordered array containing distinct elements, the maximum/minimum element can be at any index with equal probability. In other words, $\Pr\{X=i\}=1/n$ for any valid index $i$. We have

$$\begin{align*} \mathbb{E}{[X]} &= \sum_{i=1}^ni\Pr\{X=i\} \\ &= \frac{1}{n}\sum_{i=1}^ni \\[1mm] &= \frac{1}{n}\cdot\frac{n(n+1)}{2} \\[1mm] &= \frac{n+1}{2} \end{align*}$$

Exercise C.3-3

Define the random variable $X$ to be the number of times the player's number appears on the three dice. The probability that it appears on a given die is 1/6; hence, the probability that it does not is 5/6. We assume that the dice rolls are independent, meaning that we can multiply the individual probabilities together to calculate the joint probability for all 3 dice.

Define the random variable $Y$ to be the player's gain (in dollars) from playing the carnival game once. Apparently, $Y=f(X)$, where the function $f$ maps the number of occurrences of the player's number to a monetary value (loss/gain). We have

$$\begin{align*} \mathbb{E}{[Y]} &= (-1)\cdot\Pr\{X=0\}+1\cdot\Pr\{X=1\}+2\cdot\Pr\{X=2\}+3\cdot\Pr\{X=3\} \\ &= (1/6^3)\cdot\left((-1)\cdot\binom{3}{0}\cdot125+1\cdot\binom{3}{1}\cdot25+2\cdot\binom{3}{2}\cdot5+3\cdot\binom{3}{3}\cdot1\right) \\ &= -17/216 \\ &\approx -0.08 \end{align*}$$

The player will lose about 8 cents on average per game. As a side note, the random variable $X$ follows the binomial distribution with $p=1/6$.
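The exact expectation is a one-liner with the binomial probabilities (a small check, not required by the exercise):

```python
from fractions import Fraction
from math import comb

p = Fraction(1, 6)
payoff = {0: -1, 1: 1, 2: 2, 3: 3}            # dollars lost/won per number of matches
expected = sum(payoff[k] * comb(3, k) * p**k * (1 - p)**(3 - k) for k in range(4))
print(expected, float(expected))              # -> -17/216, about -0.079
```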

Exercise C.3-4

For any reals $x$ and $y$ it holds that $\min(x,y)+\max(x,y)=x+y$. Therefore, we have

$$\begin{align*} \min{\{X,Y\}}+\max{\{X,Y\}}&=X+Y \\ \mathbb{E}{[\min{\{X,Y\}}+\max{\{X,Y\}}]} &= \mathbb{E}{[X+Y]} \\ \mathbb{E}{[\min{\{X,Y\}}]}+\mathbb{E}{[\max{\{X,Y\}}]}&=\mathbb{E}{[X]}+\mathbb{E}{[Y]} && \text{(by linearity of expectation)}\\ \mathbb{E}{[\max{\{X,Y\}}]} &\le \mathbb{E}{[X]}+\mathbb{E}{[Y]} && \text{(}\mathbb{E}{[\min{\{X,Y\}}]} \ge 0\text{)} \end{align*}$$

Exercise C.3-5

Fix values $f_X$ and $g_Y$, and suppose $f(x_i)=f_X$ for all $i=1,2,\dots,p$ and $g(y_j)=g_Y$ for all $j=1,2,\dots,q$, where $x_1,\dots,x_p$ and $y_1,\dots,y_q$ are exactly the values of the random variables $X$ and $Y$ that map to $f_X$ and $g_Y$, respectively. We have

$$\begin{align*} \Pr\{f(X)=f_X \text{ and } g(Y)=g_Y\} &= \Pr\Bigg\{\Bigg(\bigcup_{i=1}^p X=x_i\Bigg) \text{ and } \Bigg(\bigcup_{j=1}^q Y=y_j\Bigg)\Bigg\} \\ &= \Pr\Bigg\{\bigcup_{\substack{1 \le i \le p, \\ 1 \le j \le q}} (X=x_i \text{ and } Y=y_j)\Bigg\} \\ &= \sum_{i=1}^p {\sum_{j=1}^q {\Pr\{X=x_i \text{ and } Y=y_j\}}} && \text{(mutually disjoint events)} \\ &= \sum_{i=1}^p {\sum_{j=1}^q {\Pr\{X=x_i\} \Pr\{Y=y_j\}}} && \text{(independence of } X \text{ and } Y\text{)} \\ &= \Bigg(\sum_{i=1}^p {\Pr\{X=x_i\}}\Bigg)\Bigg(\sum_{j=1}^q {\Pr\{Y=y_j\}}\Bigg) \\ &= \Pr\{f(X)=f_X\}\Pr\{g(Y)=g_Y\} \end{align*}$$

We have not assumed anything about $f_X$ and $g_Y$, so our conclusion about the independence of $f(X)$ and $g(Y)$ generalizes to all cases.

Exercise C.3-6

Wikipedia has the proof here.

Exercise C.3-7

We fix $t$ and define two sets $A=\{s\in S:X(s)\ge t\}$ and $A'=\{s\in S:X'(s)\ge t\}$. For all $s \in S$ it is true that $s \in A' \implies s \in A$ (because $X(s) \ge X'(s)$), thus $A' \subseteq A$. Using the definition of a random variable we get

$$\begin{align*} \Pr\{X\ge t\} &= \sum_{s\in A}\Pr\{s\} \\ &= \sum_{s\in A'}\Pr\{s\}+\sum_{s\in A-A'}\Pr\{s\} \\ &\ge \sum_{s\in A'}\Pr\{s\} \\ &= \Pr\{X'\ge t\} \end{align*}$$

We have not assumed anything about $t$, so our conclusion generalizes to all cases.

This exercise introduces the notion of stochastic dominance. In our case, we say that $X$ is stochastically larger than $X'$, denoted as $X \succcurlyeq X'$.

Exercise C.3-8

The short answer is $\mathbb{E}[X^2] \ge \mathbb{E}^2[X]$. We can prove this either by using Jensen's inequality (C.30), after showing that $x^2$ is a convex function (see below), or by looking at equation (C.32) and recalling that the variance of a random variable is always nonnegative.

$$\begin{align*} f(\lambda x+(1-\lambda)y) &\le \lambda f(x)+(1-\lambda)f(y) \\ (\lambda x+(1-\lambda)y)^2 &\le \lambda x^2+(1-\lambda)y^2 \\ 0 &\le \lambda x^2+(1-\lambda)y^2 - \lambda^2x^2-2\lambda(1-\lambda)xy-(1-\lambda)^2y^2\\ &= \lambda(1-\lambda)x^2+\lambda(1-\lambda)y^2-2\lambda(1-\lambda)xy \\ &= \lambda(1-\lambda)(x-y)^2 \quad \checkmark \end{align*}$$

When $\lambda = 0 \lor \lambda = 1$, the inequality is trivially satisfied.

Exercise C.3-9

For an indicator random variable $X$ it is true that $X^2=X$. Therefore, we have

$$\begin{align*} \mathbb{Var}{[X]} &= \mathbb{E}{[X^2]}-\mathbb{E}^2{[X]} && \text{(by equation (C.32))} \\ &= \mathbb{E}{[X]}-\mathbb{E}^2{[X]} \\ &= \mathbb{E}{[X]}(1-\mathbb{E}{[X]}) \\ &= \mathbb{E}{[X]}\mathbb{E}{[1-X]} && \text{(by linearity of expectation)} \end{align*}$$

Exercise C.3-10

$$\begin{align*} \mathbb{Var}{[aX]} &= \mathbb{E}{[a^2X^2]}-\mathbb{E}^2{[aX]} \\ &= a^2\mathbb{E}{[X^2]}-a^2\mathbb{E}^2{[X]} && \text{(by equation (C.25))} \\ &= a^2\left(\mathbb{E}{[X^2]}-\mathbb{E}^2{[X]}\right) \\ &= a^2\mathbb{Var}{[X]} \end{align*}$$

Exercise C.4-1

$$\begin{align*} \sum_{k=1}^\infty\Pr\{X=k\} &= \sum_{k=1}^\infty q^{k-1}p \\ &= p\sum_{k=0}^\infty q^k \\ &=p \cdot\frac{1}{1-q} && \text{(by equation (A.7))} \\[1mm] &= 1 \quad \checkmark && \text{(}p=1-q\text{)} \end{align*}$$

Exercise C.4-2

Define the random variable $X$ to be the number of trials needed to obtain a success. The probability of a success is $p=b(3;6,1/2)=\binom{6}{3}(1/2)^3(1/2)^3=20/64=5/16$. According to equation (C.36), $\mathbb{E}[X]=1/p=3.2$. Therefore, on average we need about 3.2 rounds of flipping six fair coins before obtaining three heads and three tails.
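A short check of the numbers, computing the exact success probability and the resulting expectation of the geometric distribution:

```python
from fractions import Fraction
from math import comb

p = Fraction(comb(6, 3), 2**6)      # probability of exactly 3 heads in 6 flips
print(p, float(1 / p))              # -> 5/16, 3.2
```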

Exercise C.4-3

$$\begin{align*} \mathbb{E}{[X^2]} &= \sum_{k=1}^\infty k^2\Pr\{X=k\} \\ &= \sum_{k=1}^\infty k^2q^{k-1}p \\ &= \frac{p}{q}\sum_{k=1}^\infty k^2q^k \\[1mm] &= \frac{p}{q}\cdot\frac{q(1+q)}{(1-q)^3} && \text{(by Exercise A.1-6)} \\[1mm] &= (1+q)/p^2 \end{align*}$$

$$\begin{align*} \mathbb{Var}{[X]} &= \mathbb{E}{[X^2]}-\mathbb{E}^2{[X]} && \text{(by equation (C.36))}\\ &= (1+q)/p^2-1/p^2 \\ &= q/p^2 \end{align*}$$

Exercise C.4-4

$$\begin{align*} b(k;n,p) &= \binom{n}{k}p^kq^{n-k} \\ &= \binom{n}{n-k}q^{n-k}p^k && \text{(by equation (C.3) and } p=1-q\text{)} \\ &= b(n-k;n,q) \end{align*}$$

Exercise C.4-5

The binomial distribution $b(k;n,p)$ attains a maximum at an integer $k$ that lies in the range $np-q \le k \le (n+1)p$. As we are looking for an approximation, let $k=np$ and assume it is an integer. The remaining steps are purely mechanical. We have

$$\begin{align*} b(np;n,p) &= \binom{n}{np}p^{np}(1-p)^{n-np} \\ &= \frac{n!}{(np)!\,(n-np)!}\,p^{np}(1-p)^{n-np} \\[1mm] &= \frac{n!}{(np)!\,(nq)!}\,p^{np}q^{nq} \\ &\approx \frac{\sqrt{2\pi n}\left(\dfrac{n}{e}\right)^n}{\sqrt{2\pi np}\left(\dfrac{np}{e}\right)^{np}\sqrt{2\pi nq}\left(\dfrac{nq}{e}\right)^{nq}}\,p^{np}q^{nq} && \text{(using Stirling's approximation (3.25))}\\[6mm] &= \frac{\left(\dfrac{n}{e}\right)^n\left(\dfrac{e}{np}\right)^{np}\left(\dfrac{e}{nq}\right)^{nq}}{\sqrt{2\pi npq}}\,p^{np}q^{nq} \\[2mm] &= \cfrac{1}{p^{np}q^{nq}\sqrt{2\pi npq}}\,p^{np}q^{nq} \\ &=\cfrac{1}{\sqrt{2\pi npq}} \end{align*}$$
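To see how good the approximation is, we can compare the exact value of $b(np;n,p)$ with $1/\sqrt{2\pi npq}$ for growing $n$ (the values of $p$ and $n$ below are arbitrary, chosen so that $np$ is an integer):

```python
from fractions import Fraction
from math import comb, sqrt, pi

p, q = Fraction(1, 4), Fraction(3, 4)
for n in (20, 80, 320, 1280):
    k = n // 4                                 # k = np is an integer for these n
    exact = comb(n, k) * p**k * q**(n - k)     # exact rational value of b(np; n, p)
    approx = 1 / sqrt(2 * pi * n * 3 / 16)     # 1/sqrt(2*pi*n*p*q), since p*q = 3/16
    print(f"n={n:5d}  exact={float(exact):.6f}  approx={approx:.6f}")
```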

Exercise C.4-6

We need to find $b(0;n,1/n)=\binom{n}{0}(1/n)^0(1-1/n)^n=(1-1/n)^n$ as $n \to \infty$, so we have

$$\begin{align*} \lim_{n\to\infty}b(0;n,1/n) &= \lim_{n\to\infty}(1-1/n)^n \\ &= 1/e && \text{(by equation (3.16))} \end{align*}$$

We need to find $b(1;n,1/n)=\binom{n}{1}(1/n)(1-1/n)^{n-1}=(1-1/n)^n/(1-1/n)$ as $n \to \infty$, so we have

$$\begin{align*} \lim_{n\to\infty}b(1;n,1/n) &= \lim_{n\to\infty}\frac{(1-1/n)^n}{1-1/n} \\ &= \frac{1/e}{1} && \text{(by equation (3.16))} \\[1mm] &= 1/e \end{align*}$$

Exercise C.4-7

We prove the identity using the double counting proof technique. The idea is to use different counting approaches to express the same quantity, in this case the probability of professors getting the same number of heads, in two different ways. By equating those expressions we establish the desired relation.

Aggregating the coin flips

Consider each experiment as a sequence of $2n$ coin flips: the first $n$ by Professor Rosencrantz, the last $n$ by Professor Guildenstern. There are $2^{2n}=4^n$ possible sequences. Let "head" be a success for Rosencrantz and "tail" a success for Guildenstern, as hinted in the book. If Professor Rosencrantz has $x$ heads (successes), then he has $n-x$ tails. If Professor Guildenstern has $n-x$ tails (successes), then he has $n-(n-x)=x$ heads. Therefore, both get the same number of heads exactly when there are $n$ successes in total, which can occur in $\binom{2n}{n}$ ways. Thus, the desired probability is $\binom{2n}{n}/4^n$.

Using random variables

Define $R$ and $G$ to be random variables that represent the number of heads and tails for Professors Rosencrantz and Guildenstern, respectively. These variables are independent and follow the binomial distribution. We have

$$\begin{align*} \sum_{k=0}^n\Pr\{R=k\;\text{and}\;G=n-k\} &= \sum_{k=0}^n\Pr\{R=k\}\Pr\{G=n-k\} \\ &= \sum_{k=0}^nb(k;n,1/2)\,b(n-k;n,1/2) \\ &= \sum_{k=0}^n b(k;n,1/2)^2 && \text{(by Exercise C.4-4 when } p=q\text{)}\\ &= \sum_{k=0}^n\binom{n}{k}^2\left(\frac{1}{2}\right)^{2k}\left(\frac{1}{2}\right)^{2(n-k)} \\ &= \frac{\sum_{k=0}^n\binom{n}{k}^2}{4^n} \end{align*}$$
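Both counting approaches can be cross-checked numerically, confirming the identity $\sum_{k=0}^n \binom{n}{k}^2 = \binom{2n}{n}$ for small $n$:

```python
from math import comb

for n in range(16):
    assert sum(comb(n, k) ** 2 for k in range(n + 1)) == comb(2 * n, n)
print("identity verified for n = 0..15")
```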

Exercise C.4-8

As explained in the book, for $k=\lambda n$, where $0 \le \lambda \le 1$, we can rewrite equation (C.7) as

$$\binom{n}{\lambda n} \le 2^{nH(\lambda)}$$

Letting $\lambda=k/n$ we obtain

$$\begin{align*} b(k;n,1/2) &= \binom{n}{k}\left(\frac{1}{2}\right)^k\left(\frac{1}{2}\right)^{n-k} \\ &= \binom{n}{(k/n)\,n}\left(\frac{1}{2}\right)^n \\ &\le \frac{2^{nH(k/n)}}{2^n} \\[1mm] &= 2^{nH(k/n)-n} \end{align*}$$

Exercise C.4-9

The solution presented here is based on Exercise C.4-10, so you might want to do that one first. Define $X'$ to be the random variable denoting the total number of successes in $n$ Bernoulli trials, where the $i$th trial has probability $p \ge p_i$ of success. According to Exercise C.4-10, for $0 \le k \le n$ we have $\Pr\{X'\ge k\} \ge \Pr\{X\ge k\}$. Furthermore, $X'$ follows a binomial distribution $b(k;n,p)$ and $\Pr\{X'\ge k\}=\sum_{i=k}^nb(i;n,p)$.

$$\begin{align*} \Pr\{X<k\} &= 1-\Pr\{X\ge k\} \\ &\ge 1-\Pr\{X'\ge k\} \\ &= \sum_{i=0}^nb(i;n,p)-\sum_{i=k}^nb(i;n,p) && \text{(by equation (C.40))} \\ &= \sum_{i=0}^{k-1}b(i;n,p). \end{align*}$$

Exercise C.4-10

Define the sample space $S$ to consist of outcomes that are sequences of $n$ Bernoulli trials, where the $i$th trial has probability $p_i$ of success. The set $A$ in this exercise corresponds to one such sequence. Define $X_i$ to be the indicator random variable associated with the $i$th trial in $A$ ($X_i=1$ when it is a success, otherwise 0), so the total number of successes in $A$ can be represented by the random variable $X=\sum_{i=1}^n X_i$. Following the hint from the book, we devise an experiment involving the trials in $A$ to obtain the Bernoulli trials in $A'$. The point is to show that if each trial in one sequence has a probability of success at least as high as the corresponding trial in the other, then its total number of successes is at least that of the other sequence.

The previously mentioned hint from the book refers to coupling, a powerful proof technique in probability theory. The fundamental idea is to introduce coupled random variables that can be used to compare the original random variables, which have different probability distributions. In our case, we cannot directly relate the sets $A$ and $A'$, as they contain Bernoulli trials with different distributions. However, we can derive a new set of indicator random variables $X'_i$, conditionally coupled on $X_i$, such that they faithfully reflect the probability distributions of the random trials in $A'$. Therefore, the total number of successes in $A'$ can be represented by the random variable $X'=\sum_{i=1}^n X'_i$, and we can compare $X$ and $X'$ on the same sample space.

Define

$$X'_i =\begin{cases} 1 & \text{if } X_i=1, \\ \mathrm{Bernoulli}(r_i) & \text{if } X_i=0, \end{cases}$$

where $r_i=\frac{p'_i-p_i}{1-p_i}$. Notice that $p_i<1$ when $X_i=0$; otherwise $X_i$ would be 1. For the marginal distribution of $X'_i$ we have

$$\begin{align*} \Pr\{X'_i=1\} &= \Pr\{(X'_i \cap X_i) \cup (X'_i \cap \overline{X_i})\} \\ &= \Pr\{X'_i\cap X_i\}+\Pr\{X'_i\cap\overline{X_i}\} \\ &= p_i+(1-p_i)\cdot r_i \\ &=p'_i \quad \checkmark \end{align*}$$

By construction, for all $i$ we have $X'_i \ge X_i$, hence $X' \ge X$. Furthermore, the coupling preserves the independence of the trials within each sequence, because the construction for each $i$ depends only on $X_i$ (and independent randomness for the conditional Bernoulli when $X_i=0$).

Since $X'(s) \ge X(s)$ for all $s \in S$, Exercise C.3-7 gives $\Pr\{X' \ge k\} \ge \Pr\{X \ge k\}$ for $0 \le k \le n$. This concludes the proof of the statement from the book.

Exercise C.5-1

We expect to have a greater chance of obtaining exactly $n$ heads in $2n$ flips of a fair coin, since, on average, we would expect an equal number of heads and tails. We confirm this by calculating the probabilities. Define $X_m$ to be a random variable denoting the number of heads in $m > 0$ flips of a fair coin. Apparently, it has a binomial distribution $b(k;m,1/2)$, thus for $n>1$ we have

$$\begin{align*} \frac{\Pr\{X_{2n}=n\}}{\Pr\{X_n=n\}} &= \frac{\binom{2n}{n}\left(\frac{1}{2}\right)^{2n}}{\binom{n}{n}\left(\frac{1}{2}\right)^n} \\[1mm] &= \frac{\binom{2n}{n}}{2^n} \\[1mm] &= \frac{2^n}{\sqrt{\pi n}}\,(1+O(1/n)) && \text{(by equation (C.11))} \\ &> 1 \quad \checkmark \end{align*}$$

As a side note, when $n=1$ the two probabilities are equal.

Exercise C.5-2

We prove Corollaries (C.6) and (C.7) following the advice from the book: by inverting the roles of successes and failures in the equations for the left tail of the binomial distribution.

Proof of Corollary C.6

When $np<k<n$ then $0<n-k<nq$, hence we have

$$\begin{align*} \Pr\{X>k\}&=\Pr\{\overline{X}<n-k\} \\ &< \frac{(n-k)p}{nq-(n-k)}b(n-k;n,q) && \text{(by "inverted" Theorem C.4)} \\ &= \frac{(n-k)p}{n(1-p)-n+k}b(k;n,p) && \text{(by Exercise C.4-4)} \\ &= \frac{(n-k)p}{k-np}b(k;n,p) \end{align*}$$

Proof of Corollary C.7

When $(np+n)/2<k<n$ then $0<n-k<nq/2$, so we get

$$\begin{align*} \frac{(n-k)p}{nq-(n-k)} &< \frac{nqp/2}{nq-nq/2} && \text{(by "inverted" Corollary C.5)} \\ &= \frac{nqp/2}{nq/2} \\ &\le 1 && \text{(}p\le 1\text{)} \\ &\therefore \Pr\{X>k\}<b(k;n,p) \end{align*}$$

Thus we have

$$\begin{align*} \frac{\Pr\{X>k\}}{\Pr\{X>k-1\}} &= \frac{\displaystyle\sum_{i=k+1}^n b(i;n,p)}{\displaystyle\sum_{i=k}^n b(i;n,p)} \\ &= \frac{\displaystyle\sum_{i=k+1}^n b(i;n,p)}{b(k;n,p)+\displaystyle\sum_{i=k+1}^n b(i;n,p)} \\ &< 1/2 \end{align*}$$

since $\sum_{i=k+1}^n b(i;n,p) < b(k;n,p)$.

Exercise C.5-3

We see that $p=a/(a+1)$ and $q=1-p=1/(a+1)$, so $a=p/q$. Notice also that $0<k<np$. Define $X$ to be a random variable with binomial distribution $b(k;n,p)$. We have

$$\begin{align*} \sum_{i=0}^{k-1}\binom{n}{i}a^i &= \sum_{i=0}^{k-1}\binom{n}{i}\left(\frac{p}{q}\right)^i \\[1mm] &= \frac{\displaystyle\sum_{i=0}^{k-1}\binom{n}{i}p^iq^{n-i}}{q^n} \\[1mm] &= \frac{\displaystyle\sum_{i=0}^{k-1}b(i;n,p)}{q^n} \\[3mm] &= \frac{\Pr\{X<k\}}{q^n} \end{align*}$$

All conditions for applying Theorem C.4 are satisfied, thus we get

$$\begin{align*} \sum_{i=0}^{k-1}\binom{n}{i}a^i &< \frac{kq}{q^n(np-k)}\,b(k;n,p) \\ &= \frac{k/(a+1)}{(1/(a+1))^n(na/(a+1)-k)}\,b(k;n,p) \\[1mm] &= (a+1)^n\,\frac{k}{na-k(a+1)}\,b(k;n,a/(a+1)) \end{align*}$$

Exercise C.5-4

$$\begin{align*} \sum_{i=0}^{k-1}p^iq^{n-i} &\le \sum_{i=0}^{k-1}\binom{n}{i}p^iq^{n-i} && \Bigg(\binom{n}{i} \ge 1 \text{ for all } 0 \le i \le n \Bigg)\\ &= \sum_{i=0}^{k-1}b(i;n,p) \\ &< \frac{kq}{np-k}\,b(k;n,p) && \text{(by Theorem C.4)} \\ &\le \frac{kq}{np-k}\left(\frac{np}{k}\right)^k\left(\frac{nq}{n-k}\right)^{n-k} && \text{(by Lemma C.1)} \end{align*}$$

Exercise C.5-5

We must fit into the form dictated by Theorem C.8, so let us work backwards, starting from the known fact $r>n-\mu=n-\mathbb{E}[X]=\mathbb{E}[n-X]=\mathbb{E}[X']=\mu'$. Thus, we have

$$\begin{align*} \Pr\{\mu-X\ge r\} &= \Pr\{(n-\mu')-X\ge r\} \\ &= \Pr\{X'-\mu'\ge r\} \\ &\le \left(\frac{\mu' e}{r}\right)^r \\ &= \left(\frac{(n-\mu)e}{r}\right)^r \end{align*}$$

For the second part, just replace $\mu=\mathbb{E}[X]=np$ in the formula above and simplify $n-np=nq$.

Exercise C.5-6

We follow the hint from the book and show that $p_ie^{\alpha q_i}+q_ie^{-\alpha p_i} \le e^{\alpha^2/2}$ for all nonnegative reals $p_i$, $q_i$ and $\alpha$ such that $p_i+q_i=1$. The left-hand side of the inequality is the moment generating function of the real-valued random variable $Y_i=X_i-p_i$, where $X_i$ is an indicator random variable taking the value 1 with probability $p_i$, as described in the book. Apparently, $\mathbb{E}[Y_i]=0$ and $-p_i \le Y_i \le q_i$, thus it is bounded. Therefore, we can apply Hoeffding's lemma, which says

$$\begin{align*} \mathbb{E}{[e^{\alpha Y_i}]} &= p_ie^{\alpha q_i}+q_ie^{-\alpha p_i} \\ &\le e^{\alpha^2(q_i-(-p_i))^2/8} \\ &=e^{\alpha^2(q_i+p_i)^2/8} \\ &=e^{\alpha^2/8} \\ &\le e^{\alpha^2/2} \quad \checkmark \end{align*}$$

We follow the outline of the proof of Theorem C.8, using the previous inequality instead of (C.49). Thus, we have

$$\begin{align*} \mathbb{E}{[e^{\alpha(X-\mu)}]} &= \prod_{i=1}^n\mathbb{E}{[e^{\alpha(X_i-p_i)}]} \\ &= \prod_{i=1}^n(p_ie^{\alpha q_i}+q_ie^{-\alpha p_i}) \\ &\le \prod_{i=1}^ne^{\alpha^2\!/2} \\ &= \exp(\alpha^2n/2) \end{align*}$$

It follows that

$$\begin{align*} \Pr\{X-\mu\ge r\} &= \Pr\{e^{\alpha(X-\mu)}\ge e^{\alpha r}\} && \text{(by equation (C.47))} \\ &\le \mathbb{E}{[e^{\alpha(X-\mu)}]}e^{-\alpha r} && \text{(by inequality (C.48))} \\ &\le \exp(\alpha^2n/2-\alpha r) \end{align*}$$

The exponent is a standard quadratic polynomial in $\alpha$ that achieves its minimum value at $\alpha=r/n$. We finally have

$$\begin{align*} \Pr\{X-\mu\ge r\} &\le \exp\left((r/n)^2n/2-(r/n)r\right) \\ &= e^{-r^2\!/{2n}} \end{align*}$$

Exercise C.5-7

The function $f(x)=e^{g(x)}$ is monotonically increasing in terms of $g(x)$, so minimizing $g(x)$ also minimizes $f(x)$. Therefore, we need to find the first and second derivatives of the exponent $g(\alpha)$ in the right-hand side of inequality (C.51); we have

$$\begin{align*} g'(\alpha) &= \mu e^\alpha-r=0 \implies \alpha_{\min} = \ln (r/\mu) \\ g''(\alpha_{\min}) &= \mu e^{\alpha_{\min}} > 0 \quad\therefore \alpha_{\min} \text{ is a minimum, assuming } \mu > 0 \end{align*}$$

Problem C-1

a.

See the solution here. The site contains lots of interesting puzzles worthwhile to study.

b.

The first three columns constitute the components of an outcome.

| Initial Pick | Has Offer? | Switch? | Winning? | Probability |
| --- | --- | --- | --- | --- |
| Right | Yes | Yes | No | $p_{right}\,p_{switch}/3$ |
| Right | Yes | No | Yes | $p_{right}(1-p_{switch})/3$ |
| Wrong | Yes | Yes | Yes | $2p_{wrong}\,p_{switch}/3$ |
| Wrong | Yes | No | No | $2p_{wrong}(1-p_{switch})/3$ |
| Right | No | No | Yes | $(1-p_{right})/3$ |
| Wrong | No | No | No | $2(1-p_{wrong})/3$ |

c.

Just sum up the probabilities of the winning outcomes from the table: $\Pr\{\text{win}\} = \big(p_{right}(1-p_{switch}) + 2p_{wrong}\,p_{switch} + (1-p_{right})\big)/3$.

d.

He should minimize the expression from part (c). Therefore, his best choices are $p_{right}=1$ and $p_{wrong}=0$.

e.

When $p_{switch}=0$, Monty cannot influence the player's probability of winning a car, so all possible strategies are optimal for him.

f.

We should maximize the expression from part (c). Therefore, we have

$$p_{switch} = \begin{cases} 1 & \text{if } 2p_{wrong} \ge p_{right}, \\ 0 & \text{otherwise}. \end{cases}$$

g.

The combination $p_{right}=1$ and $p_{wrong}=0$ minimizes the probability of winning as a function of these variables. In this case, $p_{winning}=(1-p_{switch})/3$, so setting $p_{switch}=0$ maximizes this probability.

h.

Let event $P$ designate that our initial pick is correct, event $O$ that Monty offers us an opportunity to switch, and event $W$ that we win a car. The total probability of getting an offer is

$$\Pr\{O\}=\Pr\{O\mid P\}\Pr\{P\}+\Pr\{O\mid\overline{P}\}\Pr\{\overline{P}\}=(p_{right}+2p_{wrong})/3$$

We know that Monty gives an offer, so $p_{right}>0 \lor p_{wrong} > 0$, hence $p_{right}+2p_{wrong}>0$.

Define event $S$ to stand for our decision to switch. The joint event of winning a car and having an opportunity to switch is

$$\begin{align*} W\cap O &= \left((P\cap\overline{S})\cup(\overline{P}\cap S)\right)\cap O \\ &= \left((P\cap O)\cap\overline{S}\right)\cup\left((\overline{P}\cap O)\cap S\right) \end{align*}$$

As stated in the problem description, we have no knowledge of Monty's possible motivations or strategies. Consequently, SS is independent of the other events. Thus, we have

$$\begin{align*} \Pr\{W\mid O\} &= \cfrac{\Pr\{W\cap O\}}{\Pr\{O\}} \\[1mm] &= \cfrac{\Pr\{(P\cap O)\cap\overline{S}\}+\Pr\{(\overline{P}\cap O)\cap S\}}{\Pr\{O\}} && \text{(mutually disjoint events)} \\[1mm] &= \cfrac{\Pr\{O\mid P\}\Pr\{P\}\Pr\{\overline{S}\}+\Pr\{O\mid\overline{P}\}\Pr\{\overline{P}\}\Pr\{S\}}{\Pr\{O\}} \\[1mm] &= \cfrac{p_{right}(1-p_{switch})/3+2p_{wrong}p_{switch}/3}{(p_{right}+2p_{wrong})/3} \\[1mm] &= \cfrac{p_{right}-p_{right}p_{switch}+2p_{wrong}p_{switch}}{p_{right}+2p_{wrong}} \end{align*}$$

i.

When $p_{switch}=1/2$, the value of the expression (C.52) is 1/2. When $p_{switch}<1/2$, Monty can choose $p_{right}=0$ and $p_{wrong}=1$; otherwise, if $p_{switch}>1/2$, he can select $p_{right}=1$ and $p_{wrong}=0$. In both cases he can reduce the probability below 1/2.

j.

Not knowing anything about a problem's setup, as in this case pertaining to Monty's motivations and strategies around offering opportunities to switch, our best tactic in yes/no decision-making situations is simply to flip a coin. Such a Bernoulli-type random experiment has great potential to neutralize the sophisticated methods other participants might use to reduce our chances of winning.

This problem has highlighted how subtle variations in assumptions may significantly influence the result. Furthermore, we have seen that probability theory is full of surprises compared to our everyday intuition. As a matter of fact, it has demonstrated that we are, in general, inept at handling uncertainties and reasoning about random processes.

Problem C-2

a.

There are $b$ choices for each ball, resulting in $b^n$ ways of putting them into bins.

b.

We follow the hint from the book. There are $(b+n-1)!$ possible arrangements (permutations) of $n$ distinct balls and $b-1$ indistinguishable sticks. The sticks essentially partition the balls into bins, possibly allowing empty bins, too. For example, |21||3 says that bins 1 and 3 are empty, bin 2 contains the balls 2, 1 (in this specific order), and ball 3 is in the last bin. Since the sticks are indistinguishable, we must divide the total number of arrangements by $(b-1)!$ to get the result from the book.

c.

We again follow the hint from the book. Among the arrangements from part (b), each outcome is repeated $n!$ times if the balls are made identical. Therefore, we have $\dfrac{(b+n-1)!}{n!(b-1)!}=\binom{b+n-1}{n}$.

d.

Out of $b$ bins we select the $n$ of them that will contain a ball. This is a classical example of counting combinations, so the answer is $\binom{b}{n}$.

e.

We reuse the idea from part (b) with $b-1$ indistinguishable sticks. The only difference is that we now require every stick to be placed between two balls, with at most one stick per gap. There are $n-1$ allowable positions (the gaps between neighboring balls) for the sticks. From these we select $b-1$, thus we get $\binom{n-1}{b-1}$.
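The counts from parts (c) and (e) are small enough to confirm by brute force; the helper below (our own naming, not from the book) enumerates all ways to distribute identical balls over distinct bins:

```python
from itertools import product
from math import comb

def count_distributions(n, b, allow_empty):
    # Number of ways to put n identical balls into b distinct bins.
    return sum(1 for counts in product(range(n + 1), repeat=b)
               if sum(counts) == n and (allow_empty or min(counts) >= 1))

n, b = 7, 3
assert count_distributions(n, b, allow_empty=True) == comb(b + n - 1, n)    # part (c)
assert count_distributions(n, b, allow_empty=False) == comb(n - 1, b - 1)   # part (e)
print("parts (c) and (e) verified for n = 7, b = 3")
```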
