Chapter 5

Exercise 5.1-1

We prove the contrapositive: if we do not have a total order on the ranks of the candidates, then we cannot always determine which candidate is best in line 4 of procedure Hire-Assistant.

A lack of total order implies that there are incomparable candidates. Assume one such pair is denoted by indices i < j and that currently best = i. When we reach candidate j, we cannot evaluate the condition in line 4. Consequently, we are unable to decide whether to keep i or switch to j.

Exercise 5.1-2

We can employ an algorithm similar to binary search. Let n = b - a + 1 be the size of the interval [a,b], and assume that one call to Random(0,1) takes a constant amount of time.

Each iteration splits the current range of size m > 1 into chunks of sizes \lfloor m/2 \rfloor and \lceil m/2 \rceil, randomly discards one of them, and continues until the size of the range drops to 1. The number of iterations (and hence the expected running time) is \Theta(\lg n). When n is a power of 2, every value x \in [a,b] is reached by exactly \lg n coin flips, so its probability of being selected is \prod_{i=1}^{\lg n}(1/2)=1/2^{\lg n}=1/n, since calls to Random(0,1) can be regarded as Bernoulli trials with probability of success 1/2. When n is not a power of 2, the two halves have unequal sizes and this argument no longer gives a uniform distribution; a standard fix is to sample uniformly from a range of size 2^{\lceil \lg n \rceil} and reject values outside [a,b]. The accepted values remain uniform over [a,b], and the expected running time is still \Theta(\lg n).
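As a sanity check, here is a minimal Python sketch of the rejection-sampling variant, where random01 is a hypothetical stand-in for the book's Random(0,1):

```python
import random

def random01():
    # Stand-in for Random(0, 1): one fair random bit.
    return random.randrange(2)

def random_ab(a, b):
    # Rejection sampling: assemble ceil(lg n) random bits into a number r
    # and retry whenever r falls outside [0, n); every accepted value of r
    # is equally likely, so a + r is uniform over [a, b].
    n = b - a + 1
    bits = max(1, (n - 1).bit_length())
    while True:
        r = 0
        for _ in range(bits):
            r = 2 * r + random01()  # append one random bit
        if r < n:
            return a + r
```

Each attempt uses \lceil \lg n \rceil flips and succeeds with probability greater than 1/2, so the expected number of flips stays \Theta(\lg n).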

Exercise 5.1-3

Available in the latest revision of the IM.

Exercise 5.2-1

Available in the latest revision of the IM.

Exercise 5.2-2

Available in the latest revision of the IM.

Exercise 5.2-3

Available in the latest revision of the IM.

Exercise 5.2-4

Available in the latest revision of the IM.

Exercise 5.2-5

Available in the latest revision of the IM.

Exercise 5.2-6

Available in the latest revision of the IM.

Exercise 5.3-1

Available in the latest revision of the IM.

Exercise 5.3-2

Available in the latest revision of the IM.

Exercise 5.3-3

Available in the latest revision of the IM.

Exercise 5.3-4

Available in the latest revision of the IM.

Exercise 5.3-5

Available in the latest revision of the IM.

Exercise 5.4-1

In both subproblems we assume that birthdays are independent and uniformly distributed.

Subproblem 1

We follow a similar analysis as for the birthday paradox from the book. Let A_i be the event that person i has a different birthday than I do. We have \Pr\{A_i\}=1-1/n. Let B_k = \bigcap_{i=1}^k A_i be the event that none of the k people has the same birthday as I do. Observe that \Pr\{A_k \mid B_{k-1}\}=\Pr\{A_k\}. We iteratively apply the recurrence (5.8) to obtain

\begin{align*} \Pr\{B_k\} &= \prod_{i=1}^k\Pr\{A_i\} \\ &= \prod_{i=1}^k\left(1-\frac{1}{n}\right) \\ &= \left(1-\frac{1}{n}\right)^k \end{align*}

We need to find k such that \Pr\{\overline B_k\} \ge 1/2, or equivalently \Pr\{B_k\} \le 1/2. This gives k = \lceil\log_{1-1/n}(1/2)\rceil. For n=365 it is k=253.
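A quick numerical check of this threshold (a Python sketch, assuming n = 365):

```python
import math

n = 365
# smallest k with (1 - 1/n)^k <= 1/2, i.e. k = ceil(log_{1-1/n}(1/2))
k = math.ceil(math.log(0.5) / math.log(1 - 1 / n))
print(k)  # → 253
```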

Subproblem 2

Let the random variable X_k denote the number of people (out of k) having a birthday on July 4. This variable follows the binomial distribution, so \Pr\{X_k=i\}=b(i;k,1/n). We are looking for the smallest k such that \Pr\{X_k=0\}+\Pr\{X_k=1\} < 1/2. For n=365 this gives k=613.
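The threshold can be found by a direct search over the binomial tail (Python sketch, n = 365):

```python
n = 365
p = 1 / n

def pr_at_most_one(k):
    # Pr{X_k = 0} + Pr{X_k = 1} for X_k ~ Binomial(k, p)
    return (1 - p) ** k + k * p * (1 - p) ** (k - 1)

k = 1
while pr_at_most_one(k) >= 0.5:
    k += 1
print(k)  # → 613
```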

Exercise 5.4-2

We need to find k such that \Pr\{B_k\} \le 1-0.99=0.01. Solving for k gives

k \ge \frac{1+\sqrt{1+8n\ln100}}{2}

For n=365 it is k=59. Plugging these values into the formula \frac{k(k-1)}{2n}, we find that the expected number of pairs of people with the same birthday is approximately 4.69.
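Checking both numbers (Python sketch, n = 365):

```python
import math

n = 365
k = math.ceil((1 + math.sqrt(1 + 8 * n * math.log(100))) / 2)
pairs = k * (k - 1) / (2 * n)  # expected number of matching pairs
print(k, round(pairs, 2))  # → 59 4.69
```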

Exercise 5.4-3

Observe that the first toss will land in an empty bin. Let X be a random variable denoting how many additional tosses are needed until some bin contains two balls. Obviously \Pr\{X \ge 1\}=1, as we surely need at least one extra toss, and \Pr\{X \ge 2\}=(b-1)/b, since we must have one miss. In general, at least k additional tosses are required when none of the first k-1 of them lands in an occupied bin, so

\begin{align*} \Pr\{X\ge k\} &= \frac{b-1}{b}\cdot\frac{b-2}{b}\cdots\frac{b-(k-1)}{b} \\[1mm] &= \frac{b}{b}\cdot\frac{b-1}{b}\cdot\frac{b-2}{b}\cdots\frac{b-(k-1)}{b} \\[1mm] &= \frac{b!}{(b-k)!b^k} \end{align*}

We know that \Pr\{X=k\}=\Pr\{X \ge k\}-\Pr\{X \ge k+1\} and \Pr\{X \ge b+1\}=0. When a random variable X takes on values from the set of natural numbers, we have a nice formula for its expectation (see equation (C.28) on page 1193 in the book). Therefore, the expected number of tosses is

1+\sum_{k=1}^b \Pr\{X \ge k\} = 1+\sum_{k=1}^b\frac{b!}{(b-k)!b^k}=\Theta(\sqrt b)

The expression above is exactly the same as the one for the average number of people needed to get at least one shared birthday; the sum is known as Ramanujan's Q-function, whose \Theta(\sqrt b) growth explains the asymptotic bound.
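The exact expectation is easy to evaluate numerically; the sketch below keeps \Pr\{X \ge k\} as a running product. For b = 365 it gives about 24.62, consistent with the \Theta(\sqrt b) growth (\sqrt{\pi b/2} \approx 23.9).

```python
def expected_tosses(b):
    # 1 + sum_{k=1}^{b} Pr{X >= k}, with Pr{X >= k} maintained as the
    # running product ((b-1)/b) * ((b-2)/b) * ... * ((b-k+1)/b).
    total, prob = 0.0, 1.0  # prob = Pr{X >= k}, starting at k = 1
    for k in range(1, b + 1):
        total += prob
        prob *= (b - k) / b
    return 1 + total
```

Quadrupling b roughly doubles the expectation, as the square-root growth predicts.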

Exercise 5.4-4

It is important that the birthdays be mutually independent.

The conditional probability chain, obtained by iteratively applying the recurrence (5.8), depends on ensuring that, at each step, the birthday variable b_i has the same uniform distribution, unaffected by the previous choices apart from not colliding with earlier birthdays. This chain is only valid when the entire joint assignment is described by uniform, mutually independent random variables. Pairwise independence does not ensure that, once several variables have been assigned, the distribution of the next variable is still as required for the calculation: dependencies among larger groups can profoundly alter probabilities for intersections and unions of events. On page 1188 of the book, you will find a concrete example of the difference between pairwise and mutual independence.
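The standard counterexample can be checked mechanically: take two fair bits and their XOR. The sketch below verifies that the three variables are pairwise independent yet not mutually independent.

```python
from itertools import product

# Sample space: X, Y independent fair bits, Z = X XOR Y; the four
# outcomes (x, y) are equally likely.
space = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]

def pr(pred):
    # Probability of an event over the uniform sample space.
    return sum(1 for w in space if pred(*w)) / len(space)

marg = [pr(lambda x, y, z, j=j: (x, y, z)[j] == 1) for j in range(3)]
pair = pr(lambda x, y, z: x == 1 and z == 1)
triple = pr(lambda x, y, z: x == 1 and y == 1 and z == 1)
print(pair == marg[0] * marg[2])              # → True  (pairwise holds)
print(triple == marg[0] * marg[1] * marg[2])  # → False (mutual fails)
```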

Exercise 5.4-5

This is a variation of the birthday problem in which we ask how many people are needed before three of them share a birthday with probability at least 1/2. For this exercise the answer is k=88. Let us see how we can derive it analytically.

We assume that birthdays are mutually independent and uniformly distributed, so any sequence of k birthdays occurs with probability 1/n^k. For k>1 and i=0, 1, \dots, \lfloor k/2\rfloor, let Q_i be the probability that among k people exactly i pairs of people have matching birthdays, which implies that no three people share a birthday. Our probability P_k that among k people some three share a birthday is given by

P_k=1-\sum_{i=0}^{\lfloor k/2\rfloor}Q_i

We can select i days out of n, and assign these to i unallocated pairs of people, in \binom{n}{i} ways. The remaining k-2i people can then be given distinct birthdays from the remaining n-i days in \binom{n-i}{k-2i} ways. Since people are distinguishable, the total number of allocations should be multiplied by k!. We also need to divide the result by 2^i, since the two people within a shared-birthday pair are unordered. After combining these pieces, we get

Q_i = \frac{1}{n^k} \cdot \frac{k!}{2^i}\binom{n}{i}\binom{n-i}{k-2i}

You can verify that k=88 is the smallest value for which P_k \ge 1/2.
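The derivation can be verified with exact rational arithmetic (a Python sketch of the formula for Q_i above):

```python
from fractions import Fraction
from math import comb, factorial

def p_triple(n, k):
    # P_k = 1 - sum_i Q_i, with Q_i = (1/n^k) * (k!/2^i) * C(n,i) * C(n-i, k-2i)
    total = Fraction(0)
    for i in range(k // 2 + 1):
        if k - 2 * i > n - i:
            continue  # not enough distinct days left
        total += Fraction(comb(n, i) * comb(n - i, k - 2 * i) * factorial(k),
                          2**i * n**k)
    return 1 - total

k = 2
while p_triple(365, k) < Fraction(1, 2):
    k += 1
print(k)  # → 88
```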

Exercise 5.4-6

We can form n^k k-strings over a set of size n. Any k-string with distinct elements is a k-permutation. By equation (C.1), there are n(n-1)\cdots(n-k+1) possible k-permutations. If we denote by B_k the event that k people have distinct birthdays, then the probability that a k-string over a set of size n forms a k-permutation equals \Pr\{B_k\}; on page 141 of the book you will find this in expanded form.
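Numerically, \Pr\{B_k\} = n(n-1)\cdots(n-k+1)/n^k is exactly the fraction of k-strings that are k-permutations; a quick check reproduces the classic birthday threshold (Python sketch):

```python
def pr_distinct(n, k):
    # n(n-1)...(n-k+1) / n^k, as a running product of (1 - i/n) factors
    prob = 1.0
    for i in range(k):
        prob *= (n - i) / n
    return prob

# With n = 365, the probability of all-distinct birthdays first drops
# below 1/2 at k = 23.
print(pr_distinct(365, 22) > 0.5 > pr_distinct(365, 23))  # → True
```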

Exercise 5.4-7

Available in the latest revision of the IM.

Exercise 5.4-8

Let's partition the n coin flips into \lfloor n/s \rfloor disjoint groups of s=\lfloor \lg n-2\lg\lg n\rfloor consecutive flips each (discarding any leftover flips) and bound the probability that no group comes up all heads. By equation (5.9), the probability that the group starting in position i comes up all heads is

\begin{align*} \Pr\{A_{i,\,s}\} &= \cfrac{1}{2^{s}} \\[1mm] &\ge \cfrac{1}{2^{\lg n-2\lg\lg n}} \\[3mm] &= \cfrac{2^{2\lg\lg n}}{2^{\lg n}} \\[3mm] &= \cfrac{\lg^2n}{n} \end{align*}

The probability that a streak of heads of length at least s does not begin in position i is therefore at most 1-(\lg^2n)/n. Since the \lfloor n/s \rfloor groups are formed from disjoint, independent sets of coin flips, the probability that every one of these groups fails to be a streak of length s is at most

\begin{align*} \left(1-\frac{\lg^2n}{n}\right)^{\lfloor n/s \rfloor} &\le \left(1-\frac{\lg^2n}{n}\right)^{n/s-1} \\ &\le \left(1-\frac{\lg^2n}{n}\right)^{n/\lg n} && \text{(since $n/s-1 \ge n/\lg n$ for sufficiently large $n$)} \\ &\le \left(e^{-(\lg^2n)/n}\right)^{n/\lg n} && \text{(by inequality (3.14))} \\ &= e^{-\lg n} \\ &< 2^{-\lg n} && \text{(since $e > 2$ and $\lg n > 0$)} \\ &= 1/n \end{align*}

Using the same reasoning as for deriving equation (5.13) (see page 148 in the book), the probability that the longest streak equals or exceeds s is at least 1-1/n.
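A small simulation illustrates the bound: with n = 1024 we have s = \lfloor \lg n - 2\lg\lg n \rfloor = 3, and a streak of at least s heads appears in essentially every trial (Python sketch).

```python
import random

def longest_head_streak(flips):
    # Length of the longest run of 1s in a 0/1 sequence.
    best = cur = 0
    for f in flips:
        cur = cur + 1 if f == 1 else 0
        best = max(best, cur)
    return best

random.seed(0)
n = 1024
s = 3  # floor(lg 1024 - 2 * lg lg 1024) = floor(10 - 2 * lg 10) = 3
trials = 200
hits = sum(longest_head_streak([random.randrange(2) for _ in range(n)]) >= s
           for _ in range(trials))
```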

Problem 5-1

Available in the latest revision of the IM.

Problem 5-2

Available in the latest revision of the IM.

A more rigorous treatment of tasks (e) and (f) is based on conditional expectation. Let X_i=I\{A[i] \text{ is examined}\} and let X=\sum_{i=1}^n X_i equal the number of indices of A that are examined. Define S=\{i:A[i]=x\} and \overline S=\{i:A[i] \neq x\}, so that |S|=k and |\overline S|=n-k. A position i \in S is examined only if it is the first position of S in the scan order, which occurs with probability 1/k. A position i \notin S is examined only if, among position i and all k positions in S, position i comes first, which occurs with probability 1/(k+1). Let C_S be the event that the set of indices holding the value x is exactly S. We have

\begin{align*} \mathbb{E}[X_i] &= \mathbb{E}[\mathbb{E}[X_i \mid C_S]] \\ &= \sum_{\text{all }S} (\mathbb{E}[X_i \mid C_S] \Pr\{C_S\}) \\ &= \cfrac{1}{\displaystyle{\binom{n}{k}}} \sum_{\text{all }S} \mathbb{E}[X_i \mid C_S] && \text{($\Pr\{C_S\}=1/\binom{n}{k}$)} \\ &= \cfrac{1}{\displaystyle\binom{n}{k}} \cdot \binom{n}{k} \cdot \Pr\{X_i=1 \mid C_S\} && \text{($\mathbb{E}[X_i \mid C_S]=\Pr\{X_i=1 \mid C_S\}$)} \\ &= \Pr\{X_i=1 \mid C_S\} \\ &=\begin{cases} 1/k &\text{if } i \in S, \\ 1/(k+1) &\text{otherwise}. \end{cases} \end{align*}
\begin{align*} \mathbb{E}[X] &= \mathbb{E}[\sum_{i=1}^n X_i] \\ &= \sum_{i=1}^n \mathbb{E}[X_i] && \text{(by linearity of expectation)} \\ &= \sum_{i \in S} \mathbb{E}[X_i] + \sum_{i \notin S} \mathbb{E}[X_i] \\ &= \sum_{i \in S} \frac{1}{k} + \sum_{i \notin S} \frac{1}{k+1}\\ &= k \cdot \frac{1}{k} + (n-k) \cdot \frac{1}{k+1} \\ &= \frac{n+1}{k+1} \end{align*}
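A Monte Carlo check of this answer, simulating a scan of a randomly permuted array (a Python sketch; the names and parameters are illustrative):

```python
import random

def examined_count(n, k, rng):
    # Scan a random permutation of n positions, of which positions
    # 0..k-1 hold the value x; count indices examined until x is found.
    order = list(range(n))
    rng.shuffle(order)
    for examined, pos in enumerate(order, start=1):
        if pos < k:
            return examined
    return n

rng = random.Random(42)
n, k, trials = 20, 4, 100_000
avg = sum(examined_count(n, k, rng) for _ in range(trials)) / trials
# avg should be close to (n + 1) / (k + 1) = 4.2
```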
