> For the complete documentation index, see [llms.txt](https://evarga.gitbook.io/sh-aofa/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://evarga.gitbook.io/sh-aofa/algorithms-and-combinatorial-structures/chapter-8-strings-and-tries.md).

# Chapter 8: Strings and Tries

## Exercise 8.1

As mentioned in the book, $$\[z^N]B\_P(z)$$ satisfies the same recurrence as the generalized Fibonacci numbers of [Exercise 4.18](https://evarga.gitbook.io/sh-aofa/algorithms-and-combinatorial-structures/pages/6pjaXyRo8WpOT0GFIIZ1#exercise-4.18). This gives

$$
\boxed{b\_N = b\_{N-1} + b\_{N-2} + \dots + b\_{N-P} \qquad \text{ for $N\ge P$}}.
$$

The other one can be derived from the formula of Theorem 8.2 by rearranging terms

$$
(1 - 2z + z^{P+1}) \sum\_{N\ge0} b\_N z^N = 1 - z^P.
$$

Now, we extract the coefficient of $$z^N$$ on both sides. For $$N>P$$ the RHS is 0, thus

$$
b\_N - 2b\_{N-1} + b\_{N-P-1} = 0 \implies \boxed{b\_N = 2b\_{N-1} - b\_{N-P-1} \qquad \text{ for $N>P$}}.
$$

Remember that $$b\_N$$ represents the number of bitstrings of length $$N$$ with no runs of $$P$$ consecutive 0s. If the length of a bitstring is strictly less than $$P$$, it’s physically impossible for the string to contain a run of $$P$$ consecutive 0s. Therefore, every possible bitstring of length $$N$$ is valid. The initial values are

$$
b\_N = 2^N \quad \text{for} \quad 0 \le N < P.
$$

Observe that $$b\_P$$ can be calculated from the first recurrence, so the second one becomes fully defined, too.

## Exercise 8.2

We need to solve the following inequality

$$
1-c\_P\left(\frac{\beta\_P}{2}\right)^N\ge0.99,
$$

with $$P=3$$. Using a computer algebra system and Table 8.3, we get $$N=57$$.

{% hint style="info" %}
We can also compute the values of $$b\_{56}$$ and $$b\_{57}$$ using one of the recurrences from the previous exercise (for example, with the help of a Python script). The exact calculation would confirm the validity of our stated result.
{% endhint %}

## Exercise 8.3

This time we should rely on the asymptotic formula from the previous exercise (with the RHS of 0.5), as exact calculations based on recurrence would become prohibitively slow. The parameters are (these can be attained using a computer algebra system):

* $$\beta\_{32} \approx 1.9999999997671694$$
* $$c\_{32} \approx 1.0000000034924597$$

We get $$N = 5,954,088,975$$.

## Exercise 8.4 🌟

{% hint style="success" %}
This exercise showcases a general approach of extracting coefficients from GFs; transforming the GF into a geometric series and expanding the corresponding binomial.
{% endhint %}

We start with the formula from Theorem 8.2 and extract the coefficient $$\[z^N]B\_P(z)$$. We can do this by expanding the rational function into a power series using the geometric series formula and the binomial theorem. Let’s rearrange terms

$$
S(z) = \frac{1}{1 - 2z + z^{P+1}} \implies B\_P(z) = (1 - z^P) S(z) = S(z) - z^P S(z).
$$

This gives

$$
\[z^N]B\_P(z) = \[z^N]S(z) - \[z^{N-P}]S(z).
$$

We can transform $$S(z)$$ into a geometric series and expand

$$
\begin{align\*}
S(z) &= \frac{1}{1 - (2z - z^{P+1})} \\
&= \sum\_{j \ge 0} (2z - z^{P+1})^j \\
&= \sum\_{j \ge 0} \sum\_{i=0}^j \binom{j}{i} (2z)^{j-i} (-z^{P+1})^i \\
&= \sum\_{j \ge 0} \sum\_{i=0}^j \binom{j}{i} 2^{j-i} (-1)^i z^{j-i} z^{(P+1)i} \\
&= \sum\_{j \ge 0} \sum\_{i=0}^j\binom{j}{i} (-1)^i 2^{j-i} z^{j + Pi}.
\end{align\*}
$$

We want to find the coefficient of a general term $$z^K$$. This gives

$$
j + Pi = K \implies j = K - Pi.
$$

We have

$$
\[z^K]S(z) = \sum\_i (-1)^i 2^{K-(P+1)i} \binom{K-Pi}{i}.
$$

Now we substitute $$K = N$$ and $$K = N-P$$ into our general coefficient formula to solve our original equation

$$
\begin{align\*}
\[z^N]S(z) &= \sum\_i (-1)^i 2^{N-(P+1)i} \binom{N-Pi}{i} \\
\hline \\
\[z^{N-P}]S(z) &= \sum\_i (-1)^i 2^{(N-P)-(P+1)i} \binom{(N-P)-Pi}{i} \\
&= \sum\_i (-1)^i 2^{N-(P+1)i - P} \binom{N-P(i+1)}{i} \\
&= \sum\_i (-1)^i 2^{N-(P+1)i} 2^{-P} \binom{N-P(i+1)}{i}.
\end{align\*}
$$

Putting everything together, we get

$$
\[z^N]B\_P(z) = \sum\_i (-1)^i 2^{N-(P+1)i} \left( \binom{N-Pi}{i} - 2^{-P} \binom{N-P(i+1)}{i} \right).
$$

## Exercise 8.5

Let’s build our BGF $$F(z,u)$$ using the symbolic method; $$\mathcal{F}$$ designates a class of bitstrings with leading 1s (which may be zero). The beauty of this approach is that the construction itself fully describes the underlying combinatorial decomposition, so everything is explicit. We have

$$
\mathcal{F}=SEQ(u\mathcal{Z}\_1)(\epsilon+\mathcal{Z}\_0\mathcal{B}).
$$

Notice that we must mark the leading 1s. This translates to

$$
F(z, u) = \frac{1}{1-uz} + \frac{z}{(1-uz)}B(z)= \frac{1-z}{(1-uz)(1-2z)}.
$$

Now, we simply follow the recipe laid out in Table 3.5 in the book to find the moments (mechanical steps are omitted):

* $$\mu\_N = 1 - 2^{-N}$$,
* $$\sigma\_N = \sqrt{2 - (2N+1)2^{-N} - 2^{-2N}}$$.

## Exercise 8.6

We just apply Theorem 8.2 and its corollary to get

$$
\begin{align\*}
B\_2(z) &= \frac{1+z}{1-z-z^2} \\
&= \sum\_{j \ge 0} F\_{j+2}z^j \\
&= \sum\_{j \ge 0} F\_{j+1}z^j + \sum\_{j \ge 0} F\_{j}z^j \\
&= \frac{1}{z}\sum\_{j \ge 0} F\_{j+1}z^{j+1} + \sum\_{j \ge 0} F\_{j}z^j \\
&= \frac{1}{z}\sum\_{j \ge 0} F\_{j}z^{j} + \sum\_{j \ge 0} F\_{j}z^j && \text{($F\_0=0$)} \\
\hline \\
B\_2(1/2) &= 2^3-2=(2+1)\sum\_{j \ge 0} \frac{F\_{j}}{2^j} \\
&\therefore \boxed{\sum\_{j \ge 0} \frac{F\_{j}}{2^j} = 2}.
\end{align\*}
$$

## Exercise 8.7 🌟

{% hint style="success" %}
This exercise demonstrates how to track extremal parameters, where we cannot easily construct a BGF using a single, simple symbolic decomposition.
{% endhint %}

Let $$L(z, u)$$ be our BGF, where $$z$$ marks the length of the string and $$u$$ marks the length of the longest run of 0s. Instead of trying to directly find the generating function for bitstrings where the longest run is exactly $$k$$, it’s much easier to think about boundaries. A bitstring has a longest run of 0s that is $$\le k$$ if and only if it has no runs of $$k+1$$ consecutive zeros. Therefore, the OGF for strings whose longest run of 0s is $$\le k$$ is exactly $$B\_{k+1}(z)$$. This gives

$$
\[u^k]L(z, u) = B\_{k+1}(z) - B\_k(z).
$$

We can now sum over all $$k$$ and multiply by $$u^k$$ to get

$$
\begin{align\*}
L(z, u) &= \sum\_{k \ge 0} u^k \left( B\_{k+1}(z) - B\_k(z) \right) \\
&= (B\_1(z) - B\_0(z)) + u(B\_2(z) - B\_1(z)) + u^2(B\_3(z) - B\_2(z)) + \dots \\
&= -B\_0(z) + B\_1(z)(1-u) + u B\_2(z)(1-u) + u^2 B\_3(z)(1-u) + \dots \\
&= \boxed{(1-u) \sum\_{k \ge 1} u^{k-1} B\_k(z)} && \text{($B\_0(z) = 0$)}.
\end{align\*}
$$

## Exercise 8.8 🌟

{% hint style="success" %}
This exercise introduces a different methodology for computing variance, when length itself is a random variable.
{% endhint %}

Let $$X$$ be the random variable denoting the position of the end of the first run of $$P$$ 0s ("wait time"); the length of the bitstring (prefix) is the random variable itself. The book establishes the relationship

$$
\[z^N]B\_P(z/2)=\Pr{X>N}.
$$

To compute the variance we need a different paradigm than the BGF method and Table 3.5. Because the thing we are measuring is already the length of the bitstring, we don't need a second variable $$u$$. We can extract everything we need directly from $$z$$. Let's define a new generating function $$S(z)$$ for these tail probabilities

$$
S(z) = \sum\_{N \ge 0} P(X > N) z^N= B\_P(z/2).
$$

The second corollary of Theorem 8.2 says that

$$
\mathbb{E}{\[X]} =  \sum\_{N \ge 0} P(X > N)=S(1)=B\_P(1/2) = 2^{P+1} - 2.
$$

To find the variance, besides the mean, we need the second factorial moment $$\mathbb{E}\[X(X-1)]$$. We’ve

$$
\begin{align\*}
S'(z) &= \sum\_{N \ge 1} N \cdot P(X > N) z^{N-1} \\
\hline \\
S'(1) &= \sum\_{N \ge 1} N \cdot P(X > N) \\
&= \sum\_{N \ge 1} N \left( \sum\_{k > N} P(X = k) \right) && \text{(expanding horizontally)} \\
&= \sum\_{k \ge 2} P(X = k) \cdot \left( \sum\_{N = 1}^{k-1} N \right) && \text{(expanding vertically)} \\
&= \sum\_{k \ge 2} P(X = k) \cdot \frac{k(k-1)}{2} \\
&= \frac{1}{2} \sum\_{k \ge 2} k(k-1) P(X = k) \\
&= \frac{1}{2} \sum\_{k \ge 0} k(k-1) P(X = k) \\
&= \frac{1}{2} \mathbb{E}\[X(X-1)].
\end{align\*}
$$

Equate this quantity with the one derived in a different manner. Let’s differentiate both sides of our basic equation and evaluate it at $$z=1$$. This gives

$$
S'(z) = \frac{1}{2} B'\_P\left(\frac{z}{2}\right) \implies \mathbb{E}\[X(X-1)] = B'\_P(1/2).
$$

Substitute everything into the standard variance identity (see Section 1.7 in the book)

$$
\text{Var}(X) = \mathbb{E}\[X(X-1)] + \mathbb{E}\[X] - (\mathbb{E}\[X])^2= \boxed{B'\_P(1/2) + B\_P(1/2) - (B\_P(1/2))^2}.
$$

Plugging everything into the above formula and simplifying, we get

$$
\text{Var}(X) = 2^{2P+2} - (2P+1)2^{P+1} - 2.
$$

Therefore,

$$
\sigma\_X = \sqrt{2^{2P+2} - (2P+1)2^{P+1} - 2}.
$$

## Exercise 8.9

The next Python script generates the plot by evaluating the GF from the book.

{% code expandable="true" %}

```python
import sympy as sp
import matplotlib.pyplot as plt

def plot_average_longest_run(max_N=99):
    z = sp.Symbol('z')

    print(f"Calculating generating functions up to N={max_N}... this might take a few minutes.")
    s = 0
    for k in range(max_N + 1):
        term = 1 / (1 - 2 * z) - (1 - z ** k) / (1 - 2 * z + z ** (k + 1))
        term_series = sp.series(term, z, 0, max_N + 1).removeO()
        s += term_series
    gf = sp.Poly(s, z)

    print("Extracting coefficients and computing averages...")
    n_vals = list(range(3, max_N + 1))
    average_lengths = []
    for n in n_vals:
        c_n = gf.nth(n)
        avg = float(c_n) / (2 ** n)
        average_lengths.append(avg)

    plt.figure(figsize=(10, 6))
    plt.plot(n_vals, average_lengths, marker='o', markersize=4, linestyle='-', color='b')
    plt.title("Average Length of Longest Run of 0s in an N-bit String")
    plt.xlabel("String Length (N)")
    plt.ylabel("Average Longest Run")
    plt.grid(True, linestyle='--', alpha=0.7)
    plt.tight_layout()
    plt.show()

if __name__ == "__main__":
    plot_average_longest_run(99)
```

{% endcode %}

<figure><img src="/files/cl4i3K4esdGVzimxdJqg" alt=""><figcaption></figcaption></figure>

## Exercise 8.10 🌟

{% hint style="success" %}
This exercise encompasses several important details from probability theory and showcases how even simple programs demand a careful analysis.
{% endhint %}

The second corollary of Theorem 8.2 specifies the expected length of the bitstring (i.e., the expected position where the pattern finally ends). Because of the way the nested for loops work in Program 8.1, the algorithm re-examines characters whenever a partial match fails. This means the number of bits examined will be strictly greater than the expected length of the string.

We can view our random bitstring as a sequence of "failed blocks" followed by one "success block." A failed block can be described as $$0^k1$$, where $$0 \le k < P$$. Processing such a failed block demands $$(k+1)(k+2)/2$$ bit comparisons (a total of subsequent partial and failed matches). Processing the final $$P$$ 0s requires additional $$P$$ bit comparisons.

The probability of seeing $$P$$ consecutive 0s from an arbitrary position is $$p=2^{-P}$$. We can model the run of our program as a series of Bernoulli trials with probability $$p$$ of success. The random variable denoting the number of failures before the first success is governed by the [geometric distribution](https://en.wikipedia.org/wiki/Geometric_distribution). Its expected value is $$(1-p)/p= 2^P - 1$$.

Now, what is the probability that a failed block has exactly $$k$$ zeros? The unconditional probability of reading exactly $$k$$ 0s and then 1 is simply $$(1/2)^{k+1}$$. The total probability of any failure occurring (meaning we hit 1 before $$P$$ 0s) is the sum of these probabilities for all valid $$k$$. Thus,

$$
\Pr{\text{Failure}} = \sum\_{k=0}^{P-1} \frac{1}{2^{k+1}} = 1 - 2^{-P}.
$$

The conditional probability of seeing the pattern $$0^k1$$, given that the block is a failure

$$
\Pr{|Block|=k+1 \mid \text{Failure}} = \frac{2^{-(k+1)}}{1 - 2^{-P}}.
$$

The expected number of comparisons per failed block

$$
\mathbb{E}\[C\_{\text{fail}}] = \sum\_{k=0}^{P-1} \left( \frac{2^{-(k+1)}}{1 - 2^{-P}} \right) \frac{(k+1)(k+2)}{2}.
$$

Now, we can express the total expected number of comparisons on failed blocks, relying on [Wald's equation](https://en.wikipedia.org/wiki/Wald%27s_equation)

$$
\begin{align\*}
\text{Total Failed Comparisons} &= (2^P - 1) \cdot E\[C\_{\text{fail}}] \\
&= (2^P - 1) \sum\_{k=0}^{P-1} \left( \frac{2^{-(k+1)}}{1 - 2^{-P}} \right) \frac{(k+1)(k+2)}{2} \\
&= 2^P \sum\_{k=0}^{P-1} \frac{(k+1)(k+2)}{2^{k+2}}.
\end{align\*}
$$

The exact expected number of bits examined to find the first run of $$P$$ 0s is

$$
\text{Total Failed Comparisons}+P=2^{P+2} - \frac{P^2 + 3P + 8}{2}.
$$

## Exercise 8.11

This is a simple application of the corollary of Theorem 8.3:

1. $$c(z)=1 \implies \mu=2^P$$,
2. $$c(z)=1 \implies \mu=2^P$$,
3. $$c(z)=\sum\_{i=0}^{P/2-1} z^{2i} \implies \mu=\frac{4}{3}(2^P-1)$$,
4. $$c(z)=\sum\_{i=0}^{(P-1)/2} z^{2i} \implies \mu=\frac{4}{3}(2^P-\frac{1}{2})$$.

## Exercise 8.12

To find the pattern that appears earliest, we need to minimize $$c(1/2)$$. Since $$c(z)$$ always starts with a 1, the absolute smallest it can possibly be is just $$c(z) = 1$$. This happens when a pattern has zero internal overlaps—meaning no prefix matches any suffix. We’ve seen such patterns in the previous exercise.

Patterns consisted of all 0s or 1s are likely to appear the latest. They’re maximally autocorrelated, so $$c(1/2)$$ is maximized.

## Exercise 8.13

The standard deviation does depend on the pattern, because the expected value is dictated by the pattern's autocorrelation polynomial $$c(z)$$, the spread of that distribution (the variance and standard deviation) is fundamentally locked to that same polynomial. Following the steps from [Exercise 8.8](#exercise-8.8), we can see that the standard deviation depends both on $$c(1/2)$$ and $$c'(1/2)$$.

{% hint style="info" %}
Two different patterns of the same length $$P$$ can technically have the same expected waiting time if their $$c(1/2)$$ values somehow match (see Figure 8.5 in the book), but their variances will almost certainly differ because $$c'(1/2)$$ factors in where those overlaps occur.
{% endhint %}

## Exercise 8.14 🌟

{% hint style="success" %}
This exercise generalizes Theorem 8.3 and its second corollary to larger alphabets.
{% endhint %}

The GF for the number of strings over an $$M$$-character alphabet that do not contain the specific pattern of length $$P$$ is

$$
B\_p(z) = \frac{c(z)}{z^P + (1 - Mz)c(z)}.
$$

The expected position of the end of the first occurrence of a $$M$$-character pattern with autocorrelation polynomial $$c(z)$$ is given by $$\mu = M^P c(1/M)$$.

In our case, $$M=32$$, $$P=44$$ and $$c(z)=1$$. Therefore, the expected number of characters typed before the monkey hits upon the given phrase is $$32^{44}$$.

## Exercise 8.15

This is a variation of the previous exercise. The pattern has some overlaps, so

$$
c(z)=1+z^{13} \implies c(1/M)=1+M^{-13}.
$$

Since $$P=18$$ and $$M=32$$, the expected number of characters typed before the monkey hits upon the given phrase is $$32^{18}+32^5$$.

## Exercise 8.16

Table 8.6 already provides an answer for height no greater than 4. The following list gives the remaining OGFs and REs:

1. $$\text{height} \le 5$$, `(1(1(1(1(10)*0)*0)*0)*0)*`,  $$A(z)=\frac{1-4z+3z^2}{1-5z+6z^2-z^3}$$,
2. $$\text{height} \le 6$$, `(1(1(1(1(1(10)*0)*0)*0)*0)*0)*`, $$A(z)=\frac{1-5z+6z^2-z^3}{1-6z+10z^2-4z^3}$$.

## Exercise 8.17 🌟

{% hint style="success" %}
This exercise introduces an advanced mechanism for constructing restrictive REs.
{% endhint %}

Theorem 8.3 says that the GF is

$$
B\_p(z)=\frac{1+z^3+z^5}{z^6+(1-2z)(1+z^3+z^5)}.
$$

Finding a regular expression for strings avoiding a pattern is notoriously complex because REs are generative (they build strings) rather than restrictive. Unfortunately, there is no reverse mapping from an OGF back to unambiguous RE. We’ll model the pattern-matching process as a DFA and leverage [Arden's Theorem](https://www.geeksforgeeks.org/theory-of-computation/ardens-theorem-in-theory-of-computation/).

<figure><img src="/files/AAmB0x0onZRbwlsOcuul" alt=""><figcaption><p>DFA that tracks the matching process for the pattern <code>101101</code>. Because we are looking to avoid the pattern, any string that reaches state 6 (Trap) stays there and is essentially rejected. All other states are accepting.</p></figcaption></figure>

The system of equations for the given DFA is

$$
\begin{align\*}
S\_5 &= 0S\_0 + \epsilon \\
S\_4 &= 1S\_1 + 0S\_5 + \epsilon \\
S\_3 &= 0S\_2 + 1S\_4 + \epsilon \\
S\_2 &= 0S\_0 + 1S\_3 + \epsilon \\
S\_1 &= 1S\_1 + 0S\_2 + \epsilon \\
S\_0 &= 0S\_0 + 1S\_1 + \epsilon.
\end{align\*}
$$

Solving it and simplifying leads to

$$
( 0 + 11^\* 0 (10 + 111 1^\* 0)^\* (0 + 1100) )^\* ( 11^\* 0 (10 + 111 1^\* 0)^\* (1^\* + 110) + 1^\* )
$$

Here is the translation of this RE into a form recognized by tools. The `+` operator is a syntactic sugar for *one or many*, while the original `+` is replaced with the `|` symbol. Notice that the expression is anchored to capture the entire content.

{% code expandable="true" %}

```
^(0|1+0(10|111+0)*(0|1100))*(1+0(10|111+0)*(1*|110)|1*)$
```

{% endcode %}

{% hint style="info" %}
One handy on-line tool for working with REs is [Regular Expressions 101](https://regex101.com/).
{% endhint %}

## Exercise 8.18

The word *disjoint* means that the second search starts after the first match is "consumed." Since the bitstring is generated randomly, the sequence of bits generated after the first match is statistically identical to a brand new, entirely independent random bitstring. Therefore, due to this independence (memoryless property), the average position of the end of the second disjoint string of $$P$$ 0s is exactly double the expected position of the first occurrence

$$
2 \times (2^{P+1} - 2) = 2^{P+2} - 4.
$$

## Exercise 8.19

The RE `0*00` is unambiguous, so there is only one way to derive a string of $$N$$ 0s.&#x20;

The RE `0*00*` allows $$N$$ possibilities. Observe that the number of 0s taken on the first position completely determines the number of 0s on the last one, and vice versa. The zero in the middle is forced, so we can choose between 0 and $$N-1$$ for the first occurrence of `0*`.

## Exercise 8.20

{% hint style="danger" %}
The last plot and sequence is wrong. By taking a close inspection of the peak, we can see that it’s height is 5 instead of 4. This becomes apparent by looking at the generalized RE below.
{% endhint %}

1. $$(10)^3\space(1(10)1(10)^20(10)0)\space(10)^3\space(1(10)1(10)^30(10)^20)\space(10)^2$$
2. $$(10)^3\space(1(10)1(10)^20(10)0)\space(10)^3\space(1(10)1^40^4(10)^20)\space(10)^2$$

## Exercise 8.21

Table 8.5 lists all 4-bit patterns together with their waiting times. The expected number of 0s appearing before the first occurrence of each of the bit patterns of length 4 in a random bitstring is exactly half of their waiting times, minus the number of 0s in the patterns themselves. This gives

#### Group 1: Wait Time = 30

* `0000`: $$15 - 4 =11$$
* `1111`: $$15 - 0 =15$$

#### Group 2: Wait Time = 16

* `0001`: $$8 - 3 =5$$
* `1000`: $$8 - 3 =5$$
* `0011`: $$8 - 2 =6$$
* `1100`: $$8 - 2 =6$$
* `0111`: $$8 - 1 =7$$
* `1110`: $$8 - 1 =7$$

#### Group 3: Wait Time = 18

* `0010`: $$9 - 3 =6$$
* `0100`: $$9 - 3 =6$$
* `0110`: $$9 - 2 =7$$
* `1001`: $$9 - 2 =7$$
* `1011`: $$9 - 1 =8$$
* `1101`: $$9 - 1 =8$$

#### Group 4: Wait Time = 20

* `0101`: $$10 - 2 =8$$
* `1010`: $$10 - 2 =8$$

## Exercise 8.22

Seemingly the answer follows directly from the second corollary of Theorem 8.2. But the situation is a bit more complicated, since we must allow any sort of $$2k$$ alternating 0s and 1s to appear. For example, a monkey may start with one type (010101...) and switch to another (101010...) in the middle. These events could happen in any order. Because these two strings share so many prefixes, waiting for "whichever comes first" is significantly faster.

We can solve this by translating the bits into a sequence of transitions. Let’s create a new sequence where 1 means a different key was typed than before and zero if the same key was hit. Therefore, waiting for any alternating string of $$2k$$ bits is mathematically identical to waiting for a solid block of $$2k-1$$ 1s in a random bitstring. Now, we can apply the second corollary of Theorem 8.2. Since a sequence of $$N$$ transitions requires $$N+1$$ actual keystrokes (a starting character must be there before we can transition away from it), we need to add 1 to the total. The answer is $$4^k - 1$$.

## Exercise 8.23

The book explicitly lists state 0 at front as a starting point. The state transitions are (including that extra 0 at the beginning)

{% code expandable="true" %}

```
0 0 1 2 3 4 3 4 3 4 3 4 3 4 3 4 3 4 3 4 3 4
```

{% endcode %}

The following Python script prints out these transitions for the given pattern in the book.

{% code expandable="true" %}

```python
# FSA state transition table for the pattern from the book.
# The first dimension is the 0 transition and the second one is the 1 transition.
# The index of the ordered pair is the current state.
dfa = ((0, 1), (2, 1), (0, 3), (4, 1), (5, 3), (0, 6), (2, 7), (8, 1))

def search(p, text):
    print(0, end=" ")
    n = len(text)
    j = 0
    for i in range(n):
        j = dfa[j][ord(text[i]) - ord('0')]
        print(j, end=" ")
        if j == p:
            return i - p
    return n

# Change this line to search in a different text and/or alter the pattern length.
search(8, "010101010101010101010")
```

{% endcode %}

## Exercise 8.24

This is a variation of the previous exercise. The state transitions are

{% code expandable="true" %}

```
0 1 1 1 2 0 1 2 3 1 1 2 3 4 3 1 2 3 4 5 0 1 2 3 4 5 6 2 3 4 3 4 5 6 7 1 1 2 3 4 5 6 7 8
```

{% endcode %}

## Exercise 8.25

Once we reach state 2 in 2 steps, we need additional 3 steps to reach it again. Therefore, the text string of length 25 that maximizes (among all strings of length 25) the number of times the KMP automaton from Figure 8.3 reaches step 2 is

{% code expandable="true" %}

```
1001001001001001001001001
```

{% endcode %}

Notice that the solution isn’t unique.

## Exercise 8.26

<figure><img src="/files/M6hD5Y7SOtyTloAxZbIX" alt=""><figcaption></figcaption></figure>

## Exercise 8.27

The program from [Exercise 8.23](#exercise-8.23) (the DFA table and the function call arguments should be changed) generates the following output:

{% code expandable="true" %}

```
0 0 1 2 3 4 5 6 7 0 0 1 2 2 3 4 5 3 4 5 6 2 3 4 5 3 0 1 2 3 4 5 6 7 8 9 10 11 12
```

{% endcode %}

## Exercise 8.28

For the pattern `101010...` the state transition table is (listed horizontally as ordered pairs): `(0, 1), (2, 1), (0, 3), (4, 1), ... , (0, 2k-1), (2k, 1)`.

For the pattern `010101...` the state transition table is (listed horizontally as ordered pairs): `(1, 0), (1, 2), (3, 0), (1, 4), ..., (2k-1, 0), (1, 2k)`.

## Exercise 8.29

Notice that any occurrence of 0 or 2 would reset the automaton (bring it back to state 0). The pattern resembles the one from the previous exercise, so the transition table is similarly constructed (drawn vertically):

{% code expandable="true" %}

```
0 1 2 3
-------
0 0 0 1
0 2 0 1
0 0 0 3
0 4 0 1
0 0 0 5
0 6 0 1
```

{% endcode %}

The program from [Exercise 8.23](#exercise-8.23) should be altered to reflect the new DFA table and text string. It produces the following output:

{% code expandable="true" %}

```
0 0 0 1 0 0 1 0 1 2 3 0 1 0 1 2 3 4 5 6
```

{% endcode %}

## Exercise 8.30 🌟

{% hint style="success" %}
This exercise provides a direct proof that the language recognized by a deterministic FSA has an OGF that is rational.
{% endhint %}

Let's assume we’ve a DFA with $$K$$ states, numbered $$0$$ to $$K-1$$. Let state $$0$$ be the start state and $$F\_i(z)$$ be the OGF that enumerates all valid paths starting from state $$i$$ and ending in an accept state. The language recognized by the entire DFA is simply $$F\_0(z)$$.

A valid string starting from state $$i$$ is formed by taking exactly one transition (which costs one character, or $$z$$) to a neighboring state $$j$$ (could also be a self-transition), and then appending any valid string from state $$j$$. We can write this as a linear equation for every state

$$
F\_i(z) = \sum\_{j=0}^{K-1} M\_{i,j} z F\_j(z) + c\_i,
$$

where:

* $$M\_{i,j}$$ is the number of transitions from state $$i$$ to state $$j$$ (usually 0 or 1, but could be more if multiple characters lead to the same state).
* $$c\_i = 1$$ if state $$i$$ is an accept state (since the empty string $$\epsilon$$ is a valid way to stop right there), and $$c\_i = 0$$ otherwise. [Exercise 8.17](#exercise-8.17) shows how to apply this in practice.

As usual, a system of such linear equations is most easily handled in matrix form. Let $$\mathbf{F}(z)$$ be the column vector of our OGFs, $$\mathbf{c}$$ be the column vector of our accept states, and $$M$$ be the adjacency matrix of the DFA. This gives

$$
\mathbf{F}(z) = z M \mathbf{F}(z) + \mathbf{c}.
$$

The solution is

$$
\mathbf{F}(z) = (I - z M)^{-1} \mathbf{c}.
$$

[Cramer's rule](https://en.wikipedia.org/wiki/Cramer%27s_rule) for matrix inversion says that

$$
A^{-1} = \frac{\text{adj}(A)}{\det(A)} \implies \mathbf{F}(z) = \frac{\text{adj}(I - z M)}{\det(I - z M)} \mathbf{c}.
$$

The matrix $$(I - z M)$$ contains only linear polynomials. The determinant of a matrix of polynomials is calculated purely by adding and multiplying those polynomials. Therefore, the denominator is guaranteed to be a polynomial in $$z$$. Finally, the adjugate matrix is formed by calculating the determinants of sub-matrices. Thus, every entry in the numerator is also guaranteed to be a polynomial in $$z$$.

Because $$F\_0(z)$$ (the OGF of our language) is calculated strictly by dividing a polynomial numerator by a polynomial denominator, it is by definition a rational function. This concludes the proof.

## Exercise 8.31

The following Python script implements the solver for the model presented in the previous exercise. To showcase that it works properly, it’s setup to reproduce the OGF from [Exercise 8.17](#exercise-8.17).

{% code expandable="true" %}

```python
import sympy as sp

def compute_dfa_ogf(m, c):
    z = sp.symbols('z')
    mm = sp.Matrix(m)
    cm = sp.Matrix(c)
    im = sp.eye(len(c))
    am = im - z * mm
    ans = am.LUsolve(cm)
    f = sp.cancel(ans[0])
    return f

if __name__ == "__main__":
    # Set up the matrix and column vector to reflect Exercise 8.17.
    m = [
        [1, 1, 0, 0, 0, 0],
        [0, 1, 1, 0, 0, 0],
        [1, 0, 0, 1, 0, 0],
        [0, 0, 1, 0, 1, 0],
        [0, 1, 0, 0, 0, 1],
        [1, 0, 0, 0, 0, 0]
    ]
    c = [1, 1, 1, 1, 1, 1]
    f = compute_dfa_ogf(m, c)
    print(f"F(z) = ({sp.numer(f)}) / ({sp.denom(f)})")
```

{% endcode %}

It outputs exactly the same OGF as formulated in Theorem 8.3, although not in the most simplified form.

{% code expandable="true" %}

```
F(z) = (-z**5 - z**3 - 1) / (z**6 - z**5 + 2*z**4 - z**3 + 2*z - 1)
```

{% endcode %}

## Exercise 8.32

The number of bitstrings of length $$2N$$ with equal number of 0s and 1s is given in the book

$$
\[z^{2N}]S(z)=\binom{2N}{N}.
$$

We just need to count all possible even length prefixes (including an empty prefix) with equal number of 0s and 1s and divide the result by the total number of bitstrings of length $$N$$. This gives

$$
\text{Average} = \frac{1}{2^N} \sum\_{i=0}^{\lfloor N/2 \rfloor} \binom{2i}{i} 2^{N-2i}= \sum\_{i=0}^{\lfloor N/2 \rfloor} \frac{\binom{2i}{i}}{2^{2i}}.
$$

{% hint style="info" %}
[Exercise 8.34](#exercise-8.34) demonstrates another path to a solution via linearity of expectation.\
\
In the context of 1D random walks (which this problem directly maps to, where 0 is a step left and 1 is a step right), the derived formula calculates the expected number of returns to the origin!
{% endhint %}

## Exercise 8.33

This directly follows from [Exercise 6.10](https://evarga.gitbook.io/sh-aofa/algorithms-and-combinatorial-structures/pages/f93B9hYr4j4XMcDmq8le#exercise-6.10), where $$N$$ denotes the length of a random bitstring

$$
\text{Probability}=\frac{\binom{N}{\lfloor N/2 \rfloor}}{2^N}.
$$

## Exercise 8.34

The first question can be answered by employing linearity of expectation. Let $$X\_i$$ be an indicator random variable that a prefix of length $$2i+k$$ has $$k$$ more 0s than 1s. Therefore,

$$
\Pr{X\_i=1}=\frac{\binom{2i+k}{i}}{2^{2i+k}} \implies \text{Average} = \sum\_{i=0}^{\lfloor (N-k)/2 \rfloor} \frac{\binom{2i+k}{i}}{2^{2i+k}}.
$$

The answer to the second question follows from generalizing an approach from [Exercise 6.10](https://evarga.gitbook.io/sh-aofa/algorithms-and-combinatorial-structures/pages/f93B9hYr4j4XMcDmq8le#exercise-6.10), where  $$k$$ was fixed at 1. So, by renaming variables, adjusting the criteria for the lower bound, and setting the touching line to $$y=-k$$, we get

$$
\text{Probability} = \frac{1}{2^N} \sum\_{i = \lceil (N-k)/2 \rceil}^{N} \left\[ \binom{N}{i} - \binom{N}{i+k} \right]= \frac{1}{2^N}\sum\_{i = \lceil \frac{N-k}{2} \rceil}^{\lfloor \frac{N+k-1}{2} \rfloor} \binom{N}{i}.
$$

## Exercise 8.35 🌟

{% hint style="success" %}
This exercise expands on the topic about CFGs to illuminate some limitations and shows how to tackle more complicated constraints by using a deterministic FSA.
{% endhint %}

A stack with a fixed capacity of $$M$$ is mathematically identical to a deterministic FSA with exactly $$M+1$$ states. A push is a transition from state $$i$$ to state $$i+1$$. A pop is a transition from state $$i$$ to state $$i-1$$. If we’re in state 0 and read a pop, it's a trap (reject). If we’re in state $$M$$ and read a push, it's a trap (reject). Push and pop operations may be encoded as 1 and 0, or vice versa. Therefore, the problem transforms into tracking random bitstrings of length $$N$$, where the number of 0s and 1s remain bounded.

We cannot use a ballot like CFG, since it’s impossible to specify an upper limit. It could only model a stack of infinite capacity. Let’s see what do we get if we plug an adjacency matrix for $$M=3$$ into the program from [Exercise 8.31](#exercise-8.31). The new input is

{% code expandable="true" %}

```
# Set up the matrix and column vector to reflect a stack with M=3.
m = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0]
]
c = [1, 1, 1, 1]
```

{% endcode %}

It outputs the OGF for Fibonacci numbers

$$
F(z)=\frac{1}{1-z-z^2} \implies \[z^N]F(z)=F\_{N+1}=O(\phi^N).
$$

Let’s try $$M=4$$ with the following setup

{% code expandable="true" %}

```
# Set up the matrix and column vector to reflect a stack with M=4.
m = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0]
]
c = [1, 1, 1, 1, 1]
```

{% endcode %}

The program gives

$$
F(z) = \frac{1+z-z^2}{1-3z^2} \implies \[z^N]F(z)=O((\sqrt 3)^N).
$$

It turns out that the growth rate for the total number of legal sequences is dictated by the maximum eigenvalue of the adjacency matrix. At any rate, to get the desired probability we must divide this amount by $$2^N$$.

## Exercise 8.36 🌟

{% hint style="success" %}
This exercise presents a case study in using CFGs for enumeration problems.
{% endhint %}

Let our operations be:

* $$I$$: Insert (1 choice)
* $$R$$: Remove (2 choices: $$R\_1$$ or $$R\_2$$)

We can reuse the idea from Figure 8.2 in the book by placing any valid sequence into an envelope of one insert and remove operations. This gives our CFG

$$
\text{<}S \text{>}:= \epsilon \mid I , \text{<}S\text{>} , R , \text{<}S\text{>}\implies S(z) = 1 + 2z^2 S(z)^2.
$$

The initial condition is $$S(0)=1$$, so

$$
S(z) = \frac{1 - \sqrt{1 - 8z^2}}{4z^2}.
$$

If we use a substitution $$u = 2z^2$$, then it immediately transforms into the familiar OGF for Catalan numbers. Thus,

$$
S(z) = T(2z^2) = \sum\_{n\ge0} T\_n (2z^2)^n = \sum\_{n\ge0} T\_n 2^n z^{2n}.
$$

The number of legal sequences of length $$N = 2n$$ is exactly $$2^n T\_n$$. The number of all possible sequences is $$3^N=9^n$$ (recall that we have 3 operations). Therefore,

$$
\text{Probability} = \frac{2^n T\_n}{9^n} = \frac{1}{n+1} \binom{2n}{n} \left(\frac{2}{9}\right)^n.
$$

## Exercise 8.37

We build upon the previous exercise. A new nonterminal is `<T>` which designates a potentially traced (inspected) sequence. The Inspect (Watch) operation is denoted by `W`. This gives

$$
\text{<}S \text{>}:= \epsilon \mid I , \text{<}T\text{>} , R , \text{<}S\text{>} \\
\text{<}T\text{>}:= \epsilon \mid W , \text{<}T\text{>} \mid I , \text{<}T\text{>} , R , \text{<}T\text{>}.
$$

The grammar results in the following system of equations:

1. $$S(z) = 1 + z^2 T(z) S(z)$$
2. $$T(z) = 1 + z T(z) + z^2 T(z)^2$$

We can immediately solve the second equation

$$
T(z) =  \frac{1 - z - \sqrt{1 - 2z - 3z^2}}{2z^2}.
$$

It’s a left shifted OGF for Motzkin numbers (see Chapter 6 of the book about $$t$$-restricted trees). Thus,

$$
S(z) = \frac{2}{1 + z + \sqrt{1 - 2z - 3z^2}}.
$$

The probability that a random sequence of operations of length $$N$$ is legal is

$$
\text{Probability} = \frac{\[z^N]S(z)}{3^N} \sim \frac{3\sqrt{3}}{8\sqrt{\pi N^3}}.
$$

{% hint style="info" %}
The asymptotic estimation is the result of applying the universal transfer theorem of analytic combinatorics, as hinted in the book.
{% endhint %}

## Exercise 8.38

Observe that the number of legal balanced sequences matches the enumeration of general Catalan trees. This bijection follows from the parenthesis system representation of those trees. Therefore, we can find the total probability of success by evaluating the $$G(z)$$ at $$z=1/2$$. This gives

$$
G(z) = \frac{1 - \sqrt{1 - 4z^2}}{2} \implies G(1/2)=1/2.
$$

This is remarkable, that a monkey has only a 50% chance of succeeding. Nonetheless, the expected number of keystrokes is even more dizzying. We have

$$
G'(z) = \frac{d}{dz} \left( \frac{1 - \sqrt{1 - 4z^2}}{2} \right) = \frac{2z}{\sqrt{1 - 4z^2}}.
$$

The average number of characters typed before the monkey hits upon a legal balanced sequence is

$$
zG'(z) \Big|\_{z=1/2} = \frac{2(1/2)^2}{\sqrt{1 - 4(1/2)^2}} = \infty.
$$

{% hint style="info" %}
More precisely, the average length of a balanced legal sequence tends to infinity, thus the number of keystrokes tends to infinity, as well.
{% endhint %}

## Exercise 8.39 🌟

{% hint style="success" %}
This exercise leverages the concept of [martingales](https://www.jeremykun.com/2014/03/03/martingales-and-the-optional-stopping-theorem/) for computing the expected number of keystrokes.
{% endhint %}

The experiment terminates the exact moment the monkey types a palindrome of 10 or 11 characters long. We’ll spawn two gamblers at every step $$t$$:

* Gambler A brings $1 and bets that the 10 characters from $$t$$ to $$t+9$$ will form a length 10 palindrome.
* Gambler B brings $1 and bets that the 11 characters from $$t$$ to $$t+10$$ will form a length 11 palindrome.

To form a length 10 palindrome, 5 independent pairs of characters must match. Gambler A waits for the first 5 characters to be typed, and then bets their fortune on the 6th character matching the 5th, the 7th matching the 4th, etc. Every time they are right, their money multiplies by 26. Gambler B also requires exactly 5 pairs to match (the center character is free). At any rate, the maximum winning amount is $$26^5$$.

Since we "work" with 2 gamblers at a time, we have

$$
2 \mathbb{E}\[N] = \mathbb{E}\[\text{Total Payout}],
$$

where $$N$$ is a random variable denoting the stopping time. When the game stops at time $$N$$, it stops because either a length 10 or length 11 palindrome was just reached.

If it stops on a length 10 palindrome, then Gambler A who arrived 10 steps ago wins the grand prize of $$26^5$$. If it stops on a length 11 palindrome, then Gambler B who arrived 11 steps ago wins the grand prize of $$26^5$$. Gamblers who arrived very recently, and haven't finished their 10 or 11 characters yet, are "unresolved" with expected earning of $1. There are $$9+10=19$$ such gamblers from both groups. The total payout is $$$(26^5 + 19)$$.

{% hint style="info" %}
The cited post about martingales uses a concrete palindrome `ABRACADABRA`. It clearly has nontrivial autocorrelation, so a gambler who came 4 steps before the end earns $$$26^4$$. Again, in that setup gamblers must guess the exact palindrome, hence they always start with `A`. In our case, we don’t have this. Knowing that a string is a palindrome only tells us that the left side mirrors the right side. It tells nothing about whether the letters inside the left side match each other. Because the "free" letters of the palindrome are perfectly random, any overlapping gambler who gets far enough to start placing bets is effectively just betting that two randomly chosen letters will match. Since all bets are fair, their expected wealth is exactly the $$$(1/26 \times 26) + (25/26 \times 0) = $1$$ they walked in with! Of course, those who had no chance to bet will keep their money, so we also include them in the payout.
{% endhint %}

Because the total expected payout is unconditionally the same regardless of whether the monkey triggers the length 10 or length 11 condition, we can solve directly for the expected number of keystrokes

$$
2 \mathbb{E}\[N] = 26^5 + 19 \implies \boxed{\mathbb{E}\[N] = 5,940,697.5}.
$$

{% hint style="info" %}
Another approach is to use the method described in the paper [String overlaps, pattern matching, and nontransitive games](https://www.sciencedirect.com/science/article/pii/0097316581900054) and build autocorrelation polynomials for palindromes of length 10 and 11. Interestingly, the coefficients of these polynomials are exactly the payouts for gamblers from each group. The constant term is $$26^5$$, whilst all other coefficients are 1. They only differ in length. The martingales method essentially explains the rationale behind those OGFs mentioned in the book and this cited paper.
{% endhint %}

The methodology laid out above easily generalizes to any $$k \ge 10$$.

## Exercise 8.40 🌟

{% hint style="success" %}
This exercise showcases additional techniques pertaining to the application of CFGs in analysis.
{% endhint %}

Since spaces can appear anywhere in a regular expression they aren’t essential for evaluating the validity of a sequence. The monkey is typing "essential" keys with probability 31/32, and "noise" (spaces) with probability 1/32. By Wald's Lemma (see [Exercise 8.21](#exercise-8.21)), we can completely remove the space key from the board. We simply calculate everything in terms of a 31-key keyboard and then multiply the result by 32/31, the expected number of total keystrokes required to produce one essential keystroke, to account for the spaces typed in between.

Let's build the unambiguous CFG for a standard regular subexpression $$E$$ over the 30 essential keys (excluding space and period). Every binary operation brings its own parentheses to guarantee unambiguous parsing, while the unary star operator does not need them. We’ve

$$
\begin{align\*}
\text{<}E\text{>} &:= (\text{<}E\text{>}\text{<}E\text{>}) \\
\text{<}E\text{>} &:= (\text{<}E\text{>}'+'\text{<}E\text{>}) \\
\text{<}E\text{>} &= \text{<}E\text{>}'*' \\
\text{<}E\text{>} &:= \text{any of the 26 letters} \\
\end{align*}
$$

{% hint style="info" %}
It’s possible to craft a bit different grammar, but the methodology remains the same.
{% endhint %}

The final language becomes $$\text{<}L\text{>} := (\text{<}E\text{>}).$$ (notice the period at the end). Theorem 8.6 gives

$$
E(z) = z^2 E(z)^2+ z^3 E(z)^2 + z Y(z)+26z \\
L(z) = z^3E(z).
$$

Because the regular expression is enclosed in strict parentheses and ends with a unique period that never appears anywhere else, the language is prefix-free, suffix-free, and internally non-overlapping. Therefore, we can regard every keystroke as a Bernoulli trial, so the expected waiting time is simply the reciprocal of the total probability mass of $$L$$ (expectation of the geometric distribution). We’ve

$$
E(1/31) = \frac{26}{31} + \frac{1}{31^3} E(1/31)^2 + \frac{1}{31^2} E(1/31)^2 + \frac{1}{31} E(1/31) \implies E(1/31) = \frac{14415 - \sqrt{206992673}}{32} \\\[0.4cm]
L(1/31)= \left(\frac{1}{31}\right)^3 E(1/31) = \frac{E(1/31)}{29791}.
$$

This gives our waiting time (the expected number of characters typed before the monkey hits upon a legal regular expression)

$$
W = \frac{32}{31} \left( \frac{29791}{E(1/31)} \right) = \frac{16}{13} \left( 14415 + \sqrt{206992673} \right) \approx 35,448.9.
$$

## Exercise 8.41

<figure><img src="/files/FTyms1FfSCBQWGxSgUKR" alt=""><figcaption><p>The reversed bitstrings no longer maintain prefix-free properties. Prefixes that occur within longer bitstrings are depicted as red internal nodes. For instance, in the trie shown on the left, the prefix <code>000</code> is contained within the bitstring <code>00011</code>. Finally, nonvoid external nodes are shown with full bitstrings for clarity, although remaining suffixes could work, too.</p></figcaption></figure>

{% hint style="info" %}
When altering a data structure to accommodate additional cases, like handling bitstrings that aren’t anymore prefix-free, the resulting structure may conflict with the original definition. In this case, we do allow void and nonvoid external nodes to be siblings for red internal nodes.
{% endhint %}

## Exercise 8.42

<figure><img src="/files/hPYIGjQFfmtVaA98NgIS" alt=""><figcaption><p>The left trie accommodates 8 sets of bitstrings, whilst the right trie only 2 sets. Of course, there are may symmetrical cases, these aren’t depicted separately. At any rate, all of them share the same capacity with these basic configurations.</p></figcaption></figure>

## Exercise 8.43

Assume that the same structure means the same arrangement of internal nodes. This entails that leaves must be associated with 2 nonvoid external nodes. We can vary how the remaining void external nodes may turn into nonvoid ones.

The leftmost trie has $$\sum\_{k=0}^4 \binom{4}{k}=2^4=16$$ configurations. The middle has $$2^8=256$$ configurations. The rightmost trie is full, so the given configuration is unique.

## Exercise 8.44 🌟

{% hint style="success" %}
This exercise expands the book by providing a full derivation of the OGF that enumerates tries with $$N$$ external nodes.
{% endhint %}

For any given trie $$t$$ with $$N$$ external nodes, we can have $$2^{N-2\text{leaves}(t)}$$ configurations, where $$\text{leaves}(t)$$ counts the number of leaves in $$t$$ (see also the previous exercise). To generalize the enumeration for any $$N$$ (in the spirit of Figure 8.6 in the book) we should derive the corresponding BGF.

The book already defines $$T(z, u)$$ (see Chapter 5 or the book’s [website](https://aofa.cs.princeton.edu/50ac/)) such that $$z$$ tracks internal nodes and $$u$$ tracks leaves. For a tree with $$N$$ external nodes, there are exactly $$N-1$$ internal nodes. Our combinatorial rule states that for a given tree with $$N$$ external nodes and $$k$$ leaves, the number of configurations is $$2^{N - 2k} = 2^N (1/4)^k$$. Thus, we can find the total number of trie configurations by evaluating $$T(z, u)$$ at $$u = 1/4$$ and multiplying the $$z^{N-1}$$ coefficient by $$2^N$$. Let's define a raw generating function $$Y(z)$$ that applies this transformation

$$
Y(z) = 2z \cdot T(2z, 1/4).
$$

We need to check the boundary condition for $$N=1$$. The empty tree in $$T(z,u)$$ is represented by 1. Thus,

$$
Y(z) = 2z(1 + \dots) = 2z + z^2 + 4z^3 + 17z^4 \dots \implies \[z^1]Y(z)=2.
$$

It turns out, that it reports 2 tries, whereas from Figure 8.6 we have $$X\_1=1$$. The discrepancy emanates from $$Y(z)$$ being discriminatory regarding void and nonvoid external nodes. To perfectly match $$X(z)$$ to Figure 8.6, we simply subtract that extra empty-set configuration

$$
X(z) = Y(z) - z = 2z \cdot T(2z, 1/4) - z.
$$

Now, we’re left with truly mechanical work to find the closed-form equation. We can execute the algebra starting directly from the book's equation

$$
T(z, u) + z = 1 + zu + zT(z, u)^2.
$$

After applying substitutions, multiplying by the previously mentioned factor, solving the quadratics for $$Y(z)$$, and finally subtracting $$z$$, we get (the algebraic steps are omitted to keep the exposition tidy)

$$
\boxed{X(z) = \frac{1 - 2z - \sqrt{1 - 8z + 12z^2}}{2}}.
$$

Using WolframAlpha we can expand the OGF to see the first few terms

$$
X(z)=z + z^2 + 4 z^3 + 17 z^4 + 76 z^5 + 354 z^6 + 1704 z^7 + O(z^8).
$$

It perfectly matches Figure 8.6 and, in principle, we can find any coefficient. We can also find the growth rate of coefficients by transforming $$X(z)$$

$$
\begin{align\*}
X(z) &= \frac{1}{2} - z - \frac{1}{2}\sqrt{1 - 8z + 12z^2} \\
&\sim -\frac{1}{2}\sqrt{1 - 8z + 12z^2} \\
&= -\frac{1}{2} \sqrt{(1 - 2z)(1 - 6z)} \\
&= \frac{-\frac{1}{2} \sqrt{1 - 2z}}{(1 - z/(1/6))^{-1/2}}.
\end{align\*}
$$

Now, we can directly apply the corollary of Theorem 5.5 (see the book's [website](https://aofa.cs.princeton.edu/50ac/)) with $$\rho=1/6$$ and $$\alpha=-1/2$$ to get

$$
\[z^N]X(z) \sim  \frac{1}{4} \sqrt{\frac{2}{3\pi}} \frac{6^N}{N^{3/2}}.
$$

## Exercise 8.45 🌟

{% hint style="success" %}
This exercise continues expanding the book's content to find the proportion of the external nodes that are void in a “random” trie (assuming each different trie structure to be equally likely to occur).
{% endhint %}

Essentially, this is a variation of the previous exercise. To track the void nodes, we want a BGF $$X(z, v)$$ where $$z$$ marks all external nodes and $$v$$ marks only the void ones. In a trie with $$N$$ external nodes and $$k$$ leaves, $$N-2k$$ external nodes are toggleable. Each can be nonvoid (weight 1) or void (weight $$v$$), contributing a factor of $$(1+v)$$. In the previous exercise this was 2, since we didn’t care to further classify them. The polynomial tracking the configurations of this specific shape is

$$
z^N(1+v)^{N-2k}=z(1+v) \cdot \[z(1+v)]^{N-1} \cdot \left\[ \frac{1}{(1+v)^2} \right]^k.
$$

{% hint style="info" %}
Factoring out the term raised to $$N-1$$ power is necessary to reuse $$T(z,u)$$ from the book, which tracks internal nodes under $$z$$.
{% endhint %}

Now, the derivation virtually follows the steps from the previous exercise. We have

$$
Y(z,v) = z(1+v) \cdot T\left(z(1+v), \frac{1}{(1+v)^2}\right) \\\[0.3cm]
X(z, v)=Y(z,v)-zv.
$$

The boundary condition here requires us to subtract $$zv$$, since an empty trie is a single nonvoid external node. Solving the equation gives

$$
X(z, v) = \frac{1 - \sqrt{1 - 4z(1+v) - 4z^2 + 4z^2(1+v)^2}}{2} - zv.
$$

Let $$C(z)$$ be the CGF for the total number of void nodes across all tries of size $$N$$ (see Table 3.5 in the book)

$$
C(z) = \left. \frac{\partial X(z, v)}{\partial v} \right|\_{v=1}= \frac{z - 4z^2}{\sqrt{1 - 8z + 12z^2}} - z.
$$

The expected proportion of void nodes in a random trie of size $$N$$ is

$$
\boxed{P\_N = \frac{\[z^N] C(z)}{N \cdot \[z^N]X(z, 1)} \sim \frac{1}{3}}.
$$

The above estimate follows by employing the corollary of Theorem 5.5 on $$C(z)$$

$$
\[z^N]C(z) \sim \frac{\sqrt{3}}{18\sqrt{2\pi}} \frac{6^N}{N^{1/2}}.
$$

We already know from before the estimate of $$X(z,1)$$.

## Exercise 8.46

Let $$N$$ be the number of strings in the set (which equals the number of nonvoid external nodes).\
Let $$I$$ be the number of distinct prefixes (including the empty prefix $$\epsilon$$) shared by at least two strings in the set (which equals the total number of internal nodes, because every shared prefix forces a split). We also know that in any binary tree (including tries) the number of external nodes is $$I+1$$. Hence, the number of void external nodes is simply $$I+1-N$$. Consequently, we have a void external node if this amount is positive.

{% hint style="info" %}
Testing this formula on the set of strings associated with the tries from Figure 8.5 in the book perfectly gives back the number of external void nodes. For computing the formula without building a trie, we can leverage the [longest common prefix](https://leetcode.com/problems/longest-common-prefix/) for strings. Namely, we would register all LCPs between adjacent pair of strings in a previously sorted list. The cardinality of the set of these LCPs gives back the value for $$I$$.
{% endhint %}

## Exercise 8.47

If we use 1-based indexing then the last position to be examined is 29, since both patterns are 10 bits long. To compute the total number of bits examined, it’s best to group distinct prefixes that occur at all possible positions and handle them separately. For example, a single 0 entails 1 bit comparison to hit a void external node. 11 needs 2 bit comparisons, since both patterns start with 10 (the mismatch happens at the second bit). The total sum is 73 bit comparisons.

## Exercise 8.48

The basic variant revolves around searching for multiple patterns in a text using a trie. We first build a trie from the set of pattern strings. Set the counter to zero. For every valid position in the text perform a search and increase the counter whenever a nonvoid external node is encountered.

{% hint style="info" %}
The previous algorithm assumes a set of prefix-free patterns. Otherwise, the algorithm just needs a slight tweak to increment the counter if the search passes through a valid prefix marker (see  [Exercise 8.41](#exercise-8.41)) or terminates at a nonvoid external node.
{% endhint %}

The improved version would use an altered Aho-Corasick FSA, where we simply need to treat the accepting states exactly like the internal states. Every visit to the corresponding accept state would also increment the counter. Figure 8.8 in the book shows a halting machine.&#x20;

## Exercise 8.49

<figure><img src="/files/WQOnpM7oQySsdOpTIx6p" alt=""><figcaption><p>A nonvoid external node contains the position of the corresponding suffix in the text using 0-based indexing. Due to search patterns of at least 8-bits long, all suffixes turn out to be prefix-free in the given text. Furthermore, the last valid position is 30. The resulting trie clearly highlights the problems with the basic structure and tells why Patricia tries are important in practice.</p></figcaption></figure>

## Exercise 8.50

<figure><img src="/files/Voln1wFOqpfPdXvE8vMj" alt=""><figcaption><p>Each trie is labeled with the four-bitstring whose suffixes it stores. Here, we have overlapping prefixes, so we use the technique from <a href="#exercise-8.41">Exercise 8.41</a>.</p></figcaption></figure>

## Exercise 8.51 🌟

{% hint style="success" %}
This exercise introduces the concept of an output link in the Aho-Corasick FSA for handling overlapping prefixes.
{% endhint %}

<figure><img src="/files/ip8iIEfincZnsYZHmIOM" alt=""><figcaption><p>The pattern 01 appears as a prefix of 010. To properly model this using the Aho-Corasick FSA we need sort of an output link, which in this case is modeled with an <span class="math">\epsilon</span> transition of an NFA.</p></figcaption></figure>

## Exercise 8.52 🌟

{% hint style="success" %}
This exercise provides a detailed derivation of the mean number of internal nodes in a trie corresponding to $$N$$ random bitstrings.
{% endhint %}

We proceed as in the proof of the corollary of Theorem 4.10 in the book. The recurrence can be simplified to

$$
A\_N=1+\frac{2}{2^N}\sum\_k \binom{N}{k}A\_k \qquad \text{for $N>1$ with $A\_0=A\_1=0$}.
$$

Multiplying by $$z^N/N!$$ and summing on $$N$$ leaves a straightforward convolution where the EGF $$\sum\_{N≥0}A\_Nz^N/N!$$ must satisfy the functional equation

$$
A(z)=e^z-1-z+2e^{z/2}A(z/2).
$$

Iterating the equation, we find that

$$
\begin{align\*}
A(z) &= e^z-1-z+2e^{z/2}A(z/2) \\
&= e^z-1-z + 2e^{z/2}(e^{z/2}-1-z/2 + 2e^{z/4}A(z/4)) \\
&= e^z -1-z+ 2e^z -2e^{z/2}-ze^{z/2}+ 4e^{3z/4}A(z/4) \\
&= e^z-1-z+2e^z -2e^{z/2}-ze^{z/2}+4e^z-4e^{3z/4}-ze^{3z/4}+8e^{7z/8}A(z/8) \\
&= \dots \\
&=\sum\_{j \ge 0} \left( 2^j e^z - 2^j e^{z(1 - 2^{-j})} - z e^{z(1 - 2^{-j})} \right).
\end{align\*}
$$

Using Table 3.4 from the book, we get

$$
A\_N = \sum\_{j \ge 0} \left( 2^j - 2^j(1 - 2^{-j})^N - N(1 - 2^{-j})^{N-1} \right).
$$

Now, we proceed exactly as in the proof of Theorem 8.8 by first approximating the sum

$$
\frac{A\_N}{N} \sim \sum\_{j \ge 0} \left( \frac{2^j}{N}(1 - e^{-N/2^j}) - e^{-N/2^j} \right).
$$

We split this sum at $$j = \lfloor \lg N \rfloor$$. Let's call this boundary $$L = \lfloor \lg N \rfloor$$. By shifting the index of summation below $$j \to j + L$$, the bounds of our two sums become $$-L \le j < 0$$ and $$j \ge 0$$.

$$
\begin{align\*}
\frac{A\_N}{N} &\sim \sum\_{0 \le j < L} \left( \frac{2^j}{N}(1 - e^{-N/2^j}) - e^{-N/2^j} \right) + \sum\_{j \ge L} \left( \frac{2^j}{N}(1 - e^{-N/2^j}) - e^{-N/2^j} \right) \\\[0.5cm]
&=  \sum\_{-L \le j < 0} \left( 2^{j-{\lg N}} (1 - e^{-2^{{\lg N}-j}}) - e^{-2^{{\lg N}-j}} \right) + \sum\_{j \ge 0} \left( 2^{j-{\lg N}} (1 - e^{-2^{{\lg N}-j}}) - e^{-2^{{\lg N}-j}} \right)  \\\[0.5cm]
&=  \sum\_{j = -\infty}^\infty \left( 2^{j-{\lg N}} (1 - e^{-2^{{\lg N}-j}}) - e^{-2^{{\lg N}-j}} \right).
\end{align\*}
$$

{% hint style="info" %}
Because $$L = \lfloor \lg N \rfloor$$ as $$N \to \infty$$, the boundary $$L$$ grows infinitely large. The terms added by extending the lower bound from $$-L$$ to $$-\infty$$ are exponentially small and decay so rapidly that their sum introduces a negligible error of $$o(1)$$. Therefore, we can safely complete the tail to form a single, doubly-infinite sum.
{% endhint %}

In the derivation, we’ve used the following equations:

$$
L = \lg N - {\lg N} \\\[0.3cm]
\frac{N}{2^j} = \frac{N}{2^{\lfloor \lg N \rfloor + k}} = 2^{\lg N - \lfloor \lg N \rfloor - k} = 2^{{\lg N} - k} \\\[0.3cm]
\frac{2^j}{N} = 2^{k - {\lg N}}.
$$

Our sum is essentially of the form $$P(N)=\sum\_{j=-\infty}^{\infty} Q({\lg N}-j)$$, where

$$
Q(x) = 2^{-x}(1 - e^{-2^x}) - e^{-2^x}.
$$

Observe that $$P(N)$$ is a periodic function, since $$P(2N)=P(N)$$. By the laws of Fourier analysis, any periodic function can be represented as its mean value plus a fluctuating wave. This gives

$$
\begin{align\*}
\mu &= \int\_0^1 \sum\_{j=-\infty}^{\infty} Q(y-j) dy \\
&= \sum\_{j=-\infty}^{\infty} \int\_{-j}^{1-j} Q(x) dx && \text{(the sum converges absolutely)} \\
&= \int\_{-\infty}^{\infty} Q(x) dx \\
&= \frac{1}{\ln 2} \int\_0^\infty \left( \frac{1 - e^{-u}}{u^2} - \frac{e^{-u}}{u} \right) du \\
&= \frac{1}{\ln 2}.
\end{align\*}
$$

We’ve used the substitution $$u = 2^x$$, so $$du = 2^x \ln 2 \cdot dx$$. The integral can be evaluated in WolframAlpha resulting in 1. Therefore,

$$
\frac{A\_N}{N} \sim \frac{1}{\ln 2} + \hat{\epsilon}(N).
$$

## Exercise 8.53

The next Python script computes the values of $$Q(x)$$ using Numpy and produces the plot shown below.

{% code expandable="true" %}

```python
import numpy as np
import matplotlib.pyplot as plt

def q(x):
    total = 0.0
    # Summing j from -50 to 50 is mathematically sufficient to reach
    # the 64-bit floating point precision limits for this converging series.
    for j in range(-50, 50):
        u = x / (2.0 ** j)
        # Handle asymptotic bounds to prevent overflow/underflow errors
        if u > 500:
            term = 1.0 / u  # e^(-u) becomes essentially 0
        elif u < 1e-9:
            term = u / 2.0  # Taylor expansion for u -> 0
        else:
            term = (1.0 - np.exp(-u)) / u - np.exp(-u)
        total += term
    return total - (1.0 / np.log(2.0))

# Explore fluctuations by producing a graph.
x_vals = np.linspace(0.5, 280, 20000)
y_vals = [q(x) for x in x_vals]

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(x_vals, y_vals, color='#444444', linewidth=1.2)

# Set appearance as in Figure 8.10.
ax.set_xlim(0, 280)
ax.set_ylim(-1.6e-6, 1.6e-6)
ax.set_xticks([0, 32, 64, 128, 256])
ax.set_yticks([-1.56e-6, 0, 1.56e-6])
ax.set_yticklabels(['$-0.00000156$', '0', '$0.00000156$'])
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.title("Periodic Fluctuation in Trie Internal Nodes", pad=20)
plt.tight_layout()
plt.show()
```

{% endcode %}

<figure><img src="/files/sPixGgiQqGgSp5SjF363" alt=""><figcaption></figcaption></figure>

## Exercise 8.54 🌟

{% hint style="success" %}
This exercise introduces the Poisson transform to simplify analytic derivations. See also [Exercise 8.57](#exercise-8.57) for a detailed overview.
{% endhint %}

We start by establishing a new functional equation

$$
\begin{align\*}
e^{-z}C(z) &= z e^{-z} e^z - z e^{-z} + 2e^{-z}e^{z/2}C(z/2) \\
&= z - z e^{-z} + 2e^{-z/2}C(z/2) \\
&\therefore \hat{C}(z) = z - z e^{-z} + 2\hat{C}(z/2).
\end{align\*}
$$

Using Table 3.4 from the book, we get

$$
\begin{align\*}
z - z e^{-z} &= z - \sum\_{N \ge 1} N (-1)^{N-1} \frac{z^N}{N!} \\
&= - \sum\_{N \ge 2} N (-1)^{N-1} \frac{z^N}{N!} \\
&= \sum\_{N \ge 2} N (-1)^N \frac{z^N}{N!}.
\end{align\*}
$$

{% hint style="warning" %}
There is a typo in the book, we need to find $$\hat{C}\_N = N!\[z^N]\hat{C}(z)$$.
{% endhint %}

This gives

$$
\hat{C}\_N = N(-1)^N + 2 \left( \frac{1}{2^N} \right) \hat{C}\_N \implies \hat{C}\_N=  \frac{N(-1)^N}{1 - 1/2^{N-1}} \quad \text{ for $N \ge 2$ with $\hat C\_0=\hat C\_1=0$}.
$$

Apply the binomial sum operation to get

$$
C\_N = \sum\_{k=0}^N \binom{N}{k} \hat{C}*k= \sum*{2 \le k \le N} \binom{N}{k} \frac{k(-1)^k}{1 - 1/2^{k-1}}.
$$

## Exercise 8.55

We start with the expression from Theorem 8.8

$$
C\_N = N \sum\_{j \ge 0} \left( 1 - \left( 1 - \frac{1}{2^j} \right)^{N-1} \right).
$$

Using the standard binomial expansion, we have

$$
\left( 1 - \frac{1}{2^j} \right)^{N-1} = \sum\_{m=0}^{N-1} \binom{N-1}{m} \left( -\frac{1}{2^j} \right)^m= 1 + \sum\_{m=1}^{N-1} \binom{N-1}{m} (-1)^m \left( \frac{1}{2^j} \right)^m.
$$

Substitute this expansion back into the $$1 - (\dots)$$ part of our original equation. The leading 1s cancel out, so

$$
\begin{align\*}
1 - \left\[ 1 + \sum\_{m=1}^{N-1} \binom{N-1}{m} (-1)^m \left( \frac{1}{2^j} \right)^m \right] &= -\sum\_{m=1}^{N-1} \binom{N-1}{m} (-1)^m \left( \frac{1}{2^j} \right)^m \\
&= \sum\_{m=1}^{N-1} \binom{N-1}{m} (-1)^{m-1} \left( \frac{1}{2^j} \right)^m.
\end{align\*}
$$

We want our sum to look like the one given in the book, which uses an index $$k$$ running from 2 to $$N$$. Let $$k = m + 1$$ (which means $$m = k - 1$$). If $$m$$ goes from 1 to $$N-1$$, then $$k$$ goes from 2 to $$N$$. Substituting all of this back, and applying the "absorption" identify $$N \binom{N-1}{k-1} = k \binom{N}{k}$$, gives us

$$
C\_N = \sum\_{j \ge 0} \left\[ \sum\_{k=2}^N k \binom{N}{k} (-1)^k \left( \frac{1}{2^{k-1}} \right)^j \right].
$$

Because these are finite sums for any given $$N$$ (and the infinite tail converges), we can safely swap the order of summation. This gives

$$
C\_N = \sum\_{k=2}^N k \binom{N}{k} (-1)^k \left\[ \sum\_{j \ge 0} \left( \frac{1}{2^{k-1}} \right)^j \right].
$$

The inner sum is a standard, converging infinite geometric series (recall that $$k \ge 2$$)

$$
\sum\_{j \ge 0} \left( \frac{1}{2^{k-1}} \right)^j = \frac{1}{1 - 1/2^{k-1}}.
$$

Substitute this collapsed fraction back into the outer sum

$$
C\_N = \sum\_{2 \le k \le N} \binom{N}{k} \frac{k(-1)^k}{1 - 1/2^{k-1}}.
$$

## Exercise 8.56

We start with the recurrence for $$N \ge 2$$

$$
R\_N = 1 + \frac{1}{2^N} \sum\_{k=0}^N \binom{N}{k} R\_k.
$$

Multiply by $$2^N$$ to avoid divisions

$$
2^N R\_N = 2^N + \sum\_{k=0}^N \binom{N}{k} R\_k.
$$

Let $$R(z) = \sum\_{N \ge 0} R\_N \frac{z^N}{N!}$$ be the target EGF. It satifies the functional equation

$$
R(2z) =  e^{2z} - 1 - 2z+R(z)e^z.
$$

Just like [Exercise 8.54](#exercise-8.54), we apply the Poisson transform by substituting $$\hat{R}(z) \equiv e^{-z} R(z)$$. This gives

$$
\begin{align\*}
e^{-2z}R(2z) &= 1 - e^{-2z} - 2ze^{-2z}+e^{-z}R(z)  \\
&\therefore \hat{R}(2z) = 1 - (1 + 2z)e^{-2z}+ \hat{R}(z) .
\end{align\*}
$$

Substituting $$z \to z/2$$ and iterating the recurrence down to zero gives the explicit infinite sum

$$
\hat{R}(z) = \sum\_{j \ge 0} \left\[ 1 - \left( 1 + \frac{z}{2^j} \right) e^{-z/2^j} \right].
$$

We need to find $$\hat{R}\_k = k! \[z^k] \hat{R}(z)$$. Letting $$c = 1/2^j$$, we extract the $$k$$th coefficient for $$k \ge 2$$ (the constant 1 drops out)

$$
\begin{align\*}
\hat{R}*k &= \sum*{j \ge 0} \left\[ -(-c)^k - c \cdot k(-c)^{k-1} \right] \\
&= \sum\_{j \ge 0} \left\[ (-1)^{k+1} c^k + k(-1)^k c^k \right] \\
&= (-1)^k (k - 1) \sum\_{j \ge 0} \left( \frac{1}{2^k} \right)^j \\
&= \frac{(-1)^k (k - 1)}{1 - 2^{-k}}.
\end{align\*}
$$

Now, apply the EGF binomial sum operation to get

$$
R\_N = \sum\_{k=2}^N \binom{N}{k} \frac{(-1)^k (k - 1)}{1 - 2^{-k}}.
$$

To get this into a form we can approximate, we expand the denominator back into an infinite geometric series and swap the order of summation

$$
R\_N = \sum\_{j \ge 0} \left\[ \sum\_{k=2}^N \binom{N}{k} (-1)^k (k - 1) (2^{-j})^k \right].
$$

Let's evaluate that inner sum. We start with the binomial expansion

$$
g(x) = \sum\_{k=2}^N \binom{N}{k} (-1)^k x^k = (1 - x)^N - 1 + Nx.
$$

Notice that the sum we want is exactly $$x \cdot g'(x) - g(x)$$:

* $$x \cdot g'(x) = x \left\[ -N(1-x)^{N-1} + N \right] = Nx - Nx(1-x)^{N-1}$$
* Subtract $$g(x)\text{: } Nx - Nx(1-x)^{N-1} - \left\[ (1-x)^N - 1 + Nx \right]$$

This gives

$$
\sum\_{k=2}^N \binom{N}{k} (-1)^k (k - 1) x^k = 1 - (1 - x)^N - Nx(1 - x)^{N-1}.
$$

Substitute $$x = 2^{-j}$$ back into our outer sum to get the exact formula for $$R\_N$$

$$
R\_N = \sum\_{j \ge 0} \left\[ 1 - \left( 1 - \frac{1}{2^j} \right)^N - \frac{N}{2^j}\left( 1 - \frac{1}{2^j} \right)^{N-1} \right].
$$

Apply the approximation from the book

$$
R\_N \sim \sum\_{j \ge 0} \left( 1 - e^{-N/2^j} - \frac{N}{2^j} e^{-N/2^j} \right).
$$

if we split the sum at $$j = \lfloor \lg N \rfloor$$, the summation over the 1s yields exactly $$\lfloor \lg N \rfloor$$. The remaining decaying fractional tails evaluate to a fixed constant, plus the microscopic oscillating wave. Therefore,

$$
R\_N = \lg N + O(1).
$$

## Exercise 8.57

We are given the recurrence for $$N > 1$$

$$
2^N p\_N = \sum\_{k=0}^N \binom{N}{k} p\_k,
$$

with initial conditions $$p\_0 = 0$$ and $$p\_1 = 1$$.

Let $$P(z) = \sum\_{N \ge 0} p\_N \frac{z^N}{N!}$$. The RHS is the standard binomial sum $$P(z)e^z$$, whilst the LHS generates $$P(2z)$$. However, we must correct for the $$N=1$$ boundary condition. Looking at the coefficients of $$z^1$$:

* LHS: $$2^1 p\_1 = 2$$
* RHS: $$1![z^1](P\(z\)e^z) = p\_0 + p\_1 = 1$$

To balance the equation, we must add $$z$$ to the right side. Our exact functional equation is

$$
P(2z) = P(z)e^z + z.
$$

Now, apply the Poisson transform $$\hat{P}(z) \equiv e^{-z}P(z)$$ by multiplying the whole equation by $$e^{-2z}$$

$$
\begin{align\*}
e^{-2z} P(2z) &= e^{-2z} P(z)e^z + z e^{-2z} \\
\hat{P}(2z)  &= \hat{P}(z) + z e^{-2z}.
\end{align\*}
$$

Substitute $$z \to z/2$$ and iterate the equation down to 0

$$
\hat{P}(z) = \sum\_{j \ge 0} \frac{z}{2^{j+1}} e^{-z/2^j}.
$$

This gives

$$
\begin{align\*}
\hat{p}*k &= k! \[z^k] \hat{P}(z) \\
&= \sum*{j \ge 0} \frac{1}{2^{j+1}} k \left( -\frac{1}{2^j} \right)^{k-1} \\
&= \sum\_{j \ge 0} \frac{k(-1)^{k-1}}{2 \cdot 2^{j(k-1)} \cdot 2^j} \\
&= \frac{k(-1)^{k-1}}{2} \sum\_{j \ge 0} \left( \frac{1}{2^k} \right)^j \\
&= \frac{k(-1)^{k-1}}{2(1 - 2^{-k})}.
\end{align\*}
$$

Now, we apply the binomial sum operation to read out

$$
p\_N = \sum\_{k=1}^N \binom{N}{k} \frac{k(-1)^{k-1}}{2(1 - 2^{-k})}.
$$

To get this into a form we can approximate, we proceed exactly as in the previous exercise. Thus,

$$
p\_N = \frac{1}{2} \sum\_{j \ge 0} \left\[ \sum\_{k=1}^N k \binom{N}{k} (-1)^{k-1} (2^{-j})^k \right].
$$

The inner sum over $$k$$ is exactly $$x \cdot g'(x)$$ for the binomial expansion $$g(x) = (1-x)^N$$, evaluated at $$x = 2^{-j}$$

$$
\sum\_{k=1}^N k \binom{N}{k} (-1)^{k-1} x^k = Nx(1 - x)^{N-1}.
$$

Substitute $$x = 2^{-j}$$ back in to get the exact closed form

$$
p\_N = \sum\_{j \ge 0} \frac{N}{2^{j+1}} \left( 1 - \frac{1}{2^j} \right)^{N-1}.
$$

Apply the exponential approximation from the book. We also extend the boundaries in both directions, knowing that negative indices produce exponentially small quantities.

$$
p\_N \sim \frac{1}{2} \sum\_{j \ge 0} \frac{N}{2^j} e^{-N/2^j}= \frac{1}{2} \sum\_{j=-\infty}^{\infty} \left( \frac{N}{2^j} e^{-N/2^j} \right) + \mathcal{O}(e^{-N}).
$$

We are summing a function $$f(x) = x e^{-x}$$ evaluated at exponentially spaced points $$x\_j = N/2^j$$. Switching to a logarithmic scale $$y = \log\_2 x$$ creates equidistant points for applying the Riemann integral. In other words,

$$
\sum\_j f(x\_j) \sim \int\_{-\infty}^{\infty} f(y) dy = \frac{1}{\ln 2} \int\_0^\infty f(x) \frac{dx}{x}.
$$

Therefore,

$$
p\_N \sim \frac{1}{2 \ln 2} \int\_0^\infty e^{-x} dx= \frac{1}{2 \ln 2}.
$$

The microscopic $$\pm 10^{-5}$$ oscillating term $$\hat{\epsilon}(N)$$ is hiding in the difference between the discrete sum over the fractional part $${\lg N}$$ and this continuous integral mean, exactly as we proved in the internal nodes derivation.

## Exercise 8.58

The extended version of Prodinger’s algorithm is completely described in the paper [The Swedish Leader Election Protocol: Analysis and Variations](https://www.researchgate.net/publication/301282890_The_Swedish_Leader_Election_Protocol_Analysis_and_Variations). In the analysis, the authors use the Poisson transform technique that’s exemplified in several previous exercises. The new algorithm introduces an additional parameter $$\tau$$ that controls the allowed number of consecutive failed (null) rounds.

{% hint style="info" %}
The classical (fair-coin leader election approach) $$R\_N=\lg N + O(1)$$ bound naturally falls out of their formula as $$\tau \to \infty$$ and $$p=q=1/2$$.
{% endhint %}

## Exercise 8.59

{% hint style="info" %}
In the analysis below, we assume that the algorithm terminates once the run is found (early stoppage).
{% endhint %}

Let $$N$$ be the total length of the text string. The algorithm checks positions $$M, 2M, 3M, \dots, N$$. This gives us $$N/M$$ total checkpoints.

To determine the average number of bits examined by this algorithm, we can break the problem down into two distinct scenarios: what happens at a standard checkpoint in the random text, and what happens when the algorithm hits the hidden run of $$M$$ zeros.

At a typical checkpoint $$kM$$ located in the random text, the algorithm checks $$t$$ bits. Let $$W$$ be the random variable for the number of bits read in a window (initial scan). Its probability mass function is

$$
p\_W(x) = \begin{cases}
\left(\cfrac{1}{2}\right)^x &\text{if } x< t \\
\left(\cfrac{1}{2}\right)^{t-1} &\text{if } x = t
\end{cases}
$$

We’ve

$$
\mathbb{E}\[W] = \sum\_{i=1}^{t} i \cdot P(W = i)= \sum\_{i=1}^{t} P(W\ge i).
$$

We know that $$P(W \ge i) = \left(\frac{1}{2}\right)^{i-1}$$, since after successfully reading $$i-1$$ bits, we’ll surely read one more. This gives

$$
\mathbb{E}\[W] = \sum\_{i=1}^{t} \left(\frac{1}{2}\right)^{i-1}=2 - 2^{1-t}.
$$

With probability $$2^{-t}$$, all $$t$$ bits in the window are 0. When this happens, the algorithm checks bits on the left and right to determine the run length. Because the surrounding text is random, the expected number of bits checked to the left until a 1 is found is exactly 2 (geometric distribution). The expected number to the right is also 2. This adds an expected 4 bits to our check.

The total expected number of bits examined per random checkpoint is $$2 + 2^{1-t}$$.

There is exactly one checkpoint that falls inside the hidden run of $$M$$ zeros. Since the problem states that $$M \gg t$$, we can safely ignore the $$\mathcal{O}(t/M)$$ edge case where the $$t$$-bit window overlaps the boundary of the hidden run. Therefore, the $$t$$ bits checked will entirely consist of 0s.The algorithm will then expand to the left and the right. It will successfully read all the remaining zeros in the run, and it will only stop when it hits the 1 immediately to the left of the run and the 1 immediately to the right. Because it reads the entire hidden run plus exactly two bounding 1s, the total number of bits examined for this specific checkpoint is exactly $$M + 2$$.

Because the hidden run is uniformly distributed, the algorithm will, on average, check $$(N/M+1) / 2$$ checkpoints before finding it. Among these 1 is a successful search and the others are failures. This gives

$$
\text{Total Bits} = \frac{1}{2}\left(\frac{N}{M} - 1\right)(2 + 2^{1-t}) + (M + 2).
$$

## Exercise 8.60

The key is to adapt the skipping mechanic from the previous exercise by making the skip distance dynamic. Instead of searching for a fixed run of length $$M$$, we use our "longest run found so far" to dictate how far we can safely jump ahead.

Let $$L$$ be the length of the longest run of 0s found so far. We initialize it to zero. The algorithm is outlined below.&#x20;

{% stepper %}
{% step %}

### Dynamic Probing

Because we only care about finding a run strictly longer than our current record $$L$$, any new record-breaking run must contain at least $$L + 1$$ consecutive 0s. If we are starting a search from position $$i$$, we do not read bit $$i$$ (except when $$L=0$$). Instead, we skip ahead and directly probe the bit at position $$i + L$$.
{% endstep %}

{% step %}

### Bit is 1

If the bit at $$i + L$$ is a 1, it is physically impossible for a run of $$L + 1$$ zeros to start anywhere between $$i$$ and $$i + L$$, because that 1 would interrupt it! We can safely discard this entire block without reading any of the intermediate bits. We move our next position to $$i=i + L + 1$$ and repeat the probing.
{% endstep %}

{% step %}

### Bit is 0

If the bit at $$i + L$$ is a 0, a record-breaking run might exist here. Just like the previous exercise, we read backward toward $$i$$.

* If we hit a 1 while reading backward, it's a false alarm. We shift our search window past that 1 and resume probing.
* If we read all the way back to $$i$$ and they are all 0s, we have a new record! We then read forward from $$i + L + 1$$ to see exactly how long this new run is. At any rate, we also update $$L$$ to reflect this longest run of 0s as well as set our new position to continue the search after this block.
  {% endstep %}
  {% endstepper %}

We know from the book, that in a random bitstring of length $$N$$, the expected length of the longest run of zeros is roughly $$\lg N$$. By the time the algorithm is halfway through the text, $$L$$ will likely be close to $$\lg N$$. This is a significant improvement compared to a naive algorithm that maintains a simple counter and reads all bits of a string. This means our algorithm operates in $$\mathcal{O}(N / \lg N)$$ expected time.


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://evarga.gitbook.io/sh-aofa/algorithms-and-combinatorial-structures/chapter-8-strings-and-tries.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.