> For the complete documentation index, see [llms.txt](https://evarga.gitbook.io/sh-aofa/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://evarga.gitbook.io/sh-aofa/algorithms-and-combinatorial-structures/chapter-6-trees.md).

# Chapter 6: Trees

## Exercise 6.1

The inductive hypothesis is the statement of the lemma. The base case is a tree with one external node and the lemma certainly holds.&#x20;

For the inductive step, assume a tree has $$i>0$$ internal nodes. We can take the left and right subtrees of the root, both of them having less than $$i$$ internal nodes. Let their numbers be $$i\_L$$ and $$i\_R$$, respectively. For these subtrees, the inductive hypothesis holds, so $$e\_L=i\_L+1$$ and $$e\_R=i\_R+1$$. Furthermore, $$e\_L+e\_R=i\_L+1+i\_R+1=i-1+2=i+1$$ as expected, since $$i\_L+i\_R=i-1$$ (the root is an internal node) and $$e\_L+e\_R=e$$. This concludes the inductive proof.

## Exercise 6.2

The total number of binary trees with $$N$$ nodes is given by the $$N$$th Catalan number $$T\_N$$. We need to find the number of binary trees with $$N$$ internal nodes having both subtrees of the root nonempty. The ratio of these numbers gives the answer to this exercise. Thus,

$$
\frac{T\_N-2T\_{N-1}}{T\_N}=\frac{N-2}{2N-1} \quad \text{ for $N>2$}.
$$

For trees with less than 3 internal nodes this ratio is zero.

## Exercise 6.3

The answer is $$T\_N^2/T\_{2N+1}$$, where $$T\_N$$ is the $$N$$th Catalan number.

## Exercise 6.4

Obviously, it’s

$$
\frac{G\_{N-1}}{G\_N}=\frac{T\_{N-2}}{T\_{N-1}}=\frac{N(N-1)}{(2N-2)(2N-3)} \qquad \text{for $N>1$}.
$$

{% hint style="info" %}
See the next exercise for a generalization of this problem.
{% endhint %}

## Exercise 6.5 🌟

{% hint style="success" %}
This exercise showcases the power of the symbolic method to answer seemingly very difficult questions.
{% endhint %}

For answering this question, we’ll switch to the symbolic method. We already know the symbolic construction for a general tree from the proof of Theorem 6.2. We want to find the trees where the root has exactly $$t$$ children. Symbolically, a tree in this class is a root node attached to a sequence of exactly length $$t$$ of $$\mathcal{G}$$

$$
\mathcal{A} = Z \times \text{SEQ}\_{=t}(\mathcal{G}) = Z \times \mathcal{G}^t \\
\therefore A(z)=zG(z)^t=z^{t+1}F(z)^t.
$$

We want the number of such trees of size $$N$$

$$
\[z^N] A(z) = \[z^{N-t-1}]F(z)^t=\frac{t}{2N-t-2}\binom{2N-t-2}{N-t-1}.
$$

The RHS immediately follows from the [Catalan $$k$$-fold convolution](https://en.wikipedia.org/wiki/Catalan_number#Catalan_k-fold_convolution). The asked proportion is

$$
\frac{\cfrac{t}{2N-t-2}\dbinom{2N-t-2}{N-t-1}}{G\_N}= \frac{t N \cdot (N-1)(N-2)\cdots(N-t)}{(2N-2)(2N-3)\cdots(2N-t-2)} \qquad \text{for $N>t$}.
$$

{% hint style="info" %}
As a quick check, when $$t=1$$ we get exactly the formula from the previous exercise.
{% endhint %}

## Exercise 6.6 🌟

{% hint style="success" %}
This exercise shows how to use Maple’s `combstruct` package for the symbolic method. It comes with other packages to support asymptotic estimation of coefficients.
{% endhint %}

To count the number of forests without singletons, we should alter a bit the combinatorial construction for the forest

$$
\mathcal{F}^+ = \text{SEQ}(\underbrace{Z \times (\mathcal{F}-E)}\_{\mathcal{G}^+}).
$$

The ordinary forest is a sequence of trees, but our restricted definition only allows trees with at least two nodes. Using the corresponding transfer theorems, we have

$$
\begin{align\*}
F^+(z) &= \frac{1}{1-z(F(z)-1)} \\\[0.3cm]
&= \frac{1}{1-z^2F(z)^2} \\\[0.3cm]
&= \frac{1}{1-G(z)^2} \\\[0.3cm]
&= \frac{1}{(1-G(z))(1+G(z))} \\\[0.3cm]
&= \frac{F(z)}{1+G(z)} \\\[0.4cm]
&= \frac{F(z)}{1+\frac{F(z)-1}{F(z)}} \qquad \text{($G(z)=zF(z)=zF^2(z)/F(z)$)} \\\[0.5cm]
&= \frac{F(z)^2}{2F(z) - 1} \\\[0.4cm]
&= \frac{F(z)^2 \cdot (z + 2)}{(2F(z) - 1) \cdot (z + 2)} \\\[0.4cm]
&= \frac{zF(z)^2 + 2F(z)^2}{(2F(z) - 1)(z + 2)} \\\[0.4cm]
&= \frac{F(z) - 1 + 2F(z)^2}{(2F(z) - 1)(z + 2)} \\\[0.4cm]
&= \frac{(2F(z) - 1)(F(z) + 1)}{(2F(z) - 1)(z + 2)} \\\[0.4cm]
&= \frac{F(z) + 1}{z + 2} \\\[0.3cm]
&\therefore \boxed{(z+2)F^+(z) = 1 + F(z)}.
\end{align\*}
$$

If we extract the coefficient of $$z^N$$ for $$N \ge 1$$, we get a recurrence relation

$$
F^+\_{N-1} + 2F^+\_N = F\_N \quad \text{ with $F^+\_0=1$}.
$$

We can solve this first-order linear recurrence using the generalized technique presented in [Exercise 2.14](https://evarga.gitbook.io/sh-aofa/algorithms-and-combinatorial-structures/pages/Sy1KF5Jq9j0bEpfHwXvN#exercise-2.14). This gives our proportion (after dividing by $$F\_N=T\_N)$$

$$
P\_N = \frac{1}{T\_N} \left\[ \left(-\frac{1}{2}\right)^N + \sum\_{k=1}^N \frac{(-1)^{N-k}}{2^{N-k+1}} T\_k\right].
$$

{% code title="Maple code to dump the number of forests without singletons for N=1,2,3,4." %}

```
with(combstruct);

grammar := {
    F = Sequence(G),
    G = Prod(Z, F),
    # This is the way to specify a non-empty forest.
    F_nonempty = Sequence(G, card >= 1), 
    G_plus = Prod(Z, F_nonempty),
    F_plus = Sequence(G_plus), 
};

seq(count([F_plus, grammar, unlabeled], size = n), n = 1..4);
```

{% endcode %}

It prints 0, 1, 2, 6. Notice that 6/14=3/7, as expected.

## Exercise 6.7

{% hint style="warning" %}
The book incorrectly cites Figure 6.2; it’s therefore reasonable to infer that the intended reference was Figure 6.4.
{% endhint %}

We first group forests by size (the order of list items designate $$N$$) and then enumerate each group in top-down left-right fashion. This gives

1. `()`
2. `()()`, `(())`
3. `()()()`, `(())()`, `()(())`, `(()())`, `((()))`
4. `()()()()`, `(())()()`, `()(())()`, `()()(())`, `((()))()`, `(()())()`, `(())(())`, `()(()())`, `()((()))`, `(()()())`, `((())())`, `((()()))`, `(()(()))`, `(((())))`

## Exercise 6.8

A notable distinction between ordered trees and binary trees is that a binary tree may be empty. Accordingly, an explicit notation should be used to indicate an empty binary tree. The diagram below provides a reference to highlight the differences in their respective representations.

<div data-full-width="true"><figure><img src="/files/F4GOHyj2eHFXEKkxTSUp" alt=""><figcaption><p>Illustration of the rotation correspondence between an ordered and binary tree. External nodes are omitted on the right.</p></figcaption></figure></div>

Preorder traversal of a tree maps directly to a preorder traversal of a binary tree. Postorder traversal of a tree corresponds to an inorder (not postorder) traversal of a binary tree. There is no notion of an inorder traversal of a tree, since it’s not defined when to visit the root among multiple children.&#x20;

### Parenthesis Systems Representation

The preorder representation of a tree is

`(1 (2 (5) (6) ) (3) (4) )`

The preorder representation of a binary tree is

`(1 (2 (5) (6) ) (3) (4) )`

{% hint style="info" %}
In an arbitrary binary tree, it is necessary to append an additional closing parenthesis at the end to indicate the last external node. Since a binary tree is a positional structure, each external node (empty link) is denoted by a closing parenthesis. Consider node 5, which has node 6 as its right child; during traversal of 5's children, a closing parenthesis is placed for the left child and an opening parenthesis for the right child. However, the root of a binary tree produced through rotation correspondence from an ordered tree doesn’t possess a right child in this context.
{% endhint %}

The postorder representation of a tree is

`( ( (5) (6) 2) (3) (4) 1)`

The inorder representation of a binary tree is

`( ( (5) (6) 2) (3) (4) 1)`

The postorder representation of a binary tree is a bit awkward to be represented in this manner. As highlighted in the book, when we know additional properties of nodes (like number of arguments of an operator), then we can circumvent the usage of parenthesis in a preorder/postorder representation of any type of tree. Inorder representation must encode structural information of a binary tree, since ambiguities may arise.

### Preorder/Postorder Degree Representation

A tree can also be represented by listing the degree (number of children) of each node in a specific traversal order. In a binary tree, standard "degrees" don't apply the same way. Instead, the degree sequence of a tree dictates the shape of the right-leaning spines in the matching binary tree. More specifically,

* If a node $$X$$ in a tree has a degree of $$k$$, it means $$X$$ has $$k$$ children.
* In a binary tree, this translates to: $$X$$'s left child exists, and from that left child, there is a continuous chain of right children of length $$k - 1$$. For example, node 1 of a tree in our diagram has 3 children, thus node 2 of a binary tree has a right-leaning spine of length 2.
* A degree of 0 in a tree means the corresponding node in a binary tree has no left child.

## Exercise 6.9 🌟

{% hint style="success" %}
The book doesn't mention this, but this exercise introduces a generalized gambler’s ruin path known as [Łukasiewicz path](https://www.emergentmind.com/topics/lukasiewicz-paths-and-plane-trees).
{% endhint %}

The forest with $$N-1$$ nodes needs to be turned into the corresponding general tree by adding a dummy root node, as explained in the book. From here, follow the cited article about the 1:1 correspondence between a general tree of $$N$$ nodes and the new type of path with $$N$$ steps.

## Exercise 6.10 🌟

{% hint style="success" %}
This exercise introduces [André's "Reflection Principle"](https://webspace.ship.edu/msrenault/ballotproblem/monthly358-363-renault.pdf), a very important technique to reason about the basic ballot problem. The cited article also debunks the myth around this method.
{% endhint %}

Let’s rotate any lattice-path from Figure 6.6 by $$45^\degree$$ counterclockwise to get a graph of the winner’s margin as the ballots are counted. This perspective is more natural for understanding the dynamics behind the scene. We can represent ups and downs as follows:

* Reading a 0 is a step "Up-Right" $$(+1, +1)$$.
* Reading a 1 is a step "Down-Right" $$(+1, -1)$$.

Every path starts at $$A(0,0)$$ and lands at $$T(N,2k-N)$$, assuming $$k$$ zeros and $$N-k$$ ones in the path. For being a *good* path, it must never drop below the $$x$$ axis, i.e., touch the $$y=-1$$ line. Consequently, the necessary condition is $$2k-N \ge 0 \implies k \ge \lceil N/2 \rceil$$. In probability, it’s sometimes easier to count the complementary events. In our case, the number of *good* paths is a difference between total number of paths and number of *bad* paths. The former is simply $$\binom{N}{k}$$. For figuring out the number of *bad* paths, we apply the previously mentioned reflection method.

If a path is *bad*, it hits the line $$y = -1$$ at some point $$P(M,-1)$$. If we take the portion of the path *before* that first touch point and reflect it across the line $$y = -1$$, the path will now appear as if it had one more zeros then ones. Recall that a reflected path climbs up from $$A'(0,-2)$$ to $$P$$ and does the opposite of what $$A(0,0)$$ to $$P$$ does. The key insight with the reflection method is that we can establish a bijection between *bad* paths from $$A(0,0)$$ to $$T$$ via $$P$$ and paths from$$A'(0,-2)$$ to $$T$$ via $$P$$. Because for every possible way for a path to hit $$P$$ from $$A$$, we have a unique path to do the same from $$A'$$. Notice also that segments after $$P$$ overlap between these paths. Therefore, the number of *bad* paths from $$A$$ is equal to the number of all paths from $$A'$$. The latter is now easy to express as $$\binom{N}{k+1}$$.

To find the total number of valid strings of length $$N$$, we just add up these valid paths for all allowable values of $$k$$

$$
\sum\_{k = \lceil N/2 \rceil}^{N} \left\[ \binom{N}{k} - \binom{N}{k+1} \right]=\binom{N}{\lceil N/2 \rceil}=\binom{N}{\lfloor N/2 \rfloor}.
$$

The sum nicely telescopes and only the first term "survives" cancellations. Recall that $$\binom{N}{N+1}=0$$, hence the last term doesn’t matter.

## Exercise 6.11

There is a bijection between the parenthesis representation of an ordered forest to the plus-minus representation of its associated binary tree. We should swap symbols: $$( \space +$$ and $$) \space -$$. There is only one caveat, that there is an extra - at the end of the representation of a binary tree to encode the last external node. This difference should be taken into account when converting from one form into another.

## Exercise 6.12

This topic is known as [treemapping](https://en.wikipedia.org/wiki/Treemapping) and many algorithms exist that produce outputs of various quality. Aspect ratio plays a central role and a particularly popular variant is the [squarified treemap](https://vanwijk.win.tue.nl/stm.pdf). It attains $$\alpha$$ close to one.

As a side note, leveraging the ISO paper standard’s constant width-to-height aspect ratio of $$\alpha=\sqrt 2$$, we can develop a simple algorithm to represent binary trees with nested rectangles. This value of $$\alpha$$ ensures that any resulting smaller sub-rectangles will retain the exact same aspect ratio. Now, the algorithm would traverse in preorder the binary tree and for every internal node split the corresponding nested rectangle along its wider side into two; here, lines denote internal nodes. An extension to store ordered trees would be to first convert them into binary trees using rotation correspondence.

## Exercise 6.13

The paper [Rotational tree structures on binary trees and triangulations](https://cyber.bibl.u-szeged.hu/index.php/actcybern/article/view/3697) explains in detail the rotational relationships among the $$N$$ trees corresponding to the $$N$$ rotations of an asymmetric triangulation. The article employs the identical hexagons as depicted in Figure 6.8 of the book.

## Exercise 6.14

{% hint style="danger" %}
The text of this exercise is flawed. It’s essential to restrict consideration to **positive integers**. Allowing negative numbers at the boundaries could result in an infinite number of valid sequences. Second, if we rely solely on those two explicit rules, assuming positive numbers, things already fall apart for $$N=2$$. Namely, the allowable sequences are (1,2), (2,1) and (1,1), but $$T\_2=2<3$$. If we should treat the constraint *"some larger integer must appear somewhere between any two occurrences of any integer"* to apply only for duplicates and force between them some larger number, then (1,1) clearly disappears. Nonetheless, things then go wrong for $$N=3$$, since all 6 permutations of 1,2,3 would work including (1,2,1). Again, $$T\_3=5<7$$.

I guess, the authors wanted to introduce in this exercise the [Cartesian tree](https://en.wikipedia.org/wiki/Cartesian_tree), tweaked to focus on maximums instead of minimums. Reconstruction of a sequence from a binary tree would happen by in-order traversal of the tree using a node's height as its value.
{% endhint %}

## Exercise 6.15 🌟

{% hint style="success" %}
This exercise augments the book by giving recursive formulations describing path length and height in general trees.
{% endhint %}

Let $$t\_i$$ denote the $$i$$th child of $$t$$, if any. Recall that a general tree cannot be empty. The recursive definitions are as follows:\
\
$$pl(t)=|t|-1+\sum\_ipl(t\_i) \ h(t)=\[|t|>1]+\max\_i,,h(t\_i).$$

Notice the usage of the [Iverson bracket](https://en.wikipedia.org/wiki/Iverson_bracket) in the definition of height.

## Exercise 6.16 🌟

{% hint style="success" %}
This exercise augments the book by giving recursive formulations for the number of leaves in binary trees and in general trees.
{% endhint %}

Let $$l(t)$$ denote the number of leaves in t. For a binary tree, if $$t$$ is an external node, then $$l(t)=0$$. The definitions are:

* Binary tree: $$l(t)=\[|t|=1]+l(t\_l)+l(t\_r)$$
* General tree: $$l(t)=\[|t|=1]+\sum\_i l(t\_i)$$.

See the previous exercise for notational conventions.

## Exercise 6.17 🌟

{% hint style="success" %}
This exercise establishes an important feature that the height of a binary tree with $$N$$ external nodes has to be at least $$\lg N$$.
{% endhint %}

We prove this by induction on $$N$$ (number of external nodes) using a bit stronger hypothesis $$h(t) \ge \lceil\lg N\rceil\ge \lg N$$, because the height is a positive integer. The base case ($$N=1$$) is trivially satisfied, since $$h(t)=0 \ge \lg 1=0$$.&#x20;

For the inductive step $$N>1$$, we have

$$
\begin{align\*}
h(t) &= 1+\max(h(t\_l), h(t\_r)) \\
&\ge 1+\lceil\lg \lceil N/2\rceil \rceil && \text{(by the inductive hypothesis)} \\
&\ge 1+\lceil\lg {N/2} \rceil \\
&=1+\lceil\lg N\rceil-1 \\
&=\lceil\lg N\rceil.
\end{align\*}
$$

The derivation relies on the fact that $$N=N\_l+N\_r$$, since the root is an internal node. Furthermore, one of the subtrees must have at least half of the nodes, while the other one must have at least one external node. Therefore, we can apply the inductive hypothesis on a larger subtree.

## Exercise 6.18

{% hint style="warning" %}
The correct Kraft equality is $$\sum\_j k\_j 2^{-j} = 1$$.
{% endhint %}

The [Kraft–McMillan inequality](https://en.wikipedia.org/wiki/Kraft%E2%80%93McMillan_inequality) is a generalization of this exercise. The referenced Wikipedia article provides a comprehensive proof, referring to leaves rather than external nodes. The stated equality applies to binary trees—defined, as in the book, as full binary trees—within the context of complete codes. See also [Exercise 6.37](#exercise-6.37) for more in depth details.

## Exercise 6.19 🌟

{% hint style="success" %}
This exercise gives tight upper and lower bounds on the path length of a general tree with $$N$$ nodes.&#x20;
{% endhint %}

The bounds are $$N-1\le pl(t) \le N(N-1)/2$$. The lower bound is achieved by a star tree (root with $$N-1$$ children), whilst the upper bound applies for a chain of nodes.

## Exercise 6.20 🌟

{% hint style="success" %}
This exercise gives tight upper and lower bounds on the internal and external path lengths of a binary tree with $$N$$ internal nodes.
{% endhint %}

Assume $$N>0$$. By the Lemma from the book, we have $$xpl(t) = ipl(t) + 2N$$. Therefore, bounds on the internal path length determines those on the external. From the previous exercise, we already know that

$$
ipl(t) \le N(N-1)/2 \implies xpl(t)\le N(N+3)/2.
$$

The lower bound is attained by the most balanced tree, where height is minimized. Consequently, $$h(t) \ge \lceil \lg (N+1) \rceil$$ (see [Exercise 6.17](#exercise-6.17)) with $$1 \le r \le 2^{h(t)-1}$$ leaves at the last level. This gives

$$
\begin{align\*}
ipl(t) &= \sum\_{k=1}^{h(t)-2}k2^k+(h(t)-1)r \\
&= (h(t)-3)2^{h(t)-1}+2+(h(t)-1)r \\
&= (h(t)-3)2^{h(t)-1}+2+(h(t)-1)(N-2^{h(t)-1}+1) \\
&= (h(t)-1)N - 2^{h(t)} + h(t) + 1 \\
&\ge N\lg N+O(N).
\end{align\*}
$$

Therefore, $$xpl(t)\ge N \lg N + O(N).$$

## Exercise 6.21 🌟

{% hint style="success" %}
This exercise gives tight upper and lower bounds on the number of leaves in a binary tree with $$N$$ nodes.
{% endhint %}

Assume that the number of nodes is $$N>0$$. The number of leaves satisfies

$$
1 \le l(t) \le \left\lceil \frac{N}{2} \right\rceil.
$$

The lower bound happens for the most unbalanced situation (chain of nodes). The upper bound is attained by the most balanced tree. In a perfect binary tree (where all leaves are at the last level), their number is exactly $$(N+1)/2=\lceil N/2 \rceil$$. Any other arrangement results in a lower number, since we need to remove two leaves at the last level, to get an additional one on the level above.

The above argument can be easily extended to non-perfect balanced binary trees. Namely, prune away all leaves at the last level and start adding them back. It’s easy to show by induction that the number of leaves will revolve around the given upper bound, since we need to extra nodes to effectively increase the number of leaves by one.

## Exercise 6.22

We can accomplish this with 2 registers as follows:

```
r1 ← z + y
r1 ← r1 * x
r2 ← v + y
r2 ← a - r2
r2 ← r2 / r1
r2 ← w + r2
r1 ← x + y
r1 ← r1 * z
r1 ← r1 - r2
```

Observe that this is an optimal number, since we cannot do it with only a single register. It perfectly matches the register allocation formula $$r(t)$$ explained at the end of Section 6.10 in the book.

{% hint style="info" %}
You might want to read about the [Sethi–Ullman algorithm](https://en.wikipedia.org/wiki/Sethi%E2%80%93Ullman_algorithm).
{% endhint %}

## Exercise 6.23

<figure><img src="/files/UCIezsQAa2HXuJaIqh5y" alt=""><figcaption></figcaption></figure>

The traversals are enrolled below from left to right:

* preorder: `*+abd`\
  inorder: `(a+b)*d`\
  postorder: `ab+d*`
* preorder: `-**+ab-de+fg*hi`\
  inorder: `(((a+b)*(d-e))*(f+g))-(h*i)`\
  postorder: `ab+de-*fg+*hi*-`

## Exercise 6.24

<figure><img src="/files/5vl9BnRMuYSHBDmQ9kQt" alt=""><figcaption><p>The tree denoting the expression <span class="math">((a^2 + b + c) ⋆ (d^4 – e^2) ⋆ (f + g + h)) – i ⋆ j</span>.</p></figcaption></figure>

The unadorned traversals of the tree are:

* preorder: `-*+*aabc-*dddd*ee+fgh*ij`
* postorder: `aa*bc+dddd*ee*-fgh+*ij*-`

These only work unambiguously if every operator has a fixed and known [arity](https://en.wikipedia.org/wiki/Arity). For example, `*aabc` could be interpreted as $$a^2$$, $$a^2*b$$. or $$a^2*b\*c$$. Arity indicators may resolve these ambiguities:

* preorder: `-2*3+3*2aabc-2*4dddd*2ee+3fgh*2ij`
* postorder: `aa*2bc+3dddd*4ee*2-2fgh+3*3ij*2-2`

<figure><img src="/files/2Jp8kGxAsPKAJm0Jx3DJ" alt=""><figcaption><p>Binary tree representation of the tree by rotation correspondence. A blue (left) edge designates the first child, whilst the red (right) edge represents the next sibling in the original tree. External nodes are omitted.</p></figcaption></figure>

The unadorned traversals of the binary tree are:

* preorder (same as the preorder for the tree): `-*+*aabc-*dddd*ee+fgh*ij`
* inorder (same as the postorder for the tree): `aa*bc+dddd*ee*-fgh+*ij*-`
* postorder: `aacb*ddddee**hgf+-+ji**-`

See [Exercise 6.8](#exercise-6.8) for details about how degrees (arity) can be deciphered based on the length of right leaning chains. Observe that the postorder traversal produces an algebraic mess.

## Exercise 6.25

The sequence [A138156](https://oeis.org/A138156) enumerates the sum of internal path lengths of all binary trees with $$N$$ edges ($$N-1$$ internal nodes). So, $$Q\_5=352/42 \approx 8.381$$.

The sequence [A288964](https://oeis.org/A288964) enumerates the number of key comparisons to sort all $$N!$$ permutations of $$N$$ elements by quicksort. So, $$C\_5=888/120=7.4$$.

#### Explanation for the Correspondence with Quicksort

The total number of key comparisons made by quicksort (using the first element as pivot) when applied to all $$N!$$ permutations of $$N$$ distinct elements equals the sum of the internal path lengths of the binary search trees obtained by inserting the elements in the same order. This implicit connection is mentioned in the book (see section 6.8 and [Exercise 6.34](#exercise-6.34)) as well as in the [book's course material](https://aofa.cs.princeton.edu/online/slides/AA06-Trees.pdf) (see slide 43).

Consider a fixed permutation. In quicksort, each recursive call picks a pivot, let it be the first element of the current subarray just to make the connection with BST clearer, and compares it with every other element in that subarray. Thus, if a pivot corresponds to a node $$v$$ whose subtree in the recursion tree (or in the corresponding BST) contains $$|t\_v|$$ elements, then the number of comparisons made at that node is $$|t\_v|-1$$. Summing over all nodes gives the total comparisons for that permutation

$$
c = \sum\_v (|t\_v|-1) = \left(\sum\_v |t\_v|\right) - n.
$$

Now, in the BST built by inserting the elements in the same order, the root is the first element, and the left and right subtrees are built from the elements that are smaller and larger, respectively, in the order they appear. This exactly mirrors the recursion of quicksort. For any tree, the sum of subtree sizes (including each node itself) is related to the internal path length by

$$
\sum\_v |t\_v| = ipl(t) + n,
$$

because each node contributes 1 to its own subtree and to the subtree of every ancestor. Combining the two equations yields $$c = (ipl(t) + n) - n = ipl(t)$$. Thus, the number of comparisons for a given permutation equals the internal path length of its associated BST. Summing over all permutations gives the desired equality.

## Exercise 6.26

A degenerate tree structure is a chain. Observe that a preorder traversal of such a tree reproduces the original sequence of insertions, because each insertion simply adds the new element at the end of the chain.

Let $$a\_i \neq a\_j$$ be two elements whose relative order differs in two different permutations that produce a chain. There must be such a pair. WLOG let $$i\<j$$, otherwise swap indices. The two BSTs cannot be structurally the same, since the directionality of branching at $$a\_i$$ must change. Recall that $$i\<j$$ entails that $$a\_i$$ was inserted before $$a\_j$$.

The height of a degenerate tree is $$h=N$$. The total number of degenerate trees is $$2^{h-1}=2^{N-1}$$, since we have two branching choices per edge and the number of edges is $$N-1$$. Therefore, the probability that a degenerate tree structure will result is $$2^{N-1}/N!$$.

## Exercise 6.27

The sequence [A076615](https://oeis.org/A076615) enumerates the number of permutations of $${1,2,\dots,N}$$ that result in a binary search tree with the minimum possible height. So, we just need to read out the corresponding entry and divide it by $$N!$$.

To derive the number $$T(n)$$ of insertion sequences that yield a perfectly balanced binary search tree of height $$n$$ (with $$N=2^n-1$$ nodes), we use a recursive combinatorial argument based on the tree structure.

Consider a perfectly balanced tree of the given height:

* The root is the median element (unique because keys are distinct).
* The left subtree is itself a perfectly balanced tree of height $$n-1$$ with $$2^{n-1} - 1$$ nodes.
* The right subtree is identical in size and shape.

For any insertion sequence to produce this tree, the root must come first. The remaining $$N-1 = 2^n - 2$$ nodes consist of all nodes from the left and right subtrees. The order in which they’re inserted must respect the internal constraints of each subtree: the nodes of the left subtree must appear in an order that yields the left subtree shape, and similarly for the right. However, the two subsequences can be interleaved arbitrarily.

Let $$M=2^{n-1} - 1$$ be the size of each subtree, so:

* The number of valid sequences for the left subtree is $$T(n-1)$$.
* The number for the right subtree is also $$T(n-1)$$.
* Once we have fixed the internal orders for each subtree, we need to interleave them into a single sequence of length $$2M$$ (the positions after the root). The number of ways to interleave two sequences of lengths $$M$$ and $$M$$ while preserving each internal order is $$\binom{2M}{M}$$. This counts the choices of which positions in the combined sequence are occupied by left subtree nodes; the left-subtree sequence then fills those positions in order, and the right subtree sequence fills the rest.

This gives

$$
T(n) = \binom{2^n - 2}{2^{n-1} - 1} \cdot T^2(n-1)=\prod\_{k=1}^{n-1} \left( \frac{(2^{,n-k+1} - 2)!}{\bigl((2^{,n-k} - 1)!\bigr)^2} \right)^{!2^{k-1}},
$$

with $$T(1)=1$$. The RHS follows by iterating the LHS. The probability of getting a perfectly balanced tree is $$T(n)/N!$$.

{% hint style="info" %}
As a quick check, we get that $$T(3)=80$$, which matches the previously mentioned sequence for $$N=7$$ (it’s zero indexed).
{% endhint %}

## Exercise 6.28

Preorder traversal naturally preserves the tree structure, since roots of subtrees are visited first. This ensures that the relative order between ancestors and their descendants are properly maintained, since parents are already in place before their children are inserted. By induction, the tree is reconstructed correctly.

The postorder traversal of the tree below is `1 2`. But this sequence would produce a right leaning tree, which is different than the original. Thus, postorder traversal doesn’t preserve the tree structure.

<figure><img src="/files/OAAw8O1F7ooWSrCVDISw" alt="" width="107"><figcaption></figcaption></figure>

The level order traversal also preserves the structure, since nodes are visited in a manner that retains the relative order between ancestors (nodes visited/inserted earlier) and their descendants. Again, parents are already in place before their children are inserted. By induction, the tree is reconstructed correctly.

## Exercise 6.29

$$
\begin{align\*}
C\_T(z) &= \sum\_{t\_l \in \mathcal{T}}\sum\_{t\_r \in \mathcal{T}}(\text{ipl}(t\_l)+\text{ipl}(t\_r)+|t\_l|+|t\_r|)z^{|t\_l|+|t\_r|+1} \\
&= z \left\[\sum\_{t\_l \in \mathcal{T}}\sum\_{t\_r \in \mathcal{T}}(\text{ipl}(t\_l)+\text{ipl}(t\_r)+|t\_l|+|t\_r|)z^{|t\_l|+|t\_r|} \right] \\
&= z \left\[\sum\_{t\_l \in \mathcal{T}}z^{|t\_l|} \sum\_{t\_r \in \mathcal{T}}(\text{ipl}(t\_l)+\text{ipl}(t\_r)+|t\_l|+|t\_r|)z^{|t\_r|} \right] \\
&= z \left\[\sum\_{t\_l \in \mathcal{T}}z^{|t\_l|} \sum\_{t\_r \in \mathcal{T}}(\text{ipl}(t\_l)z^{|t\_r|}+\text{ipl}(t\_r)z^{|t\_r|}+|t\_l|z^{|t\_r|}+|t\_r|z^{|t\_r|}) \right] \\
&= z \left\[\sum\_{t\_l \in \mathcal{T}}z^{|t\_l|} \left(\text{ipl}(t\_l)T(z)+C\_T(z)+|t\_l|T(z)+zT'(z) \right)\right] \\
&= z \left\[C\_T(z)T(z)+C\_T(z)T(z)+zT(z)T'(z)+zT(z)T'(z)\right] \\
&= 2zC\_T(z)T(z)+2z^2T(z)T'(z).
\end{align\*}
$$

## Exercise 6.30

$$
\begin{align\*}
C\_G(z) &= \sum\_{t \in \mathcal{G}}\text{pl}(t)z^{|t|} \\
&= \sum\_{k \ge 0}\sum\_{t\_1 \in \mathcal{G}}\sum\_{t\_2 \in \mathcal{G}} \dots \sum\_{t\_k \in \mathcal{G}} (\text{pl}(t\_1) +\text{pl}(t\_2)+ \dots + \text{pl}(t\_k)+|t\_1|+|t\_2|+ \dots +|t\_k|)z^{|t\_1|+|t\_2|+ \dots +|t\_k|+1} \\
&= \sum\_{k \ge 0} z \left\[\sum\_{t\_1 \in \mathcal{G}} z^{|t\_1|}\sum\_{t\_2 \in \mathcal{G}}z^{|t\_2|} \dots \sum\_{t\_k \in \mathcal{G}} (\text{pl}(t\_1) +\text{pl}(t\_2)+ \dots + \text{pl}(t\_k)+|t\_1|+|t\_2|+ \dots +|t\_k|)z^{|t\_k|} \right]\\
&= \sum\_{k \ge 0} z \left\[\sum\_{t\_1 \in \mathcal{G}} z^{|t\_1|}\sum\_{t\_2 \in \mathcal{G}}z^{|t\_2|} \dots \bigg(\text{pl}(t\_1)G(z)+\text{pl}(t\_2)G(z)+ \dots + C\_G(z)+|t\_1|G(z)+|t\_2|G(z)+ \dots +zG'(z)\bigg) \right] \\
&\dots \\
&= \sum\_{k \ge 0} z \left\[kC\_G(z)G(z)^{k-1}+kzG'(z)G(z)^{k-1}) \right] \\
&= \frac{zC\_G(z)+z^2G'(z)}{(1-G(z))^2} \qquad \text{$\left(\sum\_k kx^{k-1}=1/(1-x)^2\right)$}.
\end{align\*}
$$

## Exercise 6.31

We can derive the relationship by monitoring how node levels translate between the two tree structures and using the inherent symmetry of binary trees.&#x20;

A general tree $$G$$ of $$N$$ nodes maps to a binary tree $$T$$ of $$N$$ nodes where the root only has a left subtree of size $$N-1$$. A key insight is that the level of a node $$v$$ in $$G$$, denoted as $$\text{level}\_G(v)$$, matches the number of left branches on the path from the root to $$v$$ in $$T$$, denoted as $$\text{left-branches}\_T(v)$$. This follows from the *first-child, next-sibling* rule of rotation correspondence. Thus, we can express the path length of $$G$$ in terms of left branches in $$T$$

$$
\text{pl}(G)=\sum\_{v \in G} \text{level}*G(v) = \sum*{v \in T} \text{left-branches}\_T(v).
$$

The number of left branches of any node in $$T$$ is one more than this number taken in the root's left subtree $$t\_l$$. This gives

$$
\text{pl}(G)= \sum\_{v \in t\_l} (1+\text{left-branches}*{t\_l}(v))=N-1+ \sum*{v \in t\_l} \text{left-branches}\_{t\_l}(v).
$$

Now we look at the average over all randomly generated trees. Let $$\mathbb{E}\[\text{pl}(G\_N)]$$ be the expected path length of a random general tree of size $$N$$, and $$\mathbb{E}\[\text{ipl}(T\_N)]$$ be the expected internal path length of a random binary tree of size $$N$$. On average, we expect that the total number of left branches across all paths in $$T$$ equals the total number of right branches. Since the internal path length of a binary tree is the sum of left and right branches for all nodes, we’ve

$$
\mathbb{E}\left\[\sum\_{v \in t\_l} \text{left-branches}*{t\_l}(v)\right] = \frac{1}{2} \mathbb{E}\[\text{ipl}(T*{N-1})].
$$

This gives

$$
\mathbb{E}\[\text{pl}(G\_N)] =N-1+\frac{1}{2} \mathbb{E}\[\text{ipl}(T\_{N-1})].
$$

This result matches the formula from the book, once we substitute the expression from Theorem 6.3 into the RHS.

## Exercise 6.32

In Program 1.2, the pivot selected for each subarray is its final element. Accordingly, it is necessary to ensure that the last element of each subarray becomes the root of the subsequent subtree. It should be noted that during each iteration, the pivot is exchanged with the border element, which must be considered in the overall process. Possible permutations corresponding to Figure 6.12, listed from left to right, are as follows:

1. `AA,AB,CN,MS,EF,MC,JB,PD,AL`
2. `PD,MS,MC,JB,EF,CN,AL,AB,AA`
3. `AA,AL,AB,EF,CN,MS,MC,PD,JB`

## Exercise 6.33 🌟

{% hint style="success" %}
This exercise showcases some important technical details pertaining to search probabilities.
{% endhint %}

The probability that the successful search cost is 2 equals the probability that the key matches the root of the left or right subtree of the tree's root. Due to symmetry, these two probabilities are the same. Observe that subtrees of random BSTs are themselves random BSTs, so the randomness property is fully preserved (like, in Quicksort). Let $$S\_k$$ be a subtree of size $$k$$. The joint probability can be expressed as

$$
\begin{align\*}
\Pr{\text{key = root of $S\_k$}} &= \Pr{\text{($|S|=k$) $\cap$ (key$\in S$) $\cap$ (key = root of $S$)}} \\
&= \frac{1}{N} \cdot\frac{k}{N} \cdot \frac{1}{k} \\
&= \frac{1}{N^2}.
\end{align\*}
$$

We ignore $$k=0$$, since only successful searches are considered. The final probability is computed by summing over $$0\<k\<N$$ and multiplying by two

$$
\[u^2]s\_N(u)= \frac{2(N-1)}{N^2}.
$$

## Exercise 6.34

The following Python script creates the requested plot.

{% code expandable="true" %}

```python
import random
import matplotlib.pyplot as plt

class Node:
    __slots__ = ['key', 'left', 'right']

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    if root is None:
        return Node(key)

    curr = root
    while True:
        if key < curr.key:
            if curr.left is None:
                curr.left = Node(key)
                break
            curr = curr.left
        elif key > curr.key:
            if curr.right is None:
                curr.right = Node(key)
                break
            curr = curr.right
    return root

def search_cost(root, key):
    cost = 0
    curr = root
    while curr is not None:
        cost += 1
        if key == curr.key:
            return cost
        elif key < curr.key:
            curr = curr.left
        else:
            curr = curr.right
    return cost

def run_experiment():
    keys = list(range(1, 1001))
    random.shuffle(keys)

    root = None
    for key in keys:
        root = insert(root, key)

    costs = []
    for _ in range(10_000):
        # Pick a random key that is known to be in the tree
        target = random.choice(keys)
        cost = search_cost(root, target)
        costs.append(cost)

    max_cost = max(costs)
    avg_cost = sum(costs) / len(costs)
    bins = range(1, max_cost + 2)

    plt.figure(figsize=(10, 6))
    plt.hist(costs, bins=bins, align='left', rwidth=0.8, color='#4A90E2', edgecolor='black')
    plt.axvline(avg_cost, color='red', linestyle='dashed', linewidth=2, label=f'Average Cost: {avg_cost:.2f}')
    plt.title("Search Costs in a Random 1000-Node BST (10,000 Searches)")
    plt.xlabel("Search Cost (Number of Nodes Visited)")
    plt.ylabel("Frequency")
    plt.xticks(range(1, max_cost + 1))
    plt.legend()
    plt.grid(axis='y', alpha=0.3)
    plt.tight_layout()
    plt.show()

if __name__ == "__main__":
    run_experiment()
```

{% endcode %}

<figure><img src="/files/LvthJprLKHpCkZL2mWa3" alt=""><figcaption><p>Figure 1.4 actually shows the histogram of construction costs, which are tightly related to search costs (see the proof of Theorem 6.6). The alignment is very good, as the average is between 11 and 12 (it’s around 12,000 in Figure 1.4).</p></figcaption></figure>

## Exercise 6.35

The following Python script creates the requested plot.

{% code expandable="true" %}

```python
import random
import matplotlib.pyplot as plt

class Node:
    __slots__ = ['key', 'left', 'right']

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    if root is None:
        return Node(key)

    curr = root
    while True:
        if key < curr.key:
            if curr.left is None:
                curr.left = Node(key)
                break
            curr = curr.left
        elif key > curr.key:
            if curr.right is None:
                curr.right = Node(key)
                break
            curr = curr.right
    return root

def search_cost(root, key):
    cost = 0
    curr = root
    while curr is not None:
        cost += 1
        if key == curr.key:
            return cost
        elif key < curr.key:
            curr = curr.left
        else:
            curr = curr.right
    return cost

def run_experiment():
    costs = []
    print("Running 10,000 trials. This might take a few seconds...")

    for i in range(10000):
        if (i + 1) % 2000 == 0:
            print(f"Completed {i + 1} trials...")

        keys = list(range(1, 1001))
        random.shuffle(keys)

        root = None
        for key in keys:
            root = insert(root, key)

        target = random.choice(keys)
        cost = search_cost(root, target)
        costs.append(cost)

    max_cost = max(costs)
    bins = range(1, max_cost + 2)

    plt.figure(figsize=(10, 6))
    plt.hist(costs, bins=bins, color='#4A90E2', edgecolor='black', rwidth=0.8)
    plt.xticks(range(1, max_cost + 1))
    plt.title("Search Costs Across 10,000 Randomly Generated 1000-Node BSTs")
    plt.xlabel("Search Cost (Number of Nodes Visited)")
    plt.ylabel("Frequency")
    plt.grid(axis='y', alpha=0.3)
    plt.tight_layout()
    plt.show()

if __name__ == "__main__":
    run_experiment()
```

{% endcode %}

<figure><img src="/files/nU1tuQ16XkrrY9LHRWvz" alt=""><figcaption><p>The distribution looks slightly skewed to the right, also visible in Figure 1.4; while most nodes are around the average level, few branches in an unbalanced BST can get unusually long.</p></figcaption></figure>

## Exercise 6.36

By iterating the recurrence for $$p\_N(u)$$, we can get the following closed-form solution:

$$
p\_N(u) = \prod\_{k=1}^N \frac{k - 1 + 2u}{k + 1} \quad \text{ with $p\_0(u)=1$}.
$$

Let’s take the natural logarithm of both sides to make differentiation much easier

$$
\ln p\_N(u) = \sum\_{k=1}^N \ln(k - 1 + 2u) - \sum\_{k=1}^N \ln(k + 1).
$$

The first derivative is

$$
\frac{d}{du} \ln p\_N(u) = \frac{p\_N'(u)}{p\_N(u)} = \sum\_{k=1}^N \frac{2}{k - 1 + 2u}.
$$

The average is $$p\_N'(1) = 2(H\_{N+1} - 1)$$. The second derivate is

$$
\frac{d^2}{du^2} \ln p\_N(u) = \frac{p\_N''(u)p\_N(u) - p\_N'(u)^2}{p\_N(u)^2}= \frac{d}{du} \sum\_{k=1}^N \frac{2}{k - 1 + 2u} = \sum\_{k=1}^N \frac{-4}{(k - 1 + 2u)^2}.
$$

Evaluating it at $$u=1$$ gives

$$
p\_N''(1) - p\_N'(1)^2= \sum\_{k=1}^N \frac{-4}{(k + 1)^2}= -4(H\_{N+1}^{(2)} - 1).
$$

Recall that $$p\_N(1)=1$$, since $$p\_N(u)$$ represents a probability distribution. Substituting all components back into the formula for variance, we get the equation from the book.

## Exercise 6.37

We reuse many ideas from [Exercise 6.33](#exercise-6.33). Searching for a randomly chosen key in a tree of size $$N$$ encompasses the following cases:

1. We hit the root with probability $$\frac{1}{N}$$.
2. Because the left subtree of size $$k$$ is uniformly distributed, and the probability that the target key happens to be in that specific left subtree is $$\frac{k}{N}$$, the search costs $$1$$ (for the root) plus the successful search cost within a tree of size $$k$$.
3. A successful search in a singleton tree entails $$s\_1(u)=u$$.

This gives

$$
\begin{align\*}
s\_N(u) &= \frac{1}{N} u + \sum\_{k=0}^{N-1} \frac{1}{N} \cdot \frac{k}{N} \cdot u \cdot s\_k(u) + \sum\_{k=0}^{N-1} \frac{1}{N} \cdot \frac{N-1-k}{N} \cdot u \cdot s\_{N-1-k}(u) \\
&= \frac{u}{N} + \frac{2u}{N^2} \sum\_{k=0}^{N-1} k \cdot s\_k(u) && \text{(due to symmetry)} \\
&= \frac{u}{N^2} + \frac{(N-1)(N-1+2u)}{N^2} s\_{N-1}(u) && \text{(after eliminating summation)}.
\end{align\*}
$$

Differentiating both sides using the product rule gives

$$
N^2 s\_N'(u) = 1 + (N-1) \left\[ 2 \cdot s\_{N-1}(u) + (N-1+2u) \cdot s\_{N-1}'(u) \right].
$$

Evaluated at $$u=1$$, we get the average (see [Exercise 2.14](https://evarga.gitbook.io/sh-aofa/algorithms-and-combinatorial-structures/pages/Sy1KF5Jq9j0bEpfHwXvN#exercise-2.14))

$$
\begin{align\*}
s\_N'(1) &= \frac{2N - 1}{N^2} + \frac{N^2 - 1}{N^2} s\_{N-1}'(1) \\
&= \boxed{2H\_N – 3 + 2H\_N/N}.
\end{align\*}
$$

The second derivative is

$$
\begin{align\*}
N^2 s\_N''(u) &= (N-1) \left\[ 2s\_{N-1}'(u) + 2s\_{N-1}'(u) + (N-1+2u) s\_{N-1}''(u) \right] \\
&= (N-1) \left\[ 4s\_{N-1}'(u) + (N-1+2u) s\_{N-1}''(u) \right].
\end{align\*}
$$

Evaluated at $$u=1$$, we get (see [Exercise 2.14](https://evarga.gitbook.io/sh-aofa/algorithms-and-combinatorial-structures/pages/Sy1KF5Jq9j0bEpfHwXvN#exercise-2.14) and [Exercise 3.8](https://evarga.gitbook.io/sh-aofa/algorithms-and-combinatorial-structures/pages/haY46WTvhNwGF6h7VSOn#exercise-3.8))

$$
\begin{align\*}
N^2 s\_N''(1) &= 4(N-1) s\_{N-1}'(1) + (N^2 - 1) s\_{N-1}''(1) \\
&\therefore s\_N''(1) = \left(1 + \frac{1}{N}\right) \left(4H\_N^2 - 4H\_N^{(2)} - 12H\_N + 16\right) + \frac{8H\_N - 16}{N}.
\end{align\*}
$$

After substituting all components into the formula for the variance, we get

$$
\begin{align\*}
\sigma^2 &= 2H\_N - 4H\_N^{(2)} + 4 + \frac{10H\_N - 4H\_N^2 - 4H\_N^{(2)}}{N} - \frac{4H\_N^2}{N^2} \\
&= \boxed{\frac{N+1}{N} \left(2H\_{N+1} - 4H\_{N+1}^{(2)} + 2\right) - \frac{N+1}{N^2} \left(2H\_{N+1} - 2\right)^2 + 2}.
\end{align\*}
$$

## Exercise 6.38 🌟

{% hint style="success" %}
This exercise builds on the Kraft equality for binary trees to establish the connection between PGFs for successful and unsuccessful searches in a BST. Our direct method uses the structural relationship between internal nodes (successful searches) and external nodes (unsuccessful searches) that holds true for any binary tree.
{% endhint %}

Let $$s\_N(u)$$ be the PGF for successful search (see the previous exercise) and $$p\_N(u)$$ for unsuccessful search (see the book). We define these PGFs as follows:

* A successful search costs $$l(v)+1$$ (where $$l(v)$$ is the internal node's level), so $$s\_N(u) = \frac{1}{N} \sum\_{\text{v $\in$ Internal}} u^{l(v)+1}$$.
* An unsuccessful search costs $$l(v)$$ (where $$l(v)$$ is the external node's level), so $$p\_N(u) = \frac{1}{N+1} \sum\_{\text{v $\in$ External}} u^{l(v)}$$.

#### Establishing the Relationship between PGFs

Let $$P\_N(u)=(N+1)p\_N(u)$$ and $$S\_N(u)=Ns\_N(u)/u$$. This is the same idea as in the proof of Theorem 6.3, moving from a probabilistic to an enumerative regime. The division by $$u$$ in $$S\_N(u)$$ aligns exponents in the two GFs.

We aim to find a way to express these two quantities, by inspecting how any binary tree is built from scratch. Following the $$N$$th insertion, an external node at level $$m$$ is substituted with a new node, resulting in the emergence of two additional external nodes. Therefore,

$$
\Delta S\_N(u)=u^m \text{ and } \Delta P\_N(u)=2u^{m+1}-u^m=(2u-1)\Delta S\_N(u).
$$

We can conclude, and easily prove by induction, that the following identity holds

$$
P\_N(u)=1+(2u-1)S\_N(u) \\\[0.3cm]
\therefore \underbrace{u(N+1)p\_N(u)}*{uP\_N(u)} = u + (2u-1)\underbrace{Ns\_N(u)}*{uS\_N(u)}.
$$

{% hint style="info" %}
If we plug $$u=1/2$$ into $$\sum\_{\text{v $\in$ External}} u^{l(v)}=1+(2u-1)\sum\_{\text{v $\in$ Internal}} u^{l(v)}$$, we get 1, which is the Kraft equality that describes the profile of a binary tree (see [Exercise 6.18](#exercise-6.18)).
{% endhint %}

#### Computing Moments

Now, we follow the standard approach for calculating moments. The first derivative is

$$
(N+1)(p\_N(u) + up\_N'(u)) = 1 + N(2s\_N(u) + (2u-1)s\_N'(u)).
$$

Evaluating at $$u=1$$ gives the average cost of a successful search

$$
(N+1)(1 + p\_N'(1)) = 1 + N(2 + s\_N'(1)) \\\[0.3cm]
\therefore s\_N'(1) =  \frac{N+1}{N} p\_N'(1) - 1=2H\_N-3+2H\_N/N.
$$

Recall that $$p\_N(1)=s\_N(1)=1$$ and we already know $$p'\_N(1)$$ from Theorem 6.6. The second derivative is

$$
(N+1)(2p\_N'(u) + up\_N''(u)) = N(4s\_N'(u) + (2u-1)s\_N''(u)).
$$

Evaluating at $$u=1$$ gives

$$
(N+1)(2p\_N'(1) + p\_N''(1)) = N(4s\_N'(1) + s\_N''(1)) \\\[0.3cm]
\therefore s\_N''(1) = \frac{N+1}{N}(2p\_N'(1) + p\_N''(1)) - 4s\_N'(1).
$$

The variance of a successful search is $$\sigma^2\_S= s\_N''(1) + s\_N'(1) - s\_N'(1)^2$$. Thus,

$$
\begin{align\*}
\sigma^2\_S &= \frac{N+1}{N}(2p\_N'(1) + p\_N''(1)) - 3s\_N'(1) - s\_N'(1)^2. \\\[0.3cm]
&= \frac{N+1}{N}  \sigma^2\_U - \frac{N+1}{N^2} p\_N'(1)^2 + 2 \\\[0.3cm]
&= \frac{N+1}{N} \left(2H\_{N+1} - 4H\_{N+1}^{(2)} + 2\right) - \frac{N+1}{N^2} \left(2H\_{N+1} - 2\right)^2 + 2.
\end{align\*}
$$

## Exercise 6.39 🌟

{% hint style="success" %}
This exercise demonstrates Theorem 6.7 in "inverted" regime, when we’re seeking a coefficient associated with a local property (like, the average degree of the root of a general tree).
{% endhint %}

{% hint style="warning" %}
At the time of writing, the book's [website](https://aofa.cs.princeton.edu/30gf/) indicates that solving this exercise assumes familiarity with the Lagrange inversion theorem; this note is also relevant for several subsequent exercises. It appears that this topic was originally included in Chapter 3 before being reassigned to Section 6.12. However, the exercises themselves were not relocated accordingly.
{% endhint %}

Theorem 6.7 is built to express a cumulative cost function $$c(t)$$ for an additive parameter in terms of a local toll function $$e(t)$$:

* $$E\_G(z)$$ is the CGF for the local toll $$e(t)$$. In our case, it’s the degree of the root of a tree.
* $$C\_G(z)$$ is the CGF for the global cost $$c(t)$$. The global cost is the sum of the local tolls for every single node in the entire tree.

To find the average of just that local toll across all trees of size $$N$$, we must extract the coefficient from $$E\_G(z)$$ and divide it by the total number of trees:

$$
\text{Average Root Degree}=\frac{\[z^N]E\_G(z)}{G\_N}.
$$

This is different from what the Corollary says in the book. Our local toll function $$e(t)$$ is simply the degree of the root of $$t$$. The recurrence relation of $$c(t)$$ entails that we’re summing the root degree of the main tree, plus the root degrees of its subtrees, and so on. This turns out to be the total number of edges in a tree, hence $$c(t) = |t| - 1$$. Now, we can work backwards, by first finding the $$C\_G(z)$$. We’ve

$$
\begin{align\*}
C\_G(z) &= \sum\_{t \in \mathcal{G}} c(t) z^{|t|} \\
&= \sum\_{t \in \mathcal{G}} (|t|-1) z^{|t|} \\
&= zG'(z) - G(z) \\
&= \frac{z}{\sqrt{1-4z}} - \frac{1 - \sqrt{1-4z}}{2}.
\end{align\*}
$$

By using the identity from Theorem 6.7, we get

$$
\begin{align\*}
E\_G(z) &= \frac{2 C\_G(z)}{1 + \frac{1}{\sqrt{1-4z}}} \\
&= \frac{2 C\_G(z) \sqrt{1-4z}}{\sqrt{1-4z} + 1} \\
&= \frac{1 - 2z - \sqrt{1-4z}}{\sqrt{1-4z} + 1}.
\end{align\*}
$$

We can use the substitutions $$z = G(z)(1-G(z))$$ and $$\sqrt{1-4z} = 1 - 2G(z)$$ to define everything in terms of $$G(z)$$ (see also [Exercise 6.41](#exercise-6.41))

$$
E\_G(z) =\frac{G(z)^3}{z}.
$$

The total sum of root degrees for all trees of size $$N$$ is

$$
\[z^N] E\_G(z) = \[z^{N+1}] G(z)^3.
$$

Using the Lagrange inversion theorem for Catalan trees $$\[z^N] G(z)^k = \frac{k}{2N-k} \binom{2N-k}{N}$$, we’ve

$$
\[z^N] E\_G(z) = \frac{3}{2N-1} \binom{2N-1}{N+1}.
$$

After dividing by $$G\_N=T\_{N-1}$$, we get that the average number of children of the root in a random Catalan tree of $$N$$ nodes is $$3(N-1)/(N+1)$$. This perfectly matches the examples in Figure 6.3 from the book.

## Exercise 6.40

The toll CGF follows from [Exercise 6.4](#exercise-6.4)

$$
\[z^N]E\_G(z)=G\_{N-1} \implies E\_G(z)=zG(z).
$$

{% hint style="info" %}
The above derivation demonstrates the equivalence between the enumerative and structural definitions of generating functions. Namely, we can define the toll CGF sequence-wise or combinatorially, hence $$E\_G(z)=\sum\_N e\_Nz^N=\sum\_{t \in \mathcal{G}} e(t)z^{|t|}$$. Sometimes, one form is more convenient than the other. In this case, we already know what is $$e\_N$$.
{% endhint %}

Applying Theorem 6.7 gives

$$
\begin{align\*}
C\_G(z) &= \frac{1}{2} E\_G(z) \left( 1 + \frac{1}{\sqrt{1-4z}} \right) \\
&= \frac{1}{2} z G(z) \left( \frac{\sqrt{1-4z} + 1}{\sqrt{1-4z}} \right) \\
&= \frac{1}{2} z \left( \frac{1 - \sqrt{1-4z}}{2} \right) \left( \frac{1 + \sqrt{1-4z}}{\sqrt{1-4z}} \right) \\
&= \frac{z^2}{\sqrt{1-4z}}.
\end{align\*}
$$

Based on the corollary of Theorem 6.7, the average number of 1-child nodes per tree is

$$
\frac{\[z^N]C\_G(z)}{G\_N}=\frac{N(N-1)}{4N-6}.
$$

From Figure 6.3 in the book, the answer is $$10/7$$ for $$N=5$$. This perfectly matches our formula. To get the proportion of nodes, we need to divide this amount by $$N$$. Therefore, the answer is $$(N-1)/(4N-6)$$.

## Exercise 6.41

The toll CGF follows from [Exercise 6.5](#exercise-6.5), hence $$E\_G(z)=zG(z)^k.$$ Applying Theorem 6.7 gives

$$
\begin{align\*}
C\_G(z) &= \frac{1}{2} E\_G(z) \left( 1 + \frac{1}{\sqrt{1-4z}} \right) \\
&= \frac{1}{2} z G(z)^k \left( \frac{\sqrt{1-4z} + 1}{\sqrt{1-4z}} \right) \\
&= \frac{1}{2} z \left( \frac{1 - \sqrt{1-4z}}{2} \right)^k \left( \frac{1 + \sqrt{1-4z}}{\sqrt{1-4z}} \right).
\end{align\*}
$$

Extracting coefficients from this GF is quite a mess. Instead, we’ll apply the extended Lagrange inversion theorem (see also the remark in [Exercise 6.39](#exercise-6.39)). For Catalan trees, the standard functional equation is

$$
G(z) = z f(G(z)), \text{ where }f(u) = \frac{1}{1-u}.
$$

Consequently, $$z = G(z)(1 - G(z))$$ and Theorem 6.7 simplifies to

$$
C\_G(z) = \frac{E\_G(z)}{1 - zf'(G(z))}=\frac{G(z)^{k+1}(1-G(z))}{1 - zf'(G(z))}.
$$

Let’s express the numerator in terms of $$g(u)$$, so the coefficient extraction is

$$
\[z^N] \frac{g(G(z))}{1 - zf'(G(z))} = \[u^N] g(u) f(u)^N.
$$

{% hint style="info" %}
Notice that $$f'(G(z))=1/(1-G(z))^2$$, so after substituting the expression for $$z$$, we get the simplified denominator in terms of $$f(u)$$.
{% endhint %}

This gives

$$
\begin{align\*}
\[z^N] C\_G(z) &= \[u^N] \underbrace{u^{k+1}(1-u)}\_{g(u)} (1-u)^{-N} \\
&= \[u^N] u^{k+1} (1-u)^{-(N-1)} \\
&= \[u^{N-k-1}] (1-u)^{-(N-1)} = \binom{2N - k - 3}{N - k - 1}.
\end{align\*}
$$

The exact proportion of nodes with $$k$$ children is (see also the previous exercise)

$$
\frac{\dbinom{2N - k - 3}{N - k - 1}}{NG\_N}= \frac{\dbinom{2N - k - 3}{N - k - 1}}{\dbinom{2N-2}{N-1}}.
$$

## Exercise 6.42

We already know from the book, that the average number of leaves in a random binary Catalan tree of $$N$$ nodes is $$N(N+1)/(2(2N-1))$$. Hence, the fraction of leaves is $$(N+1)/(2(2N-1))$$.

To compute the fraction of internal nodes with a single child, we apply Theorem 6.7 with the toll CGF $$E\_T(z)=2z(T(z)-1)$$ (see also [Exercise 6.40](#exercise-6.40)). In this instance, it’s necessary to omit the empty subtree, while permitting either the left or the right branch to contain nodes. This gives

$$
\begin{align\*}
C\_T(z) &= \frac{E\_T(z)}{\sqrt{1-4z}} \\
&= \frac{2z(T(z)-1)}{\sqrt{1-4z}} \\
&= \frac{1 - 2z}{\sqrt{1-4z}} - 1.
\end{align\*}
$$

The fraction of nodes of this type is

$$
\frac{\[z^N]C\_T(z)}{NT\_N}=\frac{\binom{2N}{N} \frac{N-1}{2N-1}}{\frac{N}{N+1}\binom{2N}{N}} = \frac{(N-1)(N+1)}{N(2N-1)}.
$$

To compute the fraction of internal nodes with two children, we can apply Theorem 6.7 with the toll CGF $$E\_T(z)=z(T(z)-1)^2$$. But there is a simpler way. Namely, the sum of fractions must be 1, since we cover all cases. Therefore, the fraction of nodes with two children is

$$
1 - \frac{N+1}{2(2N-1)} - \frac{(N-1)(N+1)}{N(2N-1)}= \frac{(N-1)(N-2)}{2N(2N-1)}.
$$

## Exercise 6.43

We already know from the book, that the average number of leaves in a random BST of $$N$$ nodes is $$(N+1)/3$$. Hence, the fraction of leaves is $$(N+1)/(3N)$$.

To compute the fraction of internal nodes with a single child, we apply Theorem 6.7 with $$e\_N=2/N$$ for $$N \ge 2$$. This follows from the fact that the root of a subtree of size $$N$$ must be the smallest or the largest element in order to have a single child.&#x20;

{% hint style="info" %}
The coefficient $$e\_N$$ is an expected value of $$e(t)$$. An indicator random variable is implicitly introduced, taking the value 1 when a particular root possesses only one child. The expected value of this variable corresponds to its probability.
{% endhint %}

Thus,

$$
\begin{align\*}
E\_B(z) &= \sum\_{N \ge2} \frac{2}{N} z^N = 2 \ln\left(\frac{1}{1-z}\right) - 2z \\
E'\_B(z) &= \frac{2z}{1-z}.
\end{align\*}
$$

This gives the same fraction as for leaves

$$
\begin{align\*}
C\_B(z) &= \frac{1}{(1-z)^2} \int\_0^z (1-x)^2 \left(\frac{2x}{1-x}\right) dx \\
&= \frac{z^2}{(1-z)^2} - \frac{2z^3}{3(1-z)^2} \\
&\therefore \[z^N]C\_B(z) = \frac{N+1}{3}.
\end{align\*}
$$

The fraction of nodes with two children is

$$
1-2 \cdot \frac{N+1}{3N}=\frac{N-2}{3N}.
$$

## Exercise 6.44

For all subtasks, the mechanics of calculating the variance is based on Table 3.5. We omit the technical details and only present the derivation of BGFs with final results for variances.

***

The BGF for the number of leaves in a random binary Catalan tree is given in Section 5.4 of the book

$$
T(z,u)+z=1+zu+zT(z,u)^2.
$$

Therefore,

$$
\boxed{\sigma\_N^2 = \frac{N(N+1)(N-1)(N-2)}{2(2N-1)^2(2N-3)}}.
$$

***

A general tree is either a leaf or a root attached to a non-empty forest. This augmented definition isolates the leaf, similarly to how the book resolved the same problem in Section 5.4. Therefore,

$$
G=Z+Z \times (\mathcal{F}-E).
$$

We should mark the leaf, so this translates to

$$
G(z,u) = zu + \frac{zG(z,u)}{1 - G(z,u)}= zu + z(1-u)G(z,u) + G(z,u)^2.
$$

The variance is

$$
\boxed{\sigma\_N^2 =\frac{N(N-2)}{4(2N-3)}}.
$$

***

Setting up the BGF for a random BST proceeds along the lines of the computation of moments presented in Section 3.10 of the book (see the Quicksort distribution subsection as well as [Exercise 3.67](/sh-aofa/discrete-mathematical-methods/chapter-3-generating-functions.md#derivation-of-the-functional-equation)). Let $$Q\_N(u)$$ be the PGF for the number of leaves in a random BST of size $$N$$. The base cases are:

* An empty tree has no leaves, so $$Q\_0(u)=1$$.
* A singleton tree has one leaf, so $$Q\_1(u)=u$$.

Subtree sizes are uniformly distributed with probability $$1/N$$, as emphasized in the book, hence

$$
Q\_N(u) = \frac{1}{N} \sum\_{j=0}^{N-1} Q\_j(u) Q\_{N-1-j}(u) \quad \text{ for $N>1$}.
$$

As in the proof of Theorem 6.3, to simplify this recurrence, we move to an "enumerative" approach

$$
N Q\_N(u) = \sum\_{j=0}^{N-1} Q\_j(u) Q\_{N-1-j}(u).
$$

Let our exponential BGF be $$Q(z,u) = \sum\_{N\ge0} Q\_N(u) z^N.$$ If we multiply our recurrence by $$z^{N-1}$$ and sum on $$N \ge 1$$, the LHS transforms into

$$
\sum\_{N\ge1} N Q\_N(u) z^{N-1} = Q\_z(z,u).
$$

The RHS becomes a self-convolution (after making an adjustment for $$N=1$$ to balance it with the LHS)

$$
\sum\_{N\ge1} \left(\sum\_{j=0}^{N-1} Q\_j(u) Q\_{N-1-j}(u)\right)z^{N-1}= Q(z,u)^2 + u - 1.
$$

We arrive at the following ODE

$$
Q\_z(z,u)=Q(z,u)^2 + u - 1,
$$

with initial condition $$Q(0,u) = 1.$$ From here, we virtually follow the same steps as in [Exercise 3.67](/sh-aofa/discrete-mathematical-methods/chapter-3-generating-functions.md#solution-of-the-exercise). The variance is

$$
\boxed{\sigma\_N^2 =\frac{2(N+1)}{45} \quad \text{ for $N \ge 4$}}.
$$

## Exercise 6.45 🌟

{% hint style="success" %}
This exercise proves relationships analogous to those in Theorem 6.7 for BGFs.
{% endhint %}

We assume that $$e(t)=f(|t|)=e\_N$$, which is usually the case with almost all toll functions. Let's define a *toll* BGF

$$
E(z,u) = \sum\_{N\ge0} u^{e\_N}z^N.
$$

The BGF version of Theorem 6.7 uses the [Hadamard product](https://en.wikipedia.org/wiki/Generating_function_transformation#Hadamard_products_and_diagonal_generating_functions) of GFs for "injecting" the factor $$u^{e\_N}$$ into the cost BGF. This is traditionally hard to accomplish using only basic tools.

$$
\begin{align\*}
C\_T(z,u) &= 1 + E(z,u) \odot\_z \left(z C\_T(z,u)^2 \right) && \text{(binary Catalan trees)} \\
C\_G(z,u) &= E(z,u) \odot\_z \left( \frac{z}{1 - C\_G(z,u)} \right) && \text{(general Catalan trees)} \\
z\frac{\partial}{\partial z} C\_B(z,u) &= E(z,u) \odot\_z \left(  zC\_B(z,u)^2 \right) && \text{(binary search trees)}.
\end{align\*}
$$

*Proof*:

Let $$\mathcal{T}$$ be the set of all binary Catalan trees. We have

$$
\begin{align\*}
C\_T(z,u) &=  \sum\_{t \in \mathcal{T}} z^{|t|}u^{c(t)} \\
&= z^0u^0+ E(z,u) \odot\_z \left(\sum\_{t\_l, t\_r \in \mathcal{T}} z^{|t\_l|+|t\_r|+1} u^{c(t\_l) + c(t\_r)}\right) \\
&= 1 +E(z,u) \odot\_z \left(z\sum\_{t\_l \in \mathcal{T}} z^{|t\_l|} u^{c(t\_l)} \sum\_{t\_r \in \mathcal{T}} z^{|t\_r|} u^{c(t\_r)} \right)\\
&= 1 +E(z,u) \odot\_z \left(zC\_T(z,u)^2\right).
\end{align\*}
$$

Let $$\mathcal{G}$$ be the set of all general Catalan trees. We have

$$
\begin{align\*}
C\_G(z,u) &= \sum\_{t \in \mathcal{G}} z^{|t|}u^{c(t)} \\
&=  E(z,u) \odot\_zu \left(\sum\_{k \ge 0}\sum\_{t\_1,t\_2,\dots,t\_k \in \mathcal{G}} z^{|t\_1|+|t\_2|+ \dots +|t\_k|+1} u^{c(t\_1)+c(t\_2)+ \dots +c(t\_k)} \right)\\
&= E(z,u) \odot\_z \left(\frac{z}{1 - C\_G(z,u)}\right).
\end{align\*}
$$

For BSTs we start with the PGF

$$
Q\_N(u) = \frac{u^{e\_N}}{N} \sum\_{j=0}^{N-1} Q\_j(u) Q\_{N-1-j}(u) \quad \text{ for $N\ge1$},
$$

and continue as in the previous exercise. We don’t need anymore that correction factor for $$N=1$$, since the toll function $$e\_N$$ handles all cases. Finally, instead of multiplying by $$z^{N-1}$$ we must multiply by $$z^N$$ to be able to smoothly apply the toll BGF. This produces that extra $$z$$ factor on both sides.

## Exercise 6.46

We start by deriving the OGF for Fibonacci polynomials

$$
\begin{align\*}
A(u) &= \sum\_{h \ge 0} F\_{h+1}(z) u^h \\
&= F\_1(z) + F\_2(z)u + \sum\_{h \ge 2} (F\_h(z) - z F\_{h-1}(z)) u^h \\
&= 1 + u + u \sum\_{h \ge 2} F\_h(z) u^{h-1} - zu^2 \sum\_{h \ge 2} F\_{h-1}(z) u^{h-2} \\
&= 1 + u + u(A(u) - 1) - zu^2 A(u) \\
&= \boxed{\frac{1}{1 - u + zu^2}}.
\end{align\*}
$$

Now, we expand this back into a series, so that we can extract the coefficient $$F\_{h+1}(z)$$.

$$
\begin{align\*}
A(u) &= \frac{1}{1 - (1 - zu)u} \\
&= \sum\_{k \ge 0} u^k (1 - zu)^k \\
&= \sum\_{k \ge 0} u^k \sum\_{j=0}^k \binom{k}{j} (-zu)^j \\
&= \sum\_{k \ge 0} \sum\_{j=0}^k \binom{k}{j} (-z)^j u^{k+j} \\
&\therefore \boxed{\[u^h] A(u)=F\_{h+1}(z) = \sum\_{j} \binom{h-j}{j} (-z)^j}.
\end{align\*}
$$

## Exercise 6.47

First, we need to setup the equation for applying the [Lagrange inversion theorem](https://aofa.cs.princeton.edu/30gf/) (see the third case as well as the remark in [Exercise 6.39](#exercise-6.39))

$$
\begin{align\*}
G(z) - G^{\[h-2]}(z) &= \sqrt{1-4z} \frac{u^h}{1 - u^h} \\\[0.3cm]
&= \sqrt{1-4z} \sum\_{k \ge 1} u^{kh} \\
&= \sum\_{k \ge 1} \underbrace{\frac{1-u}{1+u} u^{kh}}\_{g(u)}.
\end{align\*}
$$

The last step follows from identities stated in the book (relating $$u$$, $$z$$ and $$G(z)$$), so that we can conclude

$$
z = \frac{u}{(1+u)^2} \quad \text{and} \quad \sqrt{1-4z} = \frac{1-u}{1+u}.
$$

We want to extract the coefficient $$\[z^{N+1}]$$ from the above summand. The Lagrange inversion theorem says that if $$z = f(u)$$, then

$$
\[z^{N+1}] g(A(z)) = \frac{1}{N+1} \[u^N] \left( g'(u) \left(\frac{u}{f(u)}\right)^{N+1} \right).
$$

The first derivate of $$g$$ is

$$
g'(u) = \frac{-2}{(1+u)^2} u^{kh} + \frac{1-u}{1+u} kh u^{kh-1}.
$$

Substituting it back into the theorem's formula, we get

$$
\begin{align\*}
\[z^{N+1}] g(A(z)) &= \frac{1}{N+1} \[u^N] \left( -2u^{kh} (1+u)^{2N} + kh(1-u)u^{kh-1} (1+u)^{2N+1} \right) \\
&= \frac{-2}{N+1} \[u^{N-kh}] (1+u)^{2N} + \frac{kh}{N+1} \[u^{N+1-kh}] (1-u)(1+u)^{2N+1} \\
&= \frac{-2}{N+1} \binom{2N}{N-kh} + \frac{kh}{N+1} \left( \binom{2N+1}{N+1-kh} - \binom{2N+1}{N-kh} \right) \\
&= \frac{-2}{N+1} \binom{2N}{N-kh} + \frac{kh}{N+1} \left( \binom{2N}{N+1-kh} - \binom{2N}{N-1-kh} \right) \\
&= \binom{2N}{N+1-kh} - 2\binom{2N}{N-kh} + \binom{2N}{N-1-kh}.
\end{align\*}
$$

The last step (the formula from the book) is a result of intense algebraic gymnastics; the [absorption identity](https://math.stackexchange.com/a/4335171/753413) is helpful for grouping terms. Using a computer algebra system it’s easy to show that the last two lines are equal. All in all, putting it back into the sum over $$k$$ produces the equation of Theorem 6.9.

{% hint style="info" %}
The background about that last line, as published in the book, is available in Flajolet’s presentation [Analyses of Tree Height](https://algo.inria.fr/flajolet/Publications/Slides/IHP-OCT10-slides.pdf) (slides 6-7). The whole expression is nicely captured in terms of the [second-order central difference.](https://en.wikipedia.org/wiki/Finite_difference#Higher-order_differences) This fact is quintessential in the proof of the corollary (see the next exercise).
{% endhint %}

## Exercise 6.48

We need to rigorously justify the transition from the discrete binomial sum to the continuous $$H(x)$$ integral, specifically by managing the error bounds using Theorem 4.9 and Euler-Maclaurin summation. Of course, the proof sketch provided in the book still remains the main scaffolding, so we fill the gaps here.

To keep things aligned with Theorem 4.9, let’s define the central region as $$1 \le h < \sqrt{2N^{1+\epsilon}}$$ for a small constant $$\epsilon>0$$. Notice that it strictly covers the book's range $$1 \le h \le \sqrt{N}\log N$$ in the proof of the corollary of Theorem 6.9.

As pointed out in the previous exercise, the sum denoting $$G\_N-G\_N^{\[h-1]}$$ is fundamentally adding up discrete curvatures of binomials at various distances (using step size $$\delta=kh$$). Inside the central region, according to Theorem 4.6, we can efficiently employ the normal approximation of the binomial distribution. Thus, If a binomial coefficient acts like a Gaussian curve, then its discrete second difference (our term in the sum) must act like the continuous second derivative of that Gaussian curve. So,

$$
\begin{align\*}
\frac{d}{d\delta} e^{-\delta^2/N} &= \left(\frac{-2\delta}{N}\right) e^{-\delta^2/N} \\
\frac{d^2}{d\delta^2} e^{-\delta^2/N} &= \left( \frac{4\delta^2}{N^2} - \frac{2}{N} \right) e^{-\delta^2/N} \\
&=\frac{1}{N} \left( \frac{4k^2h^2}{N} - 2 \right) e^{-k^2h^2/N}.
\end{align\*}
$$

{% hint style="info" %}
The $$1/N$$ factor gets "annihilated" due to division by $$G\_N$$ when moving into the realm of Catalan sums. Notice that $$H(h/\sqrt N)$$ encompasses the expression we just derived.
{% endhint %}

Because we are swapping discrete binomials with the smooth function $$H(x)$$, we introduce a relative error. Specifically, Theorem 4.9 tells us the approximation is valid up to a factor of

$$
\left(1 + O\left(\frac{1}{N^{1-2\epsilon}}\right)\right).
$$

Let's bound the total error introduced across this entire central region

$$
\Delta\_{\text{central}} = \sum\_{1 \le h \le  \sqrt{2N^{1+\epsilon}}} H\left(\frac{h}{\sqrt{N}}\right) \cdot O\left(N^{-1+2\epsilon}\right)= O(\sqrt{N}) \cdot O\left(N^{-1+2\epsilon}\right) = O\left(N^{-1/2+2\epsilon}\right).
$$

We know from the book, that $$H(x)$$ is bounded and decays (see also the next exercise), hence we can approximate the sum $$\sum H(h/\sqrt{N})$$ by its integral, which scales as $$O(\sqrt{N})$$ (proven below). Picking $$\epsilon<1/4$$ the error turns out to be $$o(1)$$.

Outside of the central region, according to Theorem 4.9, the tail is bounded as $$O(e^{(-2N)^\epsilon})$$. This is exponentially small, meaning the tail sum is entirely negligible. We’ve now rigorously reduced the expected height to

$$
E\[H\_N] = \sum\_{h=1}^\infty H\left(\frac{h}{\sqrt{N}}\right) + o(1)
$$

To transition to the integral, we apply the Euler-Maclaurin formula for a function $$f(h) = H(h/\sqrt{N})$$. We can apply the second form of Theorem 4.3 over the interval $$\[1,M]$$ and let $$M \to \infty$$. The usage of $$M$$ avoids confusion, since $$N$$ represents the number of nodes.

$$
\sum\_{h=1}^M f(h) = \int\_1^M f(h) dh + \frac{1}{2}f(M) + C\_f + R\_0.
$$

Because $$H(x)$$ and all its derivatives decay to zero as $$x \to \infty$$, we can safely take the limit as $$M \to \infty$$. This gives

$$
\sum\_{h=1}^\infty f(h) = \int\_1^\infty f(h) dh + C\_f.
$$

To evaluate the integral, we temporarily reuse the result from the book, so

$$
\begin{align\*}
\int\_1^\infty f(h) dh &= \sqrt{N} \int\_{1/\sqrt{N}}^\infty H(x) dx \\
&= \sqrt{N} \left( \int\_0^\infty H(x) dx - \int\_0^{1/\sqrt{N}} H(x) dx \right) \\
&= \sqrt{\pi N}+ O(1) && \text{($|H(x)|<=1$ for all $x$)}.
\end{align\*}
$$

We must explicitly evaluate the integral $$\int\_0^\infty H(x) dx$$. The trap is that we cannot simply swap the sum and the integral because the sum is not absolutely integrable over $$(0, \infty)$$. A term-by-term integration produces zero for every $$k$$. We should use the [Theta function](https://en.wikipedia.org/wiki/Theta_function) $$\theta\_1(x) = \sum\_{k=1}^\infty e^{-k^2x^2}$$. We have

$$
\begin{align\*}
\frac{d}{dx} \left\[ x \sum\_{k=1}^\infty e^{-k^2x^2} \right] &= \sum\_{k=1}^\infty e^{-k^2x^2} - \sum\_{k=1}^\infty 2k^2x^2 e^{-k^2x^2} \\
-2\frac{d}{dx} \left\[ x \sum\_{k=1}^\infty e^{-k^2x^2} \right] &= -2\sum\_{k=1}^\infty e^{-k^2x^2} + \sum\_{k=1}^\infty 4k^2x^2 e^{-k^2x^2} = H(x) \\
&\therefore \int\_0^\infty H(x) dx = -2 \Big\[ x \theta\_1(x) \Big]*0^\infty=2x \theta\_1(x) \Big|*{x \to 0} = \sqrt{\pi}.
\end{align\*}
$$

The constant associated with the function is

$$
\begin{align\*}
C\_f &= \frac{1}{2}f(1) + \int\_1^\infty ({x} - 1/2) f'(x) dx \\
&\le O(1)+ \int\_1^\infty |f'(x)| dx \\
&= O(1)+|f(\infty)-f(1)| \\
&= O(1).
\end{align\*}
$$

Consequently, the average height is indeed $$\sqrt{\pi N} + O(1)$$.

#### Another Combinatorial Proof

Instead of directly handling the error terms in the original approach, it’s instructive to read the paper [The Average Height of Catalan Trees by Counting Lattice Paths](https://oeis.org/A136439/a136439.pdf) that provides a combinatorial proof better suited for computer scientists. It builds upon the material from the book and cites results from the authors of the book.

{% hint style="info" %}
As a side effect, it also establishes a bijection between Dyck paths (related to gambler’s ruin paths) and general Catalan trees.
{% endhint %}

## Exercise 6.49

<figure><img src="/files/8dBqRXlLBrPwuNS2k4cs" alt=""><figcaption></figcaption></figure>

## Exercise 6.50

Because of the rotation correspondence, it turns out that the stack height of binary Catalan trees is essentially distributed like the height of forests. The height of a forest is the maximum height of its constituent trees. The book already establishes the relationship between the stack height of binary and general Catalan trees.

Note that the relationship is **not** a point-wise equality. For example, a left‑chain binary tree with $$N$$ nodes has $$s(t)=0$$, but the corresponding forest’s height is $$N-1$$. We can also construct an opposite example, by taking a forest of $$k>2$$ trees, where every tree is just a root with a single child. The height of this forest is 1. Nonetheless, the stack height of the corresponding binary tree is $$k-1$$.&#x20;

At any rate, the two quantities are asymptotically equivalent in distribution, as highlighted above.

{% hint style="info" %}
The stack size, as specified, is related to the [Strahler number](https://en.wikipedia.org/wiki/Strahler_number), which measures branching complexity.
{% endhint %}

## Exercise 6.51

We have $$z=f(u)=u/(1-u)$$, thus the first case applies

$$
\begin{align\*}
\[z^n]A(z) &= \frac{1}{n}\[u^{n−1}]\left(\frac{u}{f(u)}\right)^n \\
&= \frac{1}{n}[u^{n−1}](https://evarga.gitbook.io/sh-aofa/algorithms-and-combinatorial-structures/1-u)^n \\
&= \frac{1}{n}(-1)^{n-1}n \\
&= (-1)^{n-1} \qquad \text{(for $n \ge 1$)}.
\end{align\*}
$$

## Exercise 6.52

Let $$f(z)=e^z-1$$, so $$f(f^{-1}(z))=e^{f^{-1}(z)}-1=z$$. This implies that $$f^{-1}(z)=\ln(z+1)$$. Its Taylor expansion is

$$
\ln(1+z) = \sum\_{n\ge1} \frac{(-1)^{n-1}}{n} z^n.
$$

Let’s apply the Lagrange inversion theorem

$$
\[z^n] \ln(1+z) = \frac{1}{n} \[u^{n-1}] \left( \frac{u}{f(u)} \right)^n= \frac{1}{n} \[u^{n-1}] \left( \frac{u}{e^u - 1} \right)^n.
$$

Handling the RHS is really messy (see [Exercise 3.36](https://evarga.gitbook.io/sh-aofa/algorithms-and-combinatorial-structures/pages/haY46WTvhNwGF6h7VSOn#exercise-3.36)), as we need to raise a convoluted power series to $$n$$th power. Nonetheless, we can put the theorem into a "reverse" regime and get a stunning combinatorial identity

$$
\[u^{n-1}] \left( \frac{u}{e^u - 1} \right)^n = (-1)^{n-1}.
$$

This is akin to factoring GFs to derive new identities (see [Exercise 3.5](https://evarga.gitbook.io/sh-aofa/algorithms-and-combinatorial-structures/pages/haY46WTvhNwGF6h7VSOn#exercise-3.5)). Compare also this result with the one from the previous exercise.

## Exercise 6.53

See [Exercise 6.72](#exercise-6.72) for the generalization of this problem.

## Exercise 6.54

See Theorem 6.15 in the book with $$t=4$$.

## Exercise 6.55

This is known as [Cayley’s formula](https://en.wikipedia.org/wiki/Cayley%27s_formula).

## Exercise 6.56

The book [Introduction to Algorithms Fourth Edition](https://mitpress.mit.edu/9780262046305/introduction-to-algorithms/), by CLRS (see Section B.5.1 Free trees) contains a full proof using an extended list (6 instead of 4) of equivalent properties. Furthermore, they apply a clever trick to reduce the combinatorial explosion in the proof. Namely, it’s enough to prove just one cycle between these properties, that one implies another in a cycle, reducing the required work.

## Exercise 6.57

The page about [trees](https://mathworld.wolfram.com/Tree.html) on Wolfram MathWorld lists all different free trees with $$\le 6$$ nodes. The distribution of ordered trees onto these in left-to-right, top-to-bottom order is 5, 10, 10, 10, 2, 5. Figure 6.22 shows that the winner for $$N=5$$ is the tree structure known as the Broom graph $$B\_{5,3}$$. The same type of graph structure $$B\_{6,3}$$ also appears most frequently among all ordered trees on six nodes (it’s the fourth one in the linked post), although not uniquely. Nonetheless, Broom graphs actually perform poorly for large $$N$$ (see the next exercise).

At any rate, even for the cases with 5 and 6 nodes, highly symmetric trees perform badly in this respect. For example, the star graph is always associated with only 2 ordered trees.

## Exercise 6.58

Imagine a central node connected to

* a single leaf,
* a chain of 2 nodes,
* and a chain of 3 nodes.

This free tree has $$N=7$$ nodes in total. It has maximum asymmetry achievable with a tree of this size. If we promote the central node as a root, we immediately produce 3! different ordered trees. If this central node becomes a child of some other root, then it again contributes 2! variants in any context. A Broom graph is no match for this version.&#x20;

The key takeaway is to maximize asymmetry within the structure. An effective approach involves beginning with a central node accompanied by subtrees that are structurally distinct, applying this principle recursively throughout each subtree. It’s essential to balance this objective against the number of available nodes. While positioning a high-degree node as the root can generate numerous ordered trees, it is also important to avoid identical subtrees, as these diminish the overall count.

## Exercise 6.59

Assume *tree* denotes an ordered tree, based on the following quote from the book:

> But we maintain our algorithmic bias, by reserving the word *tree* for ordered trees, which arise in perhaps the most natural way in computer applications.

The description below is a linear time algorithm, as a function of number of edges.&#x20;

{% stepper %}
{% step %}

### Building a Tree Representation

Read an input consisting of a set of edges and produce an [adjacency list](https://www.khanacademy.org/computing/computer-science/algorithms/graph-representation/a/representing-graphs) representation of the tree. Find the root (the node with in-degree 0 assuming edges are directed). Because an adjacency list has no strict left-to-right order, the "ordered" nature of the resulting tree will depend on the order edges are read. The list data structure preserves the order of how items were added to it.

{% hint style="info" %}
If the input edges are undirected (just pairs of connected nodes without a strict parent-child relationship), we cannot rely on in-degrees. In that scenario, we would pick an arbitrary node to be the root (or select it in some other way), build the tree outwards using a Breadth-First or Depth-First Search to establish parent-child directions, and then proceed.
{% endhint %}
{% endstep %}

{% step %}

### Traversing the Tree

Do a preorder traversal of the tree as described in Section 6.3 pertaining to parenthesis systems.
{% endstep %}
{% endstepper %}

## Exercise 6.60

The first step is the same as in the previous exercise. In the next stage, we must explicitly construct the binary tree representation in memory by wiring up the left and right pointers for every node (assume that nodes are created and we’ve the required mapping from identifiers to nodes ready). This can be done as follows (starting from the root):

* For every node $$u$$ in the tree:
  * If $$u$$ has children in its adjacency list, assign the first child as $$u$$'s left child.
  * Iterate through the remaining children in $$u$$'s adjacency list sequentially. Link them together using their right child pointers (forming a linked list of siblings).

## Exercise 6.61

For binary trees there are various streamlined approaches. The LeetCode problem [Flip Equivalent BInary Trees](https://leetcode.com/problems/flip-equivalent-binary-trees/) (the editorial is publicly available) covers this exercise. The algorithm below illustrates the idea behind the generic [AHU-algorithm](https://arxiv.org/pdf/2401.07636). It differs from the third approach of LeetCode’s editorial by preserving the original trees.

{% code expandable="true" %}

```python
# Definition for a binary tree node.
# class TreeNode:
#     def __init__(self, val=0, left=None, right=None):
#         self.val = val
#         self.left = left
#         self.right = right
class Solution:
    def flipEquiv(self, root1: Optional[TreeNode], root2: Optional[TreeNode]) -> bool:
        return self._create_labels(root1) == self._create_labels(root2)

    def _create_labels(self, root: TreeNode) -> str:
        if root is None:
            return "()"

        # Perform postorder traversal to create labels of children.
        llabel = self._create_labels(root.left)
        rlabel = self._create_labels(root.right)

        # If no left child, then just return the label of the right child.
        if root.left is None:
            return f"({root.val}{rlabel})"
        # If no right child, then just return the label of the left child.
        if root.right is None:
            return f"({root.val}{llabel})"
        
        # Swap labels of children based on sorted order of node values.
        if root.left.val > root.right.val:
            llabel, rlabel = rlabel, llabel
        return f"({root.val}{llabel}{rlabel})"
```

{% endcode %}

{% hint style="info" %}
The Python3 code is based on the pseudo-code of the AHU-algorithm. To keep things simple, the step to condense labels is skipped, which is important to retain linear time performance. Finally, in this LeetCode problem, nodes already have distinct values. In general, you would need to assign IDs based on unique subtree structures (see the cited paper).
{% endhint %}

## Exercise 6.62

First, build two trees based on their representations. For each of them repeat the next steps:

1. Initialize an empty stack.
2. Iterate through the string:
   1. When you see a (, create a new node, add it as a child to the node at the top of the stack (if nonempty), and push this new node onto the stack.
   2. When you see a ), pop the top node off the stack.
3. Return the last node popped from the stack.

Now, compare these trees using the AHU-algorithm (see the previous exercise).

## Exercise 6.63

The book suggests building the series step-by-step by iterating the functional equation

$$
U^{\[h+1]}(z) = z \exp\left{ \sum\_{k=1}^{h+1} \frac{1}{k} U^{\[h]}(z^k) \right} \qquad \text{for $h \ge 0$},
$$

with $$U^{\[0]}(z)=z$$. Because both series agree to $$h + 1$$ terms, we can just treat everything as bounded polynomials and update them layer by layer. The next Python program implements the naive version without using any external libraries. Its running time is $$O(N^4)$$ arithmetic operations. It’s intentionally kept in this form to truly depict what is going on under the hood. A simple optimization, elaborated at the beginning of the next exercise, may bring down the running time to $$O(N^3)$$ even for this basic approach.

{% code expandable="true" %}

```python
from fractions import Fraction

def poly_add(p, q, max_deg):
    """Adds two polynomials up to max_deg degrees."""
    res = [Fraction(0)] * (max_deg + 1)
    for i in range(max_deg + 1):
        p_val = p[i] if i < len(p) else Fraction(0)
        q_val = q[i] if i < len(q) else Fraction(0)
        res[i] = p_val + q_val
    return res

def poly_subs(p, k, max_deg):
    """Computes p(z^k) by spacing out the coefficients up to max_deg degrees."""
    res = [Fraction(0)] * (max_deg + 1)
    for i in range(len(p)):
        if i * k <= max_deg:
            res[i * k] = p[i]
    return res

def poly_mul(p, q, max_deg):
    """Naively multiplies two polynomials up to max_deg degrees."""
    res = [Fraction(0)] * (max_deg + 1)
    for i in range(len(p)):
        for j in range(len(q)):
            if i + j <= max_deg:
                res[i + j] += p[i] * q[j]
    return res

def poly_exp(p, max_deg):
    """Computes exp(p(z)) using Taylor series: 1 + p + p^2/2! + p^3/3! ..."""
    res = [Fraction(0)] * (max_deg + 1)
    res[0] = Fraction(1)

    # This brute-force approach computes each term from scratch,
    # which is inefficient but straightforward.
    term = [Fraction(0)] * (max_deg + 1)
    term[0] = Fraction(1)
    for k in range(1, max_deg + 1):
        term = poly_mul(term, p, max_deg)
        term = poly_mul(term, [Fraction(1, k)], max_deg)
        res = poly_add(res, term, max_deg)
    return res

def count_unordered_trees(n):
    u = [Fraction(0), Fraction(1)]

    for h in range(1, n + 1):
        max_deg = h + 1

        # Compute inner sum p(z) = sum(1/k * U(z^k))
        p = [Fraction(0)] * (max_deg + 1)
        for k in range(1, max_deg + 1):
            term = poly_subs(u, k, max_deg)
            term = poly_mul(term, [Fraction(1, k)], max_deg)
            p = poly_add(p, term, max_deg)

        # Compute e(z) = exp(p(z))
        e = poly_exp(p, max_deg)

        # Multiply by z (shift right by 1)
        u[1:] = e[:max_deg + 1]

        print(f"U_{h} = {int(u[max_deg - 1])}")

count_unordered_trees(100)
```

{% endcode %}

Python 3 by default handles arbitrarily large integers, so any number of coefficients may be printed. Of course, the bad performance characteristics of this version precludes large $$N$$, although it can be used to efficiently dump the first 100 terms.

{% hint style="info" %}
There are lots of relatively novel implementations in various programming languages to compute $$U\_N$$. These are documented in OEIS [A000081](https://oeis.org/A000081). Among them there is a one liner program in SageMath.
{% endhint %}

## Exercise 6.64

The underlying idea of this optimized version is best understood by looking at a more mundane task of speeding up the `poly_exp` function (see the previous exercise). It evaluates the Taylor expansion of $$e^x$$ by computing each term from scratch. Overall this entails $$O(N^3)$$ operations. Let’s see how to reduce this to $$O(N^2)$$. Let $$E(z) = \exp(P(z))$$. Taking the derivative of both sides, we get a differential equation

$$
E'(z) = E(z) P'(z).
$$

Equating the coefficients of $$z^{N-1}$$ gives

$$
N E\_N = \sum\_{k=1}^N k P\_k E\_{N-k}.
$$

Thus, computing each coefficient demands only $$O(N)$$ instead of $$O(N^2)$$ arithmetic operations. The Harary-Palmer method differentiates the entire functional equation to get everything down to $$O(N^2)$$. This gives

$$
\begin{align\*}
U(z) &= z \exp\left{ \sum\_{k\ge1} \frac{1}{k} U(z^k) \right} \\
\ln U(z) &= \ln z + \sum\_{k\ge1} \frac{1}{k} U(z^k) \\
\frac{U'(z)}{U(z)} &= \frac{1}{z} + \sum\_{k\ge1} \frac{1}{k} U'(z^k) \cdot k z^{k-1} \\
&= \frac{1}{z} + \sum\_{k\ge1} U'(z^k) z^{k-1} \\
z U'(z) &= U(z) + U(z) \sum\_{k\ge1} z^k U'(z^k) \\
\sum\_{N\ge1} N U\_N z^N &= \sum\_{N\ge1} U\_N z^N + \left( \sum\_{i\ge1} U\_i z^i \right) \left( \sum\_{k\ge1} \sum\_{l\ge1} l U\_l z^{kl} \right).
\end{align\*}
$$

We want to extract the coefficient of $$z^{N+1}$$. On the RHS we must have $$i+kl=N+1$$. Because $$i \ge 1$$ this implies that $$kl\le N$$. Furthermore, $$l \ge 1$$ means that $$1 \le k \le N$$. We get

$$
\begin{align\*}
(N+1) U\_{N+1} &= U\_{N+1} + \sum\_{kl \le N} k U\_k U\_{N+1-kl} \\
N U\_{N+1} &= \sum\_{kl \le N} k U\_k U\_{N+1-kl} \\
&\boxed{N U\_{N+1} = \sum\_{1 \le k \le N} \left( k U\_k \sum\_{k \le kl \le N} U\_{N+1-kl} \right)} \\
&\therefore U\_{N+1} = \frac{1}{N} \sum\_{1 \le k \le N} U\_{N+1-k} \left( \sum\_{d|k} d U\_d \right).
\end{align\*}
$$

The final equation is the form used in the efficient Python implementation shown below. Using memoization we can achieve the $$O(N^2)$$ objective. Notice that handling the inner divisor sum requires $$O(N \log N)$$ operations during the whole run.

{% code expandable="true" %}

```python
from functools import lru_cache
from sympy import divisors

@lru_cache(maxsize=None)
def divisor_sum(k):
    return sum(d * A000081(d) for d in divisors(k, generator=True))

@lru_cache(maxsize=None)
def A000081(n):
    if n <= 1:
        return n
    convolution = sum(divisor_sum(k) * A000081(n - k) for k in range(1, n))
    return convolution // (n - 1)

if __name__ == "__main__":
    for n in range(1, 101):
        print(f"U_{n}={A000081(n)}")
```

{% endcode %}

{% hint style="warning" %}
At the time of this writing, the Python program published at OEIS [A000081](https://oeis.org/A000081) contains a bug. Namely, it needlessly recomputes the inner divisor sum. The code above fixes this issue.
{% endhint %}

## Exercise 6.65

See [DigraphGenerator.java](https://algs4.cs.princeton.edu/42digraph/DigraphGenerator.java.html).

## Exercise 6.66

Each free tree gives at least one rooted tree (by picking any node as root), so there are at least as many rooted trees of $$N$$ nodes as free trees of $$N$$ nodes. There are $$N$$ choices for picking a root, so there can be at most $$N$$ times as many rooted trees of $$N$$ nodes as free trees of $$N$$ nodes.

## Exercise 6.67

The derivation is based on a combinatorial argument expressed in terms of OGFs. The starting point is [Otter’s theorem](https://pubs.ams.org/journals/proc/1954-005-01/S0002-9939-1954-0059535-8/S0002-9939-1954-0059535-8.pdf) *"In any tree the number of nonsimilar vertices minus the number of nonsimilar lines (symmetry line excepted) is the number one."* Therefore, the formula for any free tree is

$$
1 = (\text{distinct node roots}) - (\text{distinct edge roots except symmetric edges}).
$$

{% hint style="info" %}
Rooting a free tree at an edge (edge root) simply means selecting a specific edge to act as the "center." Structurally, this represents an unordered pair of rooted subtrees connected by that edge.
{% endhint %}

If we sum this property over all free trees (grouped by size), the LHS transforms immediately into $$F(z)$$. On the RHS, the first term is simply $$U(z)$$, by definition of rooted trees.

Every undirected edge contains two rooted subtrees on its endpoints. The OGF for a sequence of 2 rooted trees is $$U(z)^2$$. Nonetheless, this also includes the symmetric case, when both subtrees are the same, so in aggregate it implies an even sized rooted tree, hence its OGF is $$U(z^2).$$ We must add this amount back. Furthermore, we don’t care about order, so these quantities must be divided by two.

Taking into account the previous facts, we end up with the equation

$$
F(z) = U(z) - \frac{1}{2}U(z)^2 + \frac{1}{2}U(z^2).
$$

## Exercise 6.68 🌟

{% hint style="success" %}
This exercise introduces the [method of matched asymptotic expansions](https://en.wikipedia.org/wiki/Method_of_matched_asymptotic_expansions), which allows working with unspecified artificial boundaries that will vanish from the final expression. Such boundaries are necessary to handle different regions of the sum appropriately. It utilizes the philosophy of the Laplace method, but adapted for convolutions where the dominant contributions lie in the extreme tails rather than a central peak.
{% endhint %}

We want to find

$$
F\_N = \[z^N] \left( U(z) - \frac{1}{2}U(z)^2 + \frac{1}{2}U(z^2) \right).
$$

We know from the book that $$\[z^N] U(z) = U\_N \sim c \alpha^N N^{-3/2}$$. The symmetric term $$\[z^N] U(z^2) \sim U\_{N/2}$$ is exponentially small relative to $$U\_N$$ (see [Exercise 4.9](https://evarga.gitbook.io/sh-aofa/algorithms-and-combinatorial-structures/pages/6pjaXyRo8WpOT0GFIIZ1#exercise-4.9)). We are left with estimating the discrete convolution

$$
F\_N \approx U\_N - \frac{1}{2} \sum\_{k=1}^{N-1} U\_k U\_{N-k}.
$$

Let’s partition the sum into three distinct regions using an unspecified intermediate boundary $$A$$ (such as $$A = \sqrt{N}$$), where $$1 \ll A \ll N$$. We differentiate the following regions:

* **The Left Tail**: $$k \in \[1, A]$$
* **The Right Tail**: $$k \in \[N-A, N-1]$$
* **The Central Region**: $$k \in (A, N-A)$$

Due to symmetry, the left and right tails are the same.

#### The Left Tail and the Residual

In the left tail, $$k$$ is small relative to $$N$$. We use the Taylor expansion to approximate the $$(N-k)^{-3/2}$$ term in $$U\_{N-k}$$

$$
(N-k)^{-3/2} = N^{-3/2}\left(1 - \frac{k}{N}\right)^{-3/2} = N^{-3/2}\left(1 + \frac{3k}{2N}+O((k/N)^2)\right).
$$

Substituting this into the left tail of the convolution (ignoring the quadratic O-term)

$$
\text{Left Tail} \sim \sum\_{k=1}^{A} U\_k \left( c \alpha^{N-k} N^{-3/2} \left(1 + \frac{3k}{2N}\right) \right) = U\_N \underbrace{\sum\_{k=1}^{A} U\_k \alpha^{-k}}*{U(1/\alpha) \text{ for } A \to\infty} + \frac{3 U\_N}{2N} \sum*{k=1}^{A} k U\_k \alpha^{-k}.
$$

To evaluate the first sum, we cannot just simply plug in the asymptotic expression for $$U\_k$$, since $$k$$ starts at 1 and the sum converges. For very small $$k$$ we may introduce huge discrepancies by doing so. Let’s peek a little bit into singularity analysis in the book *Analytic Combinatorics* by PF & RS. The following lemma is useful:

**Lemma**: Let $$\rho = 1/\alpha$$ be the radius of convergence of $$U(z)$$. Then, $$U(\rho) = 1$$.

On a high level, the statement about $$\rho$$ follows by matching the exponential growth rate $$\rho^{-N}$$ from the corollary of Theorem 5.5 with $$\alpha^N$$ from Theorem 6.12. This gives

$$
\sum\_{k=1}^{A} U\_k \alpha^{-k} = 1 - \int\_{A}^{\infty} c x^{-3/2} , dx = 1 - 2c A^{-1/2}.
$$

The second summand is different. It's a divergent series, so even though the replacement of $$U\_k$$ with its asymptotic estimate is inaccurate for small $$k$$, due to divergence, the finitely many initial terms of the sum are negligible. Once $$k$$ ramps up, it dictates the overall result. We’ll borrow the solution from [Exercise 4.58](https://evarga.gitbook.io/sh-aofa/algorithms-and-combinatorial-structures/pages/6pjaXyRo8WpOT0GFIIZ1#exercise-4.58).

Thus, the left tail evaluates to

$$
\text{Left Tail} \approx U\_N - \frac{2c U\_N}{\sqrt{A}} + \frac{3 cU\_N}{2N} \left( 2\sqrt{A} + \zeta(1/2) \right).
$$

#### The Central Region

In the central region, both $$k$$ and $$N-k$$ are large, allowing us to safely replace both with their continuous asymptotic equivalents and approximate the sum as an integral

$$
\begin{align\*}
\text{Center} &\approx \int\_{A}^{N-A} \left( c \alpha^x x^{-3/2} \right) \left( c \alpha^{N-x} (N-x)^{-3/2} \right) , dx \\\[0.4cm]
&= c^2 \alpha^N \int\_{A}^{N-A} x^{-3/2}(N-x)^{-3/2} , dx \\\[0.4cm]
&= c^2 \alpha^N \frac{4(2(N-A)-N)}{N^2 \sqrt{(N-A)A}} && \text{($F(A)=-F(N-A)$)}\\\[0.4cm]
&= c^2 \alpha^N \frac{4(N-2A)}{N^2 \sqrt{A} \sqrt{N-A}}  \\\[0.4cm]
&= c^2 \alpha^N \frac{4N(1 - 2A/N)}{N^2 \sqrt{A} \sqrt{N} \sqrt{1 - A/N}} \\\[0.4cm]
&= c^2 \alpha^N \frac{4(1 - 2A/N)}{N^{3/2} \sqrt{A}} \left(1 - \frac{A}{N}\right)^{-1/2} \\\[0.4cm]
&\approx c^2 \alpha^N \frac{4}{N^{3/2} \sqrt{A}} \left(1 - \frac{2A}{N}\right) \left(1 + \frac{A}{2N}\right) && \text{($(1 - A/N)^{-1/2} \approx 1 + \frac{A}{2N}$)} \\\[0.4cm]
&\approx c^2 \alpha^N \frac{4}{N^{3/2} \sqrt{A}} \left(1 - \frac{3A}{2N}\right) && \text{(dropping quadratic term)} \\\[0.4cm]
&= \frac{4 c U\_N}{\sqrt{A}} - \frac{6cU\_N\sqrt{A}}{N}.
\end{align\*}
$$

#### Combined Result of the Convolution

We now reassemble the total convolution by summing the three regions

$$
\begin{align\*}
\sum U\_k U\_{N-k} &= \text{Left Tail} + \text{Right Tail} + \text{Center} \\
&= 2\left( U\_N - \frac{2c U\_N}{\sqrt{A}} + \frac{3 cU\_N}{2N} \left( 2\sqrt{A} +  \zeta(1/2) \right)\right) + \left( \frac{4c U\_N}{\sqrt{A}} - \frac{6cU\_N\sqrt{A}}{N} \right) \\
&= 2U\_N+O(U\_N/N).
\end{align\*}
$$

#### Conclusion

Substituting back the result of the convolution, we’ve

$$
F\_N \approx U\_N - \frac{1}{2} ( 2 U\_N + O(U\_N/N)) \sim c\_1\alpha^NN^{-5/2}.
$$

## Exercise 6.69

The number of distinct labelings of an unlabeled tree $$T$$ with $$N$$ nodes is $$N! / |\text{Aut}(T)|$$, where $$\text{Aut}(T)$$ is the automorphism group of $$T$$. The automorphism group $$\text{Aut}(T)$$ of a tree $$T$$ is the set of all permutations of the nodes of $$T$$ that preserve the adjacency relation. In other words, a permutation $$\sigma$$ of the node set is an automorphism if for any two nodes $$u, v$$, the pair $${u, v}$$ is an edge in $$T$$ if and only if $${\sigma(u), \sigma(v)}$$ is an edge in $$T$$. To maximize the number of labelings, we minimize the size of the automorphism group.&#x20;

For $$N=4,5,6$$, the path (automorphism group of order 2) has the most labelings (only a sole winner when $$N=4$$). For $$N \ge 7$$, we can always create an asymmetric tree with a trivial automorphism group producing $$N!$$ labelings.

A concrete example with $$N=7$$ nodes is&#x20;

<pre data-expandable="true"><code><strong>a   
</strong><strong>|
</strong>b - c - d - e
|
f
|
g
</code></pre>

Notice that every configuration is unique. One possible general structure is to have a central node with branches of different lengths.

## Exercise 6.70

Wikipedia has a detailed [solution](https://en.wikipedia.org/wiki/Cayley%27s_formula#Other_properties) as part of the Cayley’s formula.

## Exercise 6.71

The equation follows from the arguments given in [Exercise 6.67](#exercise-6.67) with the symmetrical case omitted, as it doesn’t apply here due to labels.

## Exercise 6.72

The OGF for $$k$$-forests is just $$T^{\[t]}(z)^k$$, where $$T^{\[t]}(z)$$ is the OGF for $$t$$-ary trees. We know from the book that $$z=u(1-u^{t-1})$$. By Lagrange inversion (using the second case in Theorem 6.11), the number of $$k$$-forests of $$t$$-ary trees with a total of $$N$$ internal nodes and $$(t-1)N+k$$ external nodes (each tree also contributes +1 to the total) is

$$
\begin{align\*}
\[z^{(t-1)N+k}]T^{\[t]}(z)^k &= \frac{k}{(t-1)N+k}[u^{(t-1)N}](https://evarga.gitbook.io/sh-aofa/algorithms-and-combinatorial-structures/1-u^{t-1})^{-((t-1)N+k)} \\
&= \frac{k}{(t-1)N+k}[u^{N}](https://evarga.gitbook.io/sh-aofa/algorithms-and-combinatorial-structures/1-u)^{-((t-1)N+k)} \\
&= \frac{k}{(t-1)N+k} \binom{tN+k-1}{N} \\
&= \frac{k}{tN+k} \binom{tN+k}{N}.
\end{align\*}
$$

## Exercise 6.73

We start with the exact formula from the book

$$
T\_N = \frac{1}{(t-1)N+1} \binom{tN}{N} = \frac{1}{(t-1)N+1} \frac{(tN)!}{N! ((t-1)N)!}.
$$

For large $$N$$, we can approximate the prefactor as

$$
\frac{1}{(t-1)N+1} \sim \frac{1}{(t-1)N}.
$$

Now, we apply Stirling's approximation to the three factorials in the binomial coefficient

$$
\begin{align\*}
\binom{tN}{N} &\sim \frac{\sqrt{2\pi t N} \cdot t^{tN} \cdot N^{tN} \cdot e^{-tN}}{\sqrt{2\pi N} \cdot N^N \cdot e^{-N} \cdot \sqrt{2\pi (t-1)N} \cdot (t-1)^{(t-1)N} \cdot N^{(t-1)N} \cdot e^{-(t-1)N}} \\
&= \frac{\sqrt{2\pi t N}}{\sqrt{2\pi N} \sqrt{2\pi (t-1)N}} \cdot \frac{t^{tN}}{(t-1)^{(t-1)N}} \\
&= \frac{\sqrt{2\pi t N}}{\sqrt{2\pi N} \sqrt{2\pi (t-1)N}} \cdot \left( \frac{t^t}{(t-1)^{t-1}} \right)^N \\
&= \frac{1}{\sqrt{2\pi N}} \sqrt{\frac{t}{t-1}} (\alpha\_t)^N.
\end{align\*}
$$

This gives

$$
\begin{align\*}
T\_N &\sim \left( \frac{1}{(t-1)N} \right) \cdot \left( \frac{1}{\sqrt{2\pi N}} \sqrt{\frac{t}{t-1}} (\alpha\_t)^N \right) \\
&= \frac{1}{N^{3/2}} \cdot \frac{1}{(t-1)\sqrt{2\pi}} \sqrt{\frac{t}{t-1}} \cdot (\alpha\_t)^N \\
&= \frac{1}{\sqrt{2\pi \frac{(t-1)^3}{t}}} \cdot \frac{(\alpha\_t)^N}{N^{3/2}} \\
&= c\_t \frac{(\alpha\_t)^N}{N^{3/2}}.
\end{align\*}
$$

## Exercise 6.74

We start with the formula for the number of $$t$$-restricted trees from the book

$$
\begin{align\*}
\[z^N]T(z) &= \frac{1}{N}[u^{N-1}](https://evarga.gitbook.io/sh-aofa/algorithms-and-combinatorial-structures/\theta\(u\))^N \\
&= \frac{1}{N}\[u^{N-1}]\left(1-u^{t+1}\right)^N (1-u)^{-N} \\
&= \frac{1}{N}\[u^{N-1}]\left(\sum\_{k \ge0} \binom{N}{k} (-1)^k u^{k(t+1)}\right)\left(\sum\_{j\ge0} \binom{N+j-1}{N-1} u^j\right).
\end{align\*}
$$

We want to extract the coefficient $$\[u^{N-1}]$$, so only terms satisfying the following equations are relevant

$$
k(t+1) + j = N - 1 \implies j = N - 1 - k(t+1), \\
j \ge 0 \implies  k \le \frac{N-1}{t+1}.
$$

This gives

$$
\[z^N]T(z) = \frac{1}{N} \sum\_{k=0}^{\lfloor \frac{N-1}{t+1} \rfloor} (-1)^k \binom{N}{k} \binom{2N - 2 - k(t+1)}{N-1}.
$$

## Exercise 6.75

The Python 3 program below implements in code the formula derived in the previous exercise. The maximum integer is simulated to be a 64-bit unsigned integer in most programming languages.

{% code expandable="true" %}

```python
import math

def compute_t_restricted_trees(t, int_max=(2 ** 63 - 1)):
    n = 1
    while True:
        total = 0
        for k in range((n - 1) // (t + 1) + 1):
            sign = (-1) ** k
            binom1 = math.comb(n, k)
            binom2 = math.comb(2 * n - 2 - k * (t + 1), n - 1)
            total += sign * binom1 * binom2

        result = total // n
        print(result, end=",")

        if result > int_max:
            break
        n += 1

if __name__ == "__main__":
    compute_t_restricted_trees(t=2)
```

{% endcode %}

{% hint style="info" %}
The default configuration prints the Motzkin numbers on the output. They match the OEIS [A001006](https://oeis.org/A001006) sequence.
{% endhint %}

## Exercise 6.76

The customized Theorem 6.16 becomes

$$
\theta\_{\text{even}}(u)=1+u^2+u^4+\dots+u^{2r}= \dfrac{1-u^{2(r+1)}}{1-u^2}.
$$

Because the sum of children counts (i.e. the total number of edges) is even, the total number of nodes $$N = \text{edges}+1=2k+1$$ must be odd. The Lagrange inversion formula gives

$$
\[z^N]T\_{even}(z)=\frac{1}{2k+1},\[u^{2k}] \bigl(\theta\_{\text{even}}(u)\bigr)^{2k+1}.
$$

Let’s use the substitution $$v=u^2$$, as done in the book and [Exercise 6.74](#exercise-6.74), so

$$
\begin{align\*}
\bigl(\theta\_{\text{even}}(u)\bigr)^{2k+1} &= (1-v)^{-(2k+1)} (1-v^{r+1})^{2k+1} \\
&= \sum\_{i\ge0} (-1)^i \binom{2k+1}{i} v^{i(r+1)} \sum\_{j\ge 0} \binom{2k+j}{j} v^j.
\end{align\*}
$$

We want to extract the coefficient $$\[u^{2k}]=\[v^k]$$, so only terms satisfying the following equations are relevant

$$
i(r+1) +j=k \implies j =k- i(r+1), \\
j \ge 0 \implies  i \le \frac{k}{r+1}.
$$

This gives

$$
\begin{align\*}
\[z^N]T\_{even}(z) &= \frac{1}{2k+1} \sum\_{i=0}^{\lfloor k/(r+1)\rfloor} (-1)^i \binom{2k+1}{i} \binom{3k - i(r+1)}{2k} \\
&= \boxed{\frac{1}{2k+1} \sum\_{i=0}^{\lfloor k / \lfloor (t+1)/2 \rfloor \rfloor} (-1)^i \binom{2k+1}{i} \binom{3k - i\lfloor (t+1)/2 \rfloor}{2k}}.
\end{align\*}
$$

## Exercise 6.77

Balanced trees (AVL and 2-3) have definitely a smaller cardinality than unbalanced trees (3-ary and 3-restricted).&#x20;

AVL has only a local height constraint whilst 2-3 has a global condition (all external nodes must be at the same height). Nonetheless, AVL is a binary tree, which turns out to be a more limiting factor.

The 3-ary trees have more possibilities, since "empty" links do contribute to the combinatorial count.

Therefore the list of trees in increasing order of their cardinality for large $$N$$: AVL, 2-3, 3-restricted and 3-ary.

## Exercise 6.78

Assume $$N$$ denotes the number of internal nodes.

{% code expandable="true" %}

```
N  | AVL Trees | 2-3 Trees
--------------------------
1        1          2
2        1          0
3        1          3
4        1          4
5        2          0
6        1          0
7        2          6
8        3          12
9        4          10
10       4          10
11       5          24
12       4          30
13       8          20
14       12         0
```

{% endcode %}

Both trees show oscillations, especially the 2-3 variant. Some cases aren’t even possible. This shows how global constraints (2-3 depth) versus local constraints (AVL balance) completely rewrite the asymptotic behavior of the generating functions.

## Exercise 6.79 🌟

{% hint style="success" %}
This exercise introduces the [substitution operator](https://en.wikipedia.org/wiki/Symbolic_method_\(combinatorics\)#Other_elementary_constructions) of the symbolic method.
{% endhint %}

We can apply the symbolic method in both cases. Notice the usage of the [Iverson bracket](https://en.wikipedia.org/wiki/Iverson_bracket) operator.

An AVL tree of height $$h$$ is either empty or has the root attached to perfectly balanced subtrees of height $$h-1$$ or attached to subtrees with heights $$h-1$$ and $$h-2$$ (either left or right child is taller).

$$
A\_h=\[h \le 0]\epsilon+\[h > 0]Z \times (2A\_{h-1} \times A\_{h-2} + A\_{h-1}^2).
$$

To specify the functional equation on the generating function of 2-3 trees, we must use substitution operator. Based on the definition from the book, each internal node is either a 2 node (doubles the number of external nodes) or 3 node (triples the number of external nodes). Furthermore, all external nodes must be at the same level. This gives

$$
B\_h=\[h=0]Z+\[h>0]B\_{h-1} \circ (Z^2+Z^3).
$$

This is a bottom-up approach of building a tree. To build a tree of height $$h>0$$ from a tree of height $$h-1$$, we take every existing leaf $$\mathcal{Z}$$ and replace it with either a 2-node (which sprouts 2 new leaves, $$\mathcal{Z}^2$$) or a 3-node (which sprouts 3 new leaves, $$\mathcal{Z}^3$$). Now, taking everything together, we get various multiplicative factors at play producing different numbers of external nodes, hence making trees of various structures and sizes.


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://evarga.gitbook.io/sh-aofa/algorithms-and-combinatorial-structures/chapter-6-trees.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
