Chapter 6
Exercise 6.1
The inductive hypothesis is the statement of the lemma. The base case is a tree with one external node and the lemma certainly holds.
For the inductive step, assume the tree has $i>0$ internal nodes. The left and right subtrees of the root have $i_L$ and $i_R$ internal nodes respectively, both fewer than $i$, so the inductive hypothesis applies to them: $e_L=i_L+1$ and $e_R=i_R+1$. Since the root is an internal node, $i_L+i_R=i-1$, and every external node lies in one of the subtrees, so $e=e_L+e_R=i_L+i_R+2=i+1$, as expected. This concludes the inductive proof.
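As a quick brute-force confirmation of the lemma, we can enumerate all binary trees with up to five internal nodes (a sketch of mine; a tree is a nested pair, with None as an external node):

```python
# A binary tree is either None (an external node) or a pair (left, right).
def internal(t):
    return 0 if t is None else 1 + internal(t[0]) + internal(t[1])

def external(t):
    return 1 if t is None else external(t[0]) + external(t[1])

def trees(i):
    """All binary trees with exactly i internal nodes."""
    if i == 0:
        return [None]
    return [(l, r)
            for k in range(i)
            for l in trees(k)
            for r in trees(i - 1 - k)]

for i in range(6):
    # e = i + 1 for every tree with i internal nodes
    assert all(external(t) == internal(t) + 1 for t in trees(i))
```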
Exercise 6.2
The total number of binary trees with $N$ internal nodes is given by the $N$th Catalan number $T_N$. We need to find the number of binary trees with $N$ internal nodes having both subtrees of the root nonempty. The ratio of these two numbers answers the exercise. Thus,
For trees with fewer than 3 internal nodes this ratio is zero.
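Assuming the count of trees with both root subtrees nonempty simplifies to $T_N-2T_{N-1}$ (each term $T_{N-1}$ removes the trees with an empty left or right subtree; a hypothesis of this note, not stated in the book), a numeric check against the direct convolution:

```python
from math import comb

def catalan(n):
    return comb(2 * n, n) // (n + 1)

def both_nonempty(n):
    """Binary trees with n internal nodes whose root has two nonempty
    subtrees: the left gets k internal nodes, the right n-1-k, both >= 1."""
    return sum(catalan(k) * catalan(n - 1 - k) for k in range(1, n - 1))

for n in range(2, 12):
    assert both_nonempty(n) == catalan(n) - 2 * catalan(n - 1)
```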
Exercise 6.3
The answer is $T_N^2/T_{2N+1}$, where $T_N$ is the $N$th Catalan number.
Exercise 6.4
Obviously, it is
See the next exercise for a generalization of this problem.
Exercise 6.5 🌟
This exercise showcases the power of the symbolic method to answer seemingly very difficult questions.
To answer this question, we switch to the symbolic method. We already know the symbolic construction for a general tree from the proof of Theorem 6.2. We want the trees whose root has exactly $t$ children. Symbolically, a tree in this class is a root node attached to a sequence of exactly $t$ members of G
We want the number of such trees of size N
The RHS immediately follows from the Catalan $k$-fold convolution. The requested proportion is
As a quick check, when t=1 we get exactly the formula from the previous exercise.
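The count can also be sanity-checked numerically. The closed form below, $[z^{N-1}]G(z)^t=\frac{t}{2(N-1)-t}\binom{2(N-1)-t}{N-1}$, is the Catalan $t$-fold convolution mentioned above; the DP side computes the same coefficient by repeated polynomial multiplication:

```python
from math import comb

NMAX = 12
g = [0] * (NMAX + 1)
for n in range(1, NMAX + 1):
    g[n] = comb(2 * (n - 1), n - 1) // n    # ordered trees with n nodes

def conv_power(seq, t, nmax):
    """Coefficients of seq(z)**t, truncated at z**nmax."""
    out = [0] * (nmax + 1)
    out[0] = 1
    for _ in range(t):
        new = [0] * (nmax + 1)
        for a in range(nmax + 1):
            if out[a]:
                for b in range(nmax + 1 - a):
                    new[a + b] += out[a] * seq[b]
        out = new
    return out

# Trees of size N whose root has exactly t children: [z^(N-1)] G(z)^t.
for N in range(2, NMAX + 1):
    for t in range(1, N):
        dp = conv_power(g, t, N - 1)[N - 1]
        closed = t * comb(2 * (N - 1) - t, N - 1) // (2 * (N - 1) - t)
        assert dp == closed
```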
Exercise 6.6 🌟
This exercise shows how to use Maple’s combstruct package for the symbolic method. It comes with other packages to support asymptotic estimation of coefficients.
To count the number of forests without singletons, we alter the combinatorial construction of the forest a bit
The ordinary forest is a sequence of trees, but our restricted definition only allows trees with at least two nodes. Using the corresponding transfer theorems, we have
If we extract the coefficient of zN for N≥1, we get a recurrence relation
We can solve this first-order linear recurrence using the generalized technique presented in Exercise 2.14. This gives our proportion (after dividing by FN=TN)
It prints 0, 1, 2, 6. Notice that 6/14=3/7, as expected.
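The same numbers can be reproduced with a short DP over the construction $F(z)=1/(1-\bar T(z))$, where $\bar T(z)=G(z)-z$ excludes singleton trees (a sketch independent of Maple's combstruct):

```python
from math import comb

NMAX = 8
g = [0] * (NMAX + 1)
for n in range(1, NMAX + 1):
    g[n] = comb(2 * (n - 1), n - 1) // n   # ordered trees with n nodes
t_bar = g[:]
t_bar[1] = 0                               # forbid singleton trees

# F(z) = 1/(1 - Tbar(z))  =>  f[n] = sum_{k>=2} t_bar[k]*f[n-k], f[0] = 1.
f = [0] * (NMAX + 1)
f[0] = 1
for n in range(1, NMAX + 1):
    f[n] = sum(t_bar[k] * f[n - k] for k in range(2, n + 1))

print(f[1:5])   # [0, 1, 2, 6], matching the text; 6/14 = 3/7
```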
Exercise 6.7
We first group forests by size (the order of the list items designates $N$) and then enumerate each group in top-down, left-to-right fashion. This gives
()
()(), (())
()()(), (())(), ()(()), (()()), ((()))
()()()(), (())()(), ()(())(), ()()(()), ((()))(), (()())(), (())(()), ()(()()), ()((())), (()()()), ((())()), ((()())), (()(())), (((())))
Exercise 6.8
A notable distinction between ordered trees and binary trees is that a binary tree may be empty, so an explicit notation is needed for the empty binary tree. The diagram below highlights the differences between their respective representations.
Preorder traversal of a tree maps directly to a preorder traversal of a binary tree. Postorder traversal of a tree corresponds to an inorder (not postorder) traversal of a binary tree. There is no notion of an inorder traversal of a tree, since it’s not defined when to visit the root among multiple children.
Parenthesis Systems Representation
The preorder representation of a tree is
(1 (2 (5) (6) ) (3) (4) )
The preorder representation of a binary tree is
(1 (2 (5) (6) ) (3) (4) )
In an arbitrary binary tree, it is necessary to append an additional closing parenthesis at the end to indicate the last external node. Since a binary tree is a positional structure, each external node (empty link) is denoted by a closing parenthesis. Consider node 5, which has node 6 as its right child; during traversal of 5's children, a closing parenthesis is placed for the left child and an opening parenthesis for the right child. However, the root of a binary tree produced through rotation correspondence from an ordered tree doesn’t possess a right child in this context.
The postorder representation of a tree is
( ( (5) (6) 2) (3) (4) 1)
The inorder representation of a binary tree is
( ( (5) (6) 2) (3) (4) 1)
The postorder representation of a binary tree is awkward to express in this manner. As highlighted in the book, when additional properties of the nodes are known (such as the number of arguments of an operator), we can dispense with parentheses in a preorder/postorder representation of any type of tree. An inorder representation must encode structural information about the binary tree, since ambiguities may arise otherwise.
Preorder/Postorder Degree Representation
A tree can also be represented by listing the degree (number of children) of each node in a specific traversal order. In a binary tree, standard "degrees" don't apply the same way. Instead, the degree sequence of a tree dictates the shape of the right-leaning spines in the matching binary tree. More specifically,
If a node X in a tree has a degree of k, it means X has k children.
In a binary tree, this translates to: X's left child exists, and from that left child, there is a continuous chain of right children of length k−1. For example, node 1 of a tree in our diagram has 3 children, thus node 2 of a binary tree has a right-leaning spine of length 2.
A degree of 0 in a tree means the corresponding node in a binary tree has no left child.
Exercise 6.9
Figure 6.6 and this exercise are broken. An $N$-step gambler's-ruin path cannot encode the corresponding binary tree with $N-1$ internal nodes, hence cannot represent a forest with $N-1$ nodes; it would need $N-1+N=2N-1$ steps. In principle, the approach is documented in the post about the connection between forests and trees on Mathematics Stack Exchange. Figure 6.6 has lattices of the wrong dimensions (they aren't square lattices, as the book claims) and their paths don't reflect the trees in Figure 6.1.
Exercise 6.10 🌟
This exercise introduces André's "Reflection Principle", a very important technique to reason about the basic ballot problem. The cited article also debunks the myth around this method.
Let’s rotate any lattice-path from Figure 6.6 by 45° counterclockwise to get a graph of the winner’s margin as the ballots are counted. This perspective is more natural for understanding the dynamics behind the scene. We can represent ups and downs as follows:
Reading a 0 is a step "Up-Right" (+1,+1).
Reading a 1 is a step "Down-Right" (+1,−1).
Every path starts at $A(0,0)$ and ends at $T(N,2k-N)$, assuming the path contains $k$ zeros. To be a good path, it must never drop below the $x$ axis, i.e., never touch the line $y=-1$. A necessary condition is therefore $2k-N\ge 0$, i.e., $k\ge\lceil N/2\rceil$. In probability it is often easier to count complementary events; here the number of good paths equals the total number of paths minus the number of bad ones. The former is easy to express as $\binom{N}{k}$. To count the bad paths, we apply the previously mentioned reflection method.
If a path is bad, it touches the line $y=-1$ for the first time at some point $P(M,-1)$. Reflect the portion of the path before $P$ across the line $y=-1$: the reflected prefix climbs from $A'(0,-2)$ to $P$, doing the opposite of what the original prefix from $A(0,0)$ to $P$ does. The key insight of the reflection method is the bijection between bad paths from $A(0,0)$ to $T$ via $P$ and paths from $A'(0,-2)$ to $T$ via $P$: every way to reach $P$ from $A$ corresponds to a unique way to reach $P$ from $A'$, and the segments after $P$ coincide. Therefore, the number of bad paths from $A$ equals the number of all paths from $A'$. The latter is easy to express: such a path has $k+1$ up steps, so there are $\binom{N}{k+1}$ of them.
To find the total number of valid strings of length N, we just add up these valid paths for all allowable values of k
The sum nicely telescopes and only the first term "survives" the cancellations. Remember that $\binom{N}{N+1}=0$, hence the last term doesn't matter.
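A brute-force check of both the per-$k$ count $\binom{N}{k}-\binom{N}{k+1}$ and the telescoped total $\binom{N}{\lceil N/2\rceil}$ (a small verification script, not from the book):

```python
from itertools import product
from math import comb

def good(s):
    """Margin (#zeros - #ones) never drops below zero along the string."""
    margin = 0
    for c in s:
        margin += 1 if c == 0 else -1
        if margin < 0:
            return False
    return True

for N in range(1, 13):
    brute = sum(good(s) for s in product((0, 1), repeat=N))
    per_k = sum(comb(N, k) - comb(N, k + 1)
                for k in range((N + 1) // 2, N + 1))
    assert brute == per_k == comb(N, (N + 1) // 2)
```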
Exercise 6.11
There is a bijection between the parenthesis representation of an ordered forest and the plus-minus representation of its associated binary tree: swap the symbols, '(' for '+' and ')' for '−'. The only caveat is the extra '−' at the end of the binary tree's representation, which encodes the last external node. This difference must be taken into account when converting from one form to the other.
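A minimal sketch of the conversion (the function names are mine, and '-' stands in for the minus sign):

```python
def forest_to_plus_minus(parens):
    """Parenthesis string of an ordered forest -> plus-minus string of the
    corresponding binary tree (extra '-' encodes the last external node)."""
    return parens.replace("(", "+").replace(")", "-") + "-"

def plus_minus_to_forest(pm):
    assert pm.endswith("-")
    return pm[:-1].replace("+", "(").replace("-", ")")

s = "(())()"
assert forest_to_plus_minus(s) == "++--+--"
assert plus_minus_to_forest(forest_to_plus_minus(s)) == s
```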
Exercise 6.12
This topic is known as treemapping and many algorithms exist that produce outputs of various quality. Aspect ratio plays a central role and a particularly popular variant is the squarified treemap. It attains α close to one.
As a side note, leveraging the ISO paper standard's constant aspect ratio of $\alpha=\sqrt{2}$, we can devise a simple algorithm to represent binary trees with nested rectangles. This value of $\alpha$ ensures that halving a rectangle across its wider side yields sub-rectangles with exactly the same aspect ratio. The algorithm traverses the binary tree in preorder and, for every internal node, splits the corresponding nested rectangle in two along its wider side; the split lines thus correspond to internal nodes. An extension to store ordered trees would be to first convert them into binary trees via the rotation correspondence.
Exercise 6.13
The paper Rotational tree structures on binary trees and triangulations explains in detail the rotational relationships among the N trees corresponding to the N rotations of an asymmetric triangulation. An advantage is that it employs the identical hexagons illustrated in Figure 6.8 of the book.
Exercise 6.14
The text of this exercise is flawed. First, it's essential to restrict consideration to positive integers: allowing negative numbers at the boundaries could result in an infinite number of valid sequences. Second, if we rely solely on the two explicit rules, assuming positive numbers, things already fall apart for $N=2$: the allowable sequences are (1,2), (2,1) and (1,1), but $T_2=2<3$. If we instead read the constraint "some larger integer must appear somewhere between any two occurrences of any integer" as applying only to duplicates, forcing some larger number between them, then (1,1) clearly disappears. Nonetheless, things then go wrong for $N=3$, since all 6 permutations of 1,2,3 would qualify, as would (1,2,1). Again, $T_3=5<7$.
I guess the authors wanted to introduce the Cartesian tree in this exercise, tweaked to focus on maxima instead of minima. A sequence would be reconstructed from a binary tree by an inorder traversal, using each node's height as its value.
Exercise 6.15
Let $t_i$ denote the $i$th child of $t$, if any. Recall that a general tree cannot be empty. The recursive definitions are as follows: $$pl(t)=|t|-1+\sum_i pl(t_i),\qquad h(t)=[\,|t|>1\,]+\max_i h(t_i).$$
Notice the usage of the Iverson bracket in the definition of height.
Exercise 6.16
Let $l(t)$ denote the number of leaves in $t$. For a binary tree, if $t$ is an external node, then $l(t)=0$. The number of leaves in binary and general trees, respectively, satisfies $$l(t)=[\,|t|=1\,]+l(t_l)+l(t_r),\qquad l(t)=[\,|t|=1\,]+\sum_i l(t_i).$$
See the previous exercise for notational conventions.
Exercise 6.17
We prove this by induction on $N$ (the number of external nodes), using the slightly stronger hypothesis $h(t)\ge\lceil\lg N\rceil\ge\lg N$, which is legitimate because the height is an integer. The base case ($N=1$) is trivially satisfied, since $h(t)=0\ge\lg 1=0$.
For the inductive step N>1, we have
The derivation relies on the fact that $N=N_l+N_r$, since the root is an internal node. One of the subtrees must contain at least half of the external nodes, while the other must contain at least one. Therefore, we can apply the inductive hypothesis to the larger subtree.
Exercise 6.18
The Kraft–McMillan inequality is a generalization of this exercise. The referenced Wikipedia article provides a comprehensive proof, referring to leaves rather than external nodes. The stated equality applies to binary trees—defined, as in the book, as full binary trees—in the context of complete codes. See also Exercise 6.37 for more in-depth details.
Exercise 6.19
The bounds are $N-1\le pl(t)\le N(N-1)/2$. The lower bound is achieved by a star tree (a root with $N-1$ children), while the upper bound is attained by a chain of nodes.
Exercise 6.20
Assume $N>0$. By the Lemma from the book, we have $xpl(t)=ipl(t)+2N$. Therefore, bounds on the internal path length determine those on the external one. From the previous exercise, we already know that
The lower bound is attained by the most balanced tree (a complete binary tree), where the height is minimized. Consequently, $h(t)=\lceil\lg(N+1)\rceil$, with $1\le r\le 2^{h(t)-1}$ leaves at the last level. This gives
Therefore, xpl(t)=NlgN+O(N).
Exercise 6.21
Assume that the number of internal nodes is N>0. The number of leaves satisfies
The lower bound occurs in the most unbalanced case (a chain of nodes). The upper bound is attained by the most balanced tree (a complete binary tree). In a perfect binary tree (where all leaves are at the last level), the number of leaves is exactly $(N+1)/2=\lceil N/2\rceil$. Any other arrangement has fewer leaves, since removing two leaves from the last level creates only one new leaf on the level above.
Exercise 6.22
We can accomplish this with 2 registers as follows:
Observe that this is an optimal number, since we cannot do it with only a single register.
Exercise 6.23

The traversals are listed below, left tree first:

preorder: *+abd
inorder: (a+b)*d
postorder: ab+d*

preorder: -**+ab-de+fg*hi
inorder: (((a+b)*(d-e))*(f+g))-(h*i)
postorder: ab+de-*fg+*hi*-
Exercise 6.24
The unadorned traversals of the tree are:
preorder: -*+*aabc-*dddd*ee+fgh*ij
postorder: aa*bc+dddd*ee*-fgh+*ij*-
These are unambiguous only if every operator has a fixed and known arity. For example, *aabc could be interpreted as $a^2$ or $a^2*b$. Arity indicators resolve these ambiguities:
preorder: -2*3+3*2aabc-2*4dddd*2ee+3fgh*2ij
postorder: aa*2bc+3dddd*4ee*2-2fgh+3*3ij*2-2
The unadorned traversals of the binary tree are:
preorder (same as the preorder for the tree): -*+*aabc-*dddd*ee+fgh*ij
inorder (same as the postorder for the tree): aa*bc+dddd*ee*-fgh+*ij*-
postorder: aacb*ddddee**hgf+-+ji**-
See Exercise 8 for details about how degrees (arity) can be deciphered based on the length of right leaning chains. Observe that the postorder traversal here produces an algebraic mess.
Exercise 6.25
The sequence A138156 enumerates the sum of internal path lengths of all binary trees with N edges (N−1 internal nodes). So, Q5=352/42≈8.381.
The sequence A288964 enumerates the number of key comparisons to sort all N! permutations of N elements by quicksort. So, C5=888/120=7.4.
Explanation for the Correspondence with Quicksort
The total number of key comparisons made by quicksort (using the first element as pivot) when applied to all N! permutations of N distinct elements equals the sum of the internal path lengths of the binary search trees obtained by inserting the elements in the same order. This implicit connection is mentioned in the book (see section 6.8 and Exercise 6.34) as well as in the book's course material (see slide 43).
Consider a fixed permutation. In quicksort, each recursive call picks a pivot (let it be the first element of the current subarray, just to make the connection with BSTs clearer) and compares it with every other element in that subarray. Thus, if a pivot corresponds to a node $v$ whose subtree in the recursion tree (or in the corresponding BST) contains $|t_v|$ elements, then the number of comparisons made at that node is $|t_v|-1$. Summing over all nodes gives the total comparisons for that permutation, $c=\sum_v(|t_v|-1)$.
Now, in the BST built by inserting the elements in the same order, the root is the first element, and the left and right subtrees are built from the smaller and larger elements, respectively, in the order they appear. This exactly mirrors the recursion of quicksort. For any tree, the sum of subtree sizes (including each node itself) is related to the internal path length by $\sum_v|t_v|=ipl(t)+n$,
because each node contributes 1 to its own subtree and to the subtree of every ancestor. Combining the two equations yields $c=(ipl(t)+n)-n=ipl(t)$. Thus, the number of comparisons for a given permutation equals the internal path length of its associated BST. Summing over all permutations gives the desired equality.
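The correspondence is easy to test exhaustively for $N=5$: per permutation, the comparison count equals the internal path length of the BST, and the total matches the quoted value 888 (a verification sketch of mine):

```python
from itertools import permutations

def bst_ipl(seq):
    """Internal path length of the BST built by inserting seq in order."""
    root, ipl = None, 0
    for key in seq:
        if root is None:
            root = [key, None, None]
            continue
        node, depth = root, 0
        while True:
            depth += 1
            i = 1 if key < node[0] else 2
            if node[i] is None:
                node[i] = [key, None, None]
                break
            node = node[i]
        ipl += depth
    return ipl

def qs_comparisons(seq):
    """Quicksort with the first element as pivot, compared to all others."""
    if len(seq) <= 1:
        return 0
    pivot = seq[0]
    left = [x for x in seq[1:] if x < pivot]
    right = [x for x in seq[1:] if x > pivot]
    return len(seq) - 1 + qs_comparisons(left) + qs_comparisons(right)

total = 0
for p in permutations(range(5)):
    c = qs_comparisons(list(p))
    assert c == bst_ipl(p)
    total += c
print(total)   # 888, so C_5 = 888/120 = 7.4
```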
Exercise 6.26
A degenerate tree structure is a chain. Observe that a preorder traversal of such a tree reproduces the original sequence of insertions, because each insertion simply adds the new element at the end of the chain.
Let $a_i$ and $a_j$ be two elements whose relative order differs between two permutations that produce a chain; such a pair must exist. WLOG let $i<j$, otherwise swap the indices. The two BSTs cannot be structurally the same, since the direction of branching at $a_i$ must change. Recall that $i<j$ entails that $a_i$ was inserted before $a_j$.
The height of a degenerate tree is $h=N$. The total number of degenerate trees is $2^{h-1}=2^{N-1}$, since we have two branching choices per edge and the number of edges is $N-1$. Therefore, the probability that a degenerate tree structure will result is $2^{N-1}/N!$.
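A brute-force count confirming $2^{N-1}$ degenerate-tree permutations for small $N$ (a verification sketch of mine):

```python
from itertools import permutations

def is_degenerate(seq):
    """Insert seq into a BST; check that no node ends up with two children."""
    root, nodes = None, []
    for key in seq:
        if root is None:
            root = [key, None, None]
            nodes.append(root)
            continue
        node = root
        while True:
            i = 1 if key < node[0] else 2
            if node[i] is None:
                node[i] = [key, None, None]
                nodes.append(node[i])
                break
            node = node[i]
    return all(n[1] is None or n[2] is None for n in nodes)

for N in range(1, 7):
    count = sum(is_degenerate(p) for p in permutations(range(N)))
    assert count == 2 ** (N - 1)
```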
Exercise 6.27
The sequence A076615 enumerates the number of permutations of {1,2,…,N} that result in a binary search tree with the minimum possible height. So, we just need to read out the corresponding entry and divide it by N!.
To derive the number $T(n)$ of insertion sequences that yield a perfectly balanced binary search tree of height $n$ (with $N=2^n-1$ nodes), we use a recursive combinatorial argument based on the tree structure.
Consider a perfectly balanced tree of the given height:
The root is the median element (unique because keys are distinct).
The left subtree is itself a perfectly balanced tree of height $n-1$ with $2^{n-1}-1$ nodes.
The right subtree is identical in size and shape.
For any insertion sequence to produce this tree, the root must come first. The remaining $N-1=2^n-2$ nodes consist of all nodes from the left and right subtrees. The order in which they're inserted must respect the internal constraints of each subtree: the nodes of the left subtree must appear in an order that yields the left subtree's shape, and similarly for the right. However, the two subsequences can be interleaved arbitrarily.
Let $M=2^{n-1}-1$ be the size of each subtree, so:
The number of valid sequences for the left subtree is T(n−1).
The number for the right subtree is also T(n−1).
Once we have fixed the internal order of each subtree, we need to interleave them into a single sequence of length $2M$ (the positions after the root). The number of ways to interleave two sequences of length $M$ each, while preserving both internal orders, is $\binom{2M}{M}$. This counts the choices of which positions in the combined sequence are occupied by left-subtree nodes; the left-subtree sequence then fills those positions in order, and the right-subtree sequence fills the rest.
This gives the recurrence $T(n)=\binom{2M}{M}\,T(n-1)^2$,
with $T(1)=1$; a closed form follows by iterating it. The probability of getting a perfectly balanced tree is $T(n)/N!$.
As a quick check, we get that T(3)=80, which matches the previously mentioned sequence for N=7 (it’s zero indexed).
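Both the recurrence and the brute-force count over all $7!$ permutations can be checked in a few lines (a sketch; with 7 nodes, height 2 is attainable only by the perfect shape):

```python
from math import comb
from itertools import permutations

def T(n):
    """Insertion sequences yielding the perfect BST on 2^n - 1 keys."""
    if n == 1:
        return 1
    M = 2 ** (n - 1) - 1
    return comb(2 * M, M) * T(n - 1) ** 2

def bst_height(seq):
    """Height (in edges) of the BST built by inserting seq in order."""
    root, h = None, 0
    for key in seq:
        if root is None:
            root = [key, None, None]
            continue
        node, d = root, 0
        while True:
            d += 1
            i = 1 if key < node[0] else 2
            if node[i] is None:
                node[i] = [key, None, None]
                break
            node = node[i]
        h = max(h, d)
    return h

assert T(2) == 2 and T(3) == 80
assert sum(bst_height(p) == 2 for p in permutations(range(7))) == T(3)
```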
Exercise 6.28
Preorder traversal naturally preserves the tree structure, since roots of subtrees are visited first. This ensures that the relative order between ancestors and their descendants are properly maintained, since parents are already in place before their children are inserted. By induction, the tree is reconstructed correctly.
The postorder traversal of the tree below is 1 2. But this sequence would produce a right-leaning tree, which differs from the original. Thus, postorder traversal doesn't preserve the tree structure.

The level order traversal also preserves the structure, since nodes are visited in a manner that retains the relative order between ancestors (nodes visited/inserted earlier) and their descendants. Again, parents are already in place before their children are inserted. By induction, the tree is reconstructed correctly.
Exercise 6.29
Exercise 6.30
Exercise 6.31
We can derive the relationship by monitoring how node levels translate between the two tree structures and using the inherent symmetry of binary trees.
A general tree $G$ of $N$ nodes maps to a binary tree $T$ of $N$ nodes whose root has only a left subtree, of size $N-1$. A key insight is that the level of a node $v$ in $G$, denoted $\mathrm{level}_G(v)$, equals the number of left branches on the path from the root to $v$ in $T$, denoted $\mathrm{leftbranches}_T(v)$. This follows from the first-child/next-sibling rule of the rotation correspondence. Thus, we can express the path length of $G$ in terms of left branches in $T$
The number of left branches of any node in T is one more than this number taken in the root's left subtree tl. This gives
Now we look at the average over all randomly generated trees. Let E[pl(GN)] be the expected path length of a random general tree of size N, and E[ipl(TN)] be the expected internal path length of a random binary tree of size N. On average, we expect that the total number of left branches across all paths in T equals the total number of right branches. Since the internal path length of a binary tree is the sum of left and right branches, as specified above, we have
This gives
This result matches the formula from the book, once we substitute the expression from Theorem 6.3 into the RHS.
Exercise 6.32
In Program 1.2, the pivot selected for each subarray is its final element. Accordingly, it is necessary to ensure that the last element of each subarray becomes the root of the subsequent subtree. It should be noted that during each iteration, the pivot is exchanged with the border element, which must be considered in the overall process. Possible permutations corresponding to Figure 6.12, listed from left to right, are as follows:
AA, AB, CN, MS, EF, MC, JB, PD, AL
PD, MS, MC, JB, EF, CN, AL, AB, AA
AA, AL, AB, EF, CN, MS, MC, PD, JB
Exercise 6.33 🌟
This exercise showcases some important technical details pertaining to search probabilities.
The probability that the successful search cost is 2 equals the probability that the key matches the root of the left or right subtree of the tree's root. By symmetry, these two probabilities are equal. Observe that subtrees of random BSTs are themselves random BSTs, so the randomness property is fully preserved (as in quicksort). Let $S_k$ be a subtree of size $k$. The joint probability can be expressed as
We ignore k=0, since only successful searches are considered. The final probability is computed by summing over 0<k<N and multiplying by two
An important aspect of the preceding analysis merits attention. It’s established that binary search trees constructed from random permutations generally exhibit balanced structures. The reason top subtree sizes adhere to a uniform probability distribution is that the root node may assume any value within the interval [1,N] with equal likelihood, thereby determining the respective sizes of its left and right subtrees.
Exercise 6.34
The following Python script creates the requested plot.

Exercise 6.35
The following Python script creates the requested plot.

Exercise 6.36
By iterating the recurrence for pN(u), we can get the following closed-form solution:
Let’s take the natural logarithm of both sides to make differentiation much easier
The first derivative is
The average is $p_N'(1)=2(H_{N+1}-1)$. The second derivative is
Evaluating it at u=1 gives
Recall that pN(1)=1, since pN(u) represents a probability distribution. Substituting all components back into the formula for variance, we get the equation from the book.
Exercise 6.37
We reuse many ideas from Exercise 6.33. Searching for a randomly chosen key in a tree of size N encompasses the following cases:
We hit the root with probability $1/N$.
Because the left subtree of size $k$ is uniformly distributed, and the probability that the target key lies in that particular left subtree is $k/N$, the search costs 1 (for the root) plus the successful search cost within a tree of size $k$.
A successful search in a singleton tree entails s1(u)=u.
This gives
Differentiating both sides using the product rule gives
Evaluated at u=1, we get the average (see Exercise 2.14)
The second derivative is
Evaluated at u=1, we get (see Exercise 2.14 and Exercise 3.8)
After substituting all components into the formula for the variance, we get
Exercise 6.38 🌟
This exercise builds on the Kraft equality for binary trees to establish the connection between PGFs for successful and unsuccessful searches in a BST. Our direct method uses the structural relationship between internal nodes (successful searches) and external nodes (unsuccessful searches) that holds true for any binary tree.
Let sN(u) be the PGF for successful search and pN(u) for unsuccessful search (see the previous exercise). We define these PGFs as follows:
A successful search costs $l(v)+1$ (where $l(v)$ is the internal node's level), so $s_N(u)=\frac{1}{N}\sum_{v\,\in\,\text{Internal}}u^{\,l(v)+1}$.
An unsuccessful search costs $l(v)$ (where $l(v)$ is the external node's level), so $p_N(u)=\frac{1}{N+1}\sum_{v\,\in\,\text{External}}u^{\,l(v)}$.
Establishing the Relationship between PGFs
Let $P_N(u)=(N+1)p_N(u)$ and $S_N(u)=Ns_N(u)/u$. We aim to relate these two quantities by inspecting how any binary tree is built from scratch. For an empty tree ($N=0$) we have $P_0(u)=1$, a tree with only one external node. At the $N$th insertion, an external node at some level $l$ is replaced by an internal node, which creates two additional external nodes. Therefore,
We can conclude, and prove by induction using the above statements, that the following identity holds
If we plug $u=1/2$ into $\sum_{v\,\in\,\text{External}}u^{l(v)}=1+(2u-1)\sum_{v\,\in\,\text{Internal}}u^{l(v)}$, we get 1, which is the Kraft equality describing the profile of a binary tree (see Exercise 18).
Computing Moments
Now, we follow the standard approach for calculating moments. The first derivative is
Evaluating at u=1 gives the average cost of a successful search
Recall that pN(1)=sN(1)=1 and we already know pN′(1) from Theorem 6.6. The second derivative is
Evaluating at u=1 gives
The variance of a successful search is $\sigma_S^2=s_N''(1)+s_N'(1)-s_N'(1)^2$. Thus,
Exercise 6.39 🌟
This exercise demonstrates Theorem 6.7 in an "inverted" regime, where we seek the coefficient associated with a local property (e.g., the average degree of the root of a general tree).
At the time of writing, the book's website indicates that solving this exercise assumes familiarity with the Lagrange inversion theorem; this note is also relevant for several subsequent exercises. It appears that this topic was originally included in Chapter 3 before being reassigned to Section 6.12. However, the exercises themselves were not relocated accordingly.
Theorem 6.7 is built to relate a global, additive parameter c(t) to a local toll function e(t).
EG(z) is the CGF for the local toll e(t). In our case, it’s the degree of the root of a tree.
CG(z) is the CGF for the global cost c(t). The global cost is the sum of the local tolls for every single node in the entire tree.
To find the average of just that local toll across all trees of size N, we must extract the coefficient from EG(z) and divide it by the total number of trees:
This is different from what the Corollary says in the book. Our local toll function e(t) is simply the degree of the root of t. The recurrence relation of c(t) entails that we’re summing the root degree of the main tree, plus the root degrees of its subtrees, and so on. This turns out to be the total number of edges in a tree, hence c(t)=∣t∣−1. Now, we can work backwards, by first finding the CG(z). We have
By using the identity from Theorem 6.7, we get
We can use the substitutions $z=G(z)(1-G(z))$ and $\sqrt{1-4z}=1-2G(z)$ to express everything in terms of $G(z)$ (see also Exercise 41)
The total sum of root degrees for all trees of size N is
Using the Lagrange inversion theorem for Catalan trees, $[z^N]G(z)^k=\frac{k}{2N-k}\binom{2N-k}{N}$, we have
After dividing by GN=TN−1, we get that the average number of children of the root in a random Catalan tree of N nodes is 3(N−1)/(N+1). This perfectly matches the examples in Figure 6.3 from the book.
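The claimed average $3(N-1)/(N+1)$ can be verified by exhaustively generating all ordered trees of small sizes (a sketch of mine; a tree is represented as the tuple of its subtrees):

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def forests(n):
    """All ordered forests with n nodes; a forest is a tuple of trees and
    a tree is the tuple of its subtrees (a leaf is the empty tuple)."""
    if n == 0:
        return ((),)
    result = []
    for k in range(1, n + 1):          # size of the first tree
        for sub in forests(k - 1):     # its subtrees
            for rest in forests(n - k):
                result.append((sub,) + rest)
    return tuple(result)

def trees(n):
    return [f[0] for f in forests(n) if len(f) == 1]

for N in range(2, 9):
    ts = trees(N)
    avg_root_degree = Fraction(sum(len(t) for t in ts), len(ts))
    assert avg_root_degree == Fraction(3 * (N - 1), N + 1)
```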
Exercise 6.40
The toll CGF follows from Exercise 4, hence
Applying Theorem 6.7 gives
Based on the Corollary of Theorem 6.7, the average number of 1-child nodes per tree is
From Figure 6.3 in the book, the answer is 10/7 for N=5. This perfectly matches our formula. To get the proportion of nodes, we need to divide this amount by N. Therefore, the answer is (N−1)/(4N−6).
Exercise 6.41
The toll CGF follows from Exercise 5, hence EG(z)=zG(z)k. Applying Theorem 6.7 gives
Extracting coefficients from this GF is quite a mess. Instead, we’ll apply the extended Lagrange inversion theorem (see also the remark in Exercise 39). For Catalan trees, the standard functional equation is
Consequently, z=G(z)(1−G(z)) and Theorem 6.7 simplifies to
Let’s express the numerator in terms of g(u), so the coefficient extraction is
This gives
The exact proportion of nodes with k children is (see also the previous exercise)
Exercise 6.42
We already know from the book, that the average number of leaves in a random binary Catalan tree of N nodes is N(N+1)/(2(2N−1)). Hence, the fraction of leaves is (N+1)/(2(2N−1)).
To compute the fraction of internal nodes with a single child, we apply Theorem 6.7 with the toll CGF ET(z)=2z(T(z)−1) (see also Exercise 40). In this instance, it’s necessary to omit the empty subtree, while permitting either the left or right branch to contain nodes. This gives
The fraction of nodes of this type is
To compute the fraction of internal nodes with two children, we can apply Theorem 6.7 with the toll CGF ET(z)=z(T(z)−1)2. But there is a simpler way. Namely, the sum of fractions must be 1, since we cover all cases. Therefore, the fraction of nodes with two children is
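All three fractions can be checked by exhaustive enumeration of binary trees; the leaf fraction $(N+1)/(2(2N-1))$ quoted above is asserted exactly, and the three node-type counts are checked to sum to $N\,T_N$ (a verification sketch of mine):

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def btrees(n):
    """All binary trees with n internal nodes; None is an external node."""
    if n == 0:
        return (None,)
    return tuple((l, r)
                 for k in range(n)
                 for l in btrees(k)
                 for r in btrees(n - 1 - k))

def count(t, kind):
    """Internal nodes of t with exactly `kind` nonempty children."""
    if t is None:
        return 0
    children = (t[0] is not None) + (t[1] is not None)
    return (children == kind) + count(t[0], kind) + count(t[1], kind)

for N in range(1, 9):
    ts = btrees(N)
    leaves = sum(count(t, 0) for t in ts)
    one = sum(count(t, 1) for t in ts)
    two = sum(count(t, 2) for t in ts)
    assert Fraction(leaves, N * len(ts)) == Fraction(N + 1, 2 * (2 * N - 1))
    assert leaves + one + two == N * len(ts)
```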
Exercise 6.43
We already know from the book, that the average number of leaves in a random BST of N nodes is (N+1)/3. Hence, the fraction of leaves is (N+1)/(3N).
To compute the fraction of internal nodes with a single child, we apply Theorem 6.7 with eN=2/N for N≥2. This follows from the fact that the root of a subtree of size N must be the smallest or largest element in order to have a single child. Thus,
This gives the same fraction as for leaves
The fraction of nodes with two children is
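Since the three fractions must sum to one and the one-child fraction equals the leaf fraction, the two-child fraction is $1-2(N+1)/(3N)=(N-2)/(3N)$. A brute-force check over all permutations for small $N$ (a sketch of mine, asserting the average counts $(N+1)/3$, $(N+1)/3$ and $(N-2)/3$):

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def node_type_counts(perm):
    """Build the BST for perm; return [leaves, one-child, two-child]."""
    root, nodes = None, []
    for key in perm:
        if root is None:
            root = [key, None, None]
            nodes.append(root)
            continue
        node = root
        while True:
            i = 1 if key < node[0] else 2
            if node[i] is None:
                node[i] = [key, None, None]
                nodes.append(node[i])
                break
            node = node[i]
    counts = [0, 0, 0]
    for n in nodes:
        counts[(n[1] is not None) + (n[2] is not None)] += 1
    return counts

for N in range(2, 8):
    totals = [0, 0, 0]
    for p in permutations(range(N)):
        c = node_type_counts(p)
        for i in range(3):
            totals[i] += c[i]
    nperm = factorial(N)
    assert Fraction(totals[0], nperm) == Fraction(N + 1, 3)   # leaves
    assert Fraction(totals[1], nperm) == Fraction(N + 1, 3)   # one child
    assert Fraction(totals[2], nperm) == Fraction(N - 2, 3)   # two children
```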
Exercise 6.44
For all subtasks, the mechanics of calculating the variance is based on Table 3.5. We omit the technical details and only present the derivation of BGFs with final results for variances.
The BGF for the number of leaves in a random binary Catalan tree is given in Section 5.4 of the book
Therefore,
The symbolic method may help a lot in finding the BGF for the number of leaves in a random general Catalan tree. A tree is either a leaf or a root attached to a non-empty forest. Therefore,
We should mark the leaf, so this translates to
The variance is
Setting up the BGF in case of a random BST proceeds along the lines of the computation of moments presented in Section 3.10 (see the Quicksort distribution subsection) of the book. For the latter, consult also the derivation shown in Exercise 3.67. Let QN(u) be the PGF for the number of leaves in a random BST of size N. The base cases are:
An empty tree has no leaves, so Q0(u)=0.
A singleton tree has one leaf, so Q1(u)=u.
Subtree sizes are uniformly distributed with probability 1/N, as emphasized in the book, hence (see also Exercise 3.62)
As in the proof of Theorem 6.3, to simplify this recurrence, we move to an "enumerative" approach
Let our exponential BGF be Q(z,u)=∑N≥0QN(u)zN. If we multiply our recurrence by zN−1 and sum on N≥1, the LHS transforms into
The RHS becomes a self-convolution (after making an adjustment for N=1 to balance it with the LHS)
We arrive at the following ODE
with initial condition Q(0,u)=1. From here, we virtually follow the same steps as in Exercise 3.67. The variance is
Exercise 6.45
We assume that e(t)=f(∣t∣)=eN, which is usually the case with almost all toll functions. Let's define a toll BGF
The BGF version of Theorem 6.7 uses the Hadamard product of GFs
Proof:
Let T be the set of all binary Catalan trees. We have
Let G be the set of all general Catalan trees. We have
For BSTs we start with the PGF
and continue as in the previous exercise. We no longer need the correction factor for N=1, since the toll function eN handles all cases. Finally, instead of multiplying by zN−1 we must multiply by zN to be able to smoothly apply the toll BGF; this produces the extra z factor on both sides.
Exercise 6.46
We start by deriving the OGF for Fibonacci polynomials
Now, we expand this back into a series, so that we can read off the coefficients Fh+1(z).
Exercise 6.47
First, we need to set up the equation for applying the Lagrange inversion theorem (see the third case as well as the remark in Exercise 39)
The last step follows from identities stated in the book (relating u, z and G(z)), so that we can conclude
We want to extract the coefficient [zN+1] from the above summand. The Lagrange inversion theorem says that if z=f(u), then
The first derivative of g is
Substituting it back into the theorem's formula, we get
The last step (the formula from the book) is the result of intense algebraic gymnastics behind the scenes; the absorption identity is helpful for grouping terms. Using a computer algebra system, it’s easy to show that the last two lines are equal. All in all, putting it back into the sum over k produces the equation of Theorem 6.9.
The background about that last line, as published in the book, is available in the presentation Analyses of Tree Height (slides 6-7). The whole expression is nicely captured in terms of the second-order central difference.
Exercise 6.48
We need to rigorously justify the transition from the discrete binomial sum to the continuous H(x) integral, specifically by managing the error bounds using Theorem 4.9 and Euler-Maclaurin summation. Of course, the proof sketch provided in the book remains the main scaffolding; we fill in the gaps here.
To keep things aligned with Theorem 4.9, let’s define the central region as 1≤h<2N1+ϵ for a small constant ϵ>0. Notice that it strictly covers the book's range 1≤h≤NlogN in the proof of the corollary of Theorem 6.9.
As pointed out in the previous exercise, the sum denoting GN−GN[h−1] is fundamentally adding up discrete curvatures of binomials at various distances (using step size δ=kh). Inside the central region, according to Theorem 4.6, we can efficiently employ the normal approximation of the binomial distribution. Thus, if a binomial coefficient acts like a Gaussian curve, then its discrete second difference (our term in the sum) must act like the continuous second derivative of that Gaussian curve. So,
The 1/N factor gets "annihilated" due to division by GN when moving into the realm of Catalan sums. Notice that H(h/N) encompasses the expression we just derived.
Because we are swapping discrete binomials with the smooth function H(x), we introduce a relative error. Specifically, Theorem 4.9 tells us the approximation is valid up to a factor of
Let's bound the total error introduced across this entire central region
We know from the book that H(x) is bounded and decays (see also the next exercise), hence we can approximate the sum ∑H(h/N) by its integral, which scales as O(N) (proven below). Picking ϵ<1/4, the error turns out to be o(1).
Outside of the central region, according to Theorem 4.9, the tail is bounded as O(e^(−2N^ϵ)). This is exponentially small, meaning the tail sum is entirely negligible. We have now rigorously reduced the expected height to
To transition to the integral, we apply the Euler-Maclaurin formula for a function f(h)=H(h/N). We can apply the second form of Theorem 4.3 over the interval [1,M] and let M→∞. The usage of M prevents overlap with N (number of nodes).
Because H(x) and all its derivatives decay to zero as x→∞, we can safely take the limit as M→∞. This gives
To evaluate the integral, we temporarily reuse the result from the book, so
We must explicitly evaluate the integral ∫0∞H(x)dx. The trap is that we cannot simply swap the sum and the integral, because the convergence is not absolute over (0,∞): a term-by-term integration produces zero for every k. We should use the Theta function θ1(x)=∑k=1∞e−k2x2. We have
The constant associated with the function is
Consequently, the average height is indeed √(πN)+O(1).
Another Combinatorial Proof
Instead of directly handling the error terms in the original approach, it’s instructive to read the paper The Average Height of Catalan Trees by Counting Lattice Paths that provides a combinatorial proof better suited for computer scientists. It builds upon the material from the book and cites results from the authors of the book.
As a side effect, it also achieves the intended goal of Exercise 9 by establishing a bijection between Dyck paths (related to gambler’s ruin paths) and general Catalan trees.
Exercise 6.49

Exercise 6.50
Because of the rotation correspondence, it turns out that the stack height of binary Catalan trees is essentially distributed like the height of forests. The height of a forest is the maximum height of its constituent trees. The book already establishes the relationship between the stack height of binary and general Catalan trees.
Note that the relationship is not a point-wise equality. For example, a left‑chain binary tree with N nodes has s(t)=0, but the corresponding forest height is N−1. Nevertheless, the two quantities are asymptotically equivalent in distribution, as highlighted above.
The stack size, as specified, is related to the Strahler number, which measures branching complexity.
Exercise 6.51
We have z=f(u)=u/(1−u), thus the first case applies
Exercise 6.52
Let f(z)=ez−1, so f(f−1(z))=ef−1(z)−1=z. This implies that f−1(z)=ln(z+1). Its Taylor expansion is
Let’s apply the Lagrange inversion theorem
Handling the RHS is really messy (see Exercise 3.36), as we need to raise a convoluted power series to the nth power. Nonetheless, we can put the theorem into a "reverse" regime and get a stunning combinatorial identity
This is akin to factoring GFs to derive new identities (see Exercise 3.5). Compare also this result with the one from the previous exercise.
Exercise 6.53
See Exercise 72 for the generalization of this problem.
Exercise 6.54
See Theorem 6.15 in the book with t=4.
Exercise 6.55
This is known as Cayley’s formula.
Exercise 6.56
The book Introduction to Algorithms Fourth Edition, by CLRS (see Section B.5.1 Free trees) contains a full proof using an extended list (6 instead of 4) of equivalent properties. Furthermore, they apply a clever trick to reduce the combinatorial explosion in the proof: it’s enough to prove the implications along a single cycle through these properties (each one implies the next), reducing the required work.
Exercise 6.57
A post on Mathematics Stack Exchange lists all different free trees with 6 nodes. The distribution of ordered trees over these free trees, in left-to-right order, is 2, 10, 10, 5, 10, 5. Figure 6.22 shows that the winner is the tree structure known as the Broom graph B5,3. The same type of graph structure, B6,3, also appears most frequently among all ordered trees on six nodes (it’s the second one from the left in the linked post), although not uniquely. Nonetheless, for large N Broom graphs actually perform poorly (see the next exercise).
At any rate, even for the cases with 5 and 6 nodes, highly symmetric trees perform badly in this respect. For example, the star graph is always associated with only 2 ordered trees.
Exercise 6.58
Imagine a central node connected to
a single leaf,
a chain of 2 nodes,
and a chain of 3 nodes.
This free tree has N=7 nodes in total and the maximum asymmetry achievable with a tree of this size. If we promote the central node to the root, we immediately produce 3! different ordered trees. If this central node becomes a child of some other root, then it still contributes 2! variants in any context. A Broom graph is no match for this construction.
The key takeaway is to maximize asymmetry within the structure. An effective approach involves beginning with a central node accompanied by subtrees that are structurally distinct, applying this principle recursively throughout each subtree. It’s essential to balance this objective against the number of available nodes. While positioning a high-degree node as the root can generate numerous ordered trees, it is also important to avoid identical subtrees, as these diminish the overall count.
Exercise 6.59
Assume tree denotes an ordered tree, based on the following quote from the book:
But we maintain our algorithmic bias, by reserving the word tree for ordered trees, which arise in perhaps the most natural way in computer applications.
The description below is a linear-time algorithm in the number of edges.
Building a Tree Representation
Read an input consisting of a set of edges and produce an adjacency-list representation of the tree. Find the root (the node with in-degree 0, assuming edges are directed). Because the edge set itself carries no strict left-to-right order, the "ordered" nature of the resulting tree depends on the order in which edges are read; the list data structure preserves the order in which items were added to it.
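A minimal sketch of this construction (assuming the input is a list of directed (parent, child) pairs; the function name is illustrative):

```python
def build_tree(edges):
    """Build an adjacency-list representation from directed (parent, child)
    edges and locate the root (the unique node with in-degree 0)."""
    children = {}
    has_parent = set()
    for parent, child in edges:
        # insertion order of the list fixes the left-to-right order of children
        children.setdefault(parent, []).append(child)
        children.setdefault(child, [])
        has_parent.add(child)
    (root,) = set(children) - has_parent
    return root, children

root, adj = build_tree([(1, 2), (1, 3), (3, 4)])
print(root, adj)  # 1 {1: [2, 3], 2: [], 3: [4], 4: []}
```

Both the construction and the root search take time linear in the number of edges.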
Exercise 6.60
After executing the algorithm from the previous exercise, we would immediately get the matching binary representation of the tree. Recall from Exercise 8 that a preorder traversal of an ordered tree aligns perfectly with the same traversal of the matching binary tree due to rotation correspondence.
Exercise 6.61
For binary trees there are various streamlined approaches. The LeetCode problem Flip Equivalent Binary Trees (the editorial is publicly available) provides an environment where this algorithm can be implemented and tested. Below is the solution, which illustrates the idea behind the generic AHU-algorithm. It differs from the third approach of LeetCode’s official solution by preserving the original trees.
The Python3 code is based on the pseudo-code of the AHU-algorithm, as given in the cited paper. To keep things simple, the label-condensing step is skipped, even though it is important for retaining linear-time performance. Finally, in this LeetCode problem nodes already have distinct values. In general, you would need to assign IDs based on unique subtree structures (see the cited paper).
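The original listing is not reproduced here; a condensed sketch of the underlying idea, using AHU-style canonical string encodings (differing in details from the tree-preserving variant described above), could be:

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def canon(t):
    """AHU-style canonical encoding: the two child encodings are sorted,
    so two trees get the same string iff they are flip equivalent."""
    if t is None:
        return "#"
    a, b = sorted((canon(t.left), canon(t.right)))
    return "(%d,%s,%s)" % (t.val, a, b)

def flip_equiv(t1, t2):
    return canon(t1) == canon(t2)

a = Node(1, Node(2), Node(3, Node(4)))
b = Node(1, Node(3, None, Node(4)), Node(2))
print(flip_equiv(a, b))  # True
```

Sorting child encodings at every node is exactly the canonicalization step of AHU; without label condensing the strings may grow, so this sketch is not linear time.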
Exercise 6.62
See the previous exercise about the AHU-algorithm. The paper cited there explains the algorithm in detail using pseudo-code and also provides proofs of correctness.
Exercise 6.63
The book suggests building the series step-by-step by iterating the functional equation
with U[0](z)=z. Because U[h](z) and U(z) agree in the first h+1 terms, we can treat everything as truncated polynomials and update them layer by layer. The next Python program implements the naive version without using any external libraries. Its running time is O(N4) arithmetic operations. It’s intentionally kept in this form to truly depict what is going on under the hood. A simple optimization, elaborated at the beginning of the next exercise, may bring the running time down to O(N3) even for this basic approach.
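The original listing did not survive in this copy; the following sketch (with illustrative helper names — poly_exp is mentioned in the next exercise, poly_mul is my own) iterates the functional equation with exact rational arithmetic:

```python
from fractions import Fraction

def poly_mul(a, b, n):
    # product of coefficient lists, truncated to degree < n
    c = [Fraction(0)] * n
    for i, ai in enumerate(a[:n]):
        if ai:
            for j, bj in enumerate(b[: n - i]):
                c[i + j] += ai * bj
    return c

def poly_exp(p, n):
    # exp of a series with zero constant term, naively: sum_k p^k / k!
    res = [Fraction(0)] * n
    res[0] = Fraction(1)
    term = res[:]
    for k in range(1, n):
        term = [t / k for t in poly_mul(term, p, n)]
        res = [r + t for r, t in zip(res, term)]
    return res

def rooted_trees(N):
    # iterate U[h+1](z) = z * exp(sum_{i>=1} U[h](z^i)/i), U[0](z) = z; N >= 1
    n = N + 1
    U = [Fraction(0)] * n
    U[1] = Fraction(1)
    for _ in range(N):
        s = [Fraction(0)] * n
        for i in range(1, n):          # accumulate sum_i U(z^i)/i, truncated
            for j, c in enumerate(U):
                if i * j < n:
                    s[i * j] += c / i
        U = [Fraction(0)] + poly_exp(s, n)[: n - 1]   # multiply by z
    return [int(c) for c in U]

print(rooted_trees(10))  # [0, 1, 1, 2, 4, 9, 20, 48, 115, 286, 719]
```

Each iteration costs O(N3) because of poly_exp, and N iterations suffice for the first N coefficients, giving the O(N4) total mentioned above.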
Python 3 by default handles arbitrarily large integers, so any number of coefficients may be printed. Of course, the bad performance characteristics of this version preclude large N.
There are lots of relatively novel implementations in various programming languages to compute UN. These are documented in OEIS A000081. Among them there is a one-liner program in SageMath.
Exercise 6.64
The underlying idea of this optimized version is best understood by looking at a more mundane task of speeding up the poly_exp function (see the previous exercise). It evaluates the Taylor expansion of ex by computing each term from scratch. Overall this entails O(N3) operations. Let’s see how to reduce this to O(N2). Let E(z)=exp(P(z)). Taking the derivative of both sides, we get a differential equation
Equating the coefficients of zN−1 gives
Thus, computing each coefficient demands only O(N) instead of O(N2) arithmetic operations. The Harary-Palmer method differentiates the entire functional equation to get everything down to O(N2). This gives
We want to extract the coefficient of zN+1. On the RHS we must have i+kl=N+1. Because i≥1 this implies that kl≤N. Furthermore, l≥1 means that 1≤k≤N. We get
The final equation is the form used in the efficient Python implementation shown below. It’s also the direct application of the Euler transform (see the third case) on U(z). The boxed formula is the one we need to show as part of this exercise. Using memoization, we can achieve the O(N2) objective. Notice that handling the inner divisor sum requires O(NlogN) operations during the whole run.
At the time of this writing, the original Python program published at OEIS A000081 contains a bug. Namely, it needlessly recomputes the inner divisor sum. The code above fixes this issue.
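The corrected listing is missing from this copy; a sketch of an O(N2) implementation that maintains the divisor sums incrementally (so they are never recomputed) might look like:

```python
def rooted_trees_fast(N):
    """O(N^2) computation of OEIS A000081 via the Euler-transform
    recurrence n*u[n+1] = sum_{k=1}^{n} S[k]*u[n-k+1], where the divisor
    sums S[k] = sum_{d|k} d*u[d] are updated incrementally."""
    u = [0] * (N + 1)
    u[1] = 1
    S = [0] * (N + 1)
    for n in range(1, N):
        for k in range(n, N + 1, n):   # u[n] is now final; fold it into S
            S[k] += n * u[n]
        u[n + 1] = sum(S[k] * u[n - k + 1] for k in range(1, n + 1)) // n
    return u

print(rooted_trees_fast(10))  # [0, 1, 1, 2, 4, 9, 20, 48, 115, 286, 719]
```

The incremental update of S over all n costs O(NlogN) in total (a harmonic sum), so the quadratic convolution dominates.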
Exercise 6.65
Exercise 6.66
Each free tree gives at least one rooted tree (by picking any node as root), so there are at least as many rooted trees of N nodes as free trees of N nodes. There are N choices for picking a root, so there can be at most N times as many rooted trees of N nodes as free trees of N nodes.
Exercise 6.67
The derivation is based on a combinatorial argument expressed in terms of OGFs. The starting point is Otter’s theorem "In any tree the number of nonsimilar vertices minus the number of nonsimilar lines (symmetry line excepted) is the number one." Therefore, the formula for any free tree is
If we sum this property over all free trees, the LHS transforms immediately into F(z). On the RHS, the first term is simply U(z), by definition of rooted trees.
Every undirected edge carries two rooted trees, one hanging from each of its endpoints. The OGF for an ordered pair of rooted trees is U(z)2. In this product, symmetric pairs (both trees the same) are counted only once, while we don’t care about order; so we first add the symmetric amount back and then divide everything by two.
The symmetric case counts pairs of two identical rooted trees, whose total size is even, hence its OGF is U(z2). Taking into account the previous remarks, we end up with the equation
Exercise 6.68
We want to find
We know from the book that [zN]U(z)=UN∼cαNN−3/2. The symmetric term [zN]U(z2)=U(N/2) (nonzero only for even N) is exponentially small relative to UN (see Exercise 4.9). We are left with estimating the discrete convolution
Let’s partition the sum into three distinct regions using an intermediate boundary A (such as A=√N), where 1≪A≪N:
The Left Tail: k∈[1,A]
The Right Tail: k∈[N−A,N−1]
The Central Region: k∈(A,N−A)
Due to symmetry, the left and right tails are the same.
The Left Tail and the Residual
In the left tail, k is small relative to N. We use the binomial expansion to approximate the (N−k)−3/2 term in UN−k
Substituting this into the left tail of the convolution (ignoring the last O-term)
To evaluate the first sum, we use the technique of bounding the tail from Section 4.4 of the book. For this, we must peek a little bit into singularity analysis in the book Analytic Combinatorics by PF & RS. The following lemma is important:
Lemma: Let ρ=1/α be the radius of convergence of U(z). Then, U(ρ)=1.
This gives
Thus, the left tail evaluates to
The last summand is the result of evaluating the second sum in the left tail by employing the Euler-Maclaurin method. Cf is the usual constant in this approach.
The Central Region
In the central region, both k and N−k are large, allowing us to safely replace both with their continuous asymptotic equivalents and approximate the sum as an integral
Combined Result of the Convolution
We now reassemble the total convolution by summing the three regions
Conclusion
Substituting back the result of the convolution, we have
We’ve applied the method of matched asymptotic expansions, which ensures that all artificial boundaries vanish from the final expression.
Exercise 6.69
The number of distinct labelings of an unlabeled tree T with N nodes is N!/∣Aut(T)∣, where Aut(T) is the automorphism group of T. The automorphism group Aut(T) of a tree T is the set of all permutations of the nodes of T that preserve the adjacency relation. In other words, a permutation σ of the node set is an automorphism if for any two nodes u,v, the pair {u,v} is an edge in T if and only if {σ(u),σ(v)} is an edge in T. To maximize the number of labelings, we minimize the size of the automorphism group.
For N=4,5,6, the path (automorphism group of order 2) has the most labelings (only a sole winner when N=4). For N≥7, we can always create an asymmetric tree with a trivial automorphism group producing N! labelings.
A concrete example with N=7 nodes is
Notice that every configuration is unique. One possible general structure is to have a central node with branches of different lengths.
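We can verify such claims by brute force. The sketch below (the node numbering is illustrative) counts automorphisms directly by checking every vertex permutation, then returns N!/∣Aut(T)∣:

```python
from itertools import permutations
from math import factorial

def num_labelings(n, edges):
    """n!/|Aut(T)|: count vertex permutations preserving adjacency."""
    edge_set = {frozenset(e) for e in edges}
    aut = sum(
        1
        for p in permutations(range(n))
        if all(frozenset((p[a], p[b])) in edge_set for a, b in edge_set)
    )
    return factorial(n) // aut

# path on 4 nodes: |Aut| = 2; star on 4 nodes: |Aut| = 3! = 6
print(num_labelings(4, [(0, 1), (1, 2), (2, 3)]))  # 12
print(num_labelings(4, [(0, 1), (0, 2), (0, 3)]))  # 4

# a 7-node tree with a central node and branches of lengths 1, 2, 3:
# trivial automorphism group, hence 7! labelings
tree7 = [(0, 1), (0, 2), (2, 3), (0, 4), (4, 5), (5, 6)]
print(num_labelings(7, tree7))  # 5040
```

Note that 12 + 4 = 16 = 4², matching Cayley’s formula for the total number of labeled trees on 4 nodes.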
Exercise 6.70
Wikipedia has a detailed solution as part of the article on Cayley’s formula.
Exercise 6.71
The equation follows from the arguments given in Exercise 67 with the symmetrical case omitted, as it doesn’t apply here.
Exercise 6.72
The OGF for k-forests is just T[t](z)k, where T[t](z) is the OGF for t-ary trees. We know from the book that z=u(1−ut−1). By Lagrange inversion (using the second case in Theorem 6.11), the number of k-forests of t-ary trees with a total of N internal nodes and (t−1)N+k external nodes (each tree also contributes +1 to the total) is
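Assuming the count works out to the classical Fuss–Catalan number k/(tN+k)·C(tN+k,N) — which is consistent with the k-fold convolution idea — a quick check against the Catalan numbers is easy:

```python
from math import comb

def t_ary_forests(t, k, N):
    # Fuss-Catalan: ordered forests of k t-ary trees, N internal nodes total
    return k * comb(t * N + k, N) // (t * N + k)

# k = 1, t = 2 recovers the Catalan numbers
print([t_ary_forests(2, 1, n) for n in range(6)])  # [1, 1, 2, 5, 14, 42]

# forests of 2 binary trees with 2 internal nodes: (2,0), (0,2) give 2 each,
# (1,1) gives 1, for 5 in total
print(t_ary_forests(2, 2, 2))  # 5
```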
Exercise 6.73
We start with the exact formula from the book
For large N, we can approximate the prefactor as
Now, we apply Stirling's approximation to the three factorials in the binomial coefficient
This gives
Exercise 6.74
We start with the formula for the number of t-restricted trees from the book
We want to extract the coefficient [uN−1], so only terms satisfying the following equations are relevant
This gives
Exercise 6.75
The Python 3 program below implements the formula derived in the previous exercise.
The default configuration prints the Motzkin numbers on the output. They match the OEIS A001006 sequence.
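The listing is missing from this copy; here is a self-contained sketch that computes the same numbers from the Lagrange inversion coefficient [u^(N−1)] ((1−u^(c+1))/(1−u))^N, expanded via the binomial theorem (the exact form may differ from the formula derived in the previous exercise):

```python
from math import comb

def restricted_trees(N, max_children=2):
    """Ordered trees with N nodes in which every node has at most
    `max_children` children: (1/N) [u^(N-1)] ((1-u^(c+1))/(1-u))^N."""
    c1 = max_children + 1
    total, j = 0, 0
    while c1 * j <= N - 1:
        m = N - 1 - c1 * j                     # remaining power of u
        total += (-1) ** j * comb(N, j) * comb(N + m - 1, m)
        j += 1
    return total // N

# at most 2 children per node yields the Motzkin numbers (OEIS A001006)
print([restricted_trees(n) for n in range(1, 8)])  # [1, 1, 2, 4, 9, 21, 51]
```

With max_children at least N−1 the restriction is vacuous and the Catalan numbers reappear.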
Exercise 6.76
The customized Theorem 6.16 uses the following θ function
The RHS follows by reusing the idea from Exercise 74. Because the sum of children counts (i.e. the total number of edges) is even, the total number of nodes N=edges+1=2k+1 must be odd. The Lagrange inversion formula gives
Let’s use the substitution v=u2, so
We want to extract the coefficient [u2k]=[vk], so only terms satisfying the following equations are relevant
This gives
Exercise 6.77
Balanced trees (AVL and 2-3) definitely have a smaller cardinality than unbalanced trees (3-ary and 3-restricted).
AVL has only a local height constraint, whilst 2-3 has a global condition (all external nodes must be at the same height). Nonetheless, AVL is a binary tree, which turns out to be the more limiting factor.
The 3-ary trees have more possibilities, since "empty" links do contribute to the combinatorial count.
Therefore, the list of trees in increasing order of cardinality for large N is: AVL, 2-3, 3-restricted, and 3-ary.
Exercise 6.78
Assume N denotes the number of internal nodes.
Both trees show oscillations, especially the 2-3 variant. Some cases aren’t even possible. This shows how global constraints (2-3 depth) versus local constraints (AVL balance) completely rewrite the asymptotic behavior of the generating functions.
Exercise 6.79
We can apply the symbolic method in both cases. Notice the usage of the Iverson bracket operator.
An AVL tree of height h is either empty or has the root attached to perfectly balanced subtrees of height h−1 or attached to subtrees with heights h−1 and h−2 (either left or right child is taller).
To specify the functional equation on the generating function of 2-3 trees, we must use the substitution operator of the symbolic method. Based on the definition from the book, each internal node is either a 2 node (doubles the number of external nodes) or 3 node (triples the number of external nodes). Furthermore, all external nodes must be at the same level. Consequently, the total number of external nodes equals the product of multiplicative factors of internal nodes.
Notice that substituting atoms of Bh−1 with Z2 or Z3 exactly incorporates those possible multiplicative factors of internal nodes.