Counting binary trees

https://cs.stackexchange.com/questions/368

16-10-2019
|

Question

(I'm a student with some mathematical background and I'd like to know how to count the number of a specific kind of binary trees.)

Looking at Wikipedia page for Binary Trees, I've noticed this assertion that the number of rooted binary trees of size $n$ would be this Catalan Number: $$C_n = \dfrac{1}{n+1}{2n \choose n}$$

But I don't understand how I could come up with such a result by myself? Is there a method to find this result?

Now, what if the order of sub-trees (which is left, which is right) is not considered? For example, from my point of view, I consider that these two trees are the same:

   /\   /\
  /\     /\

Would it be possible to apply a similar method to count how many of these objects have exactly $n$ nodes?

Solution

For counting many types of combinatorial objects, like trees in this case, there are powerful mathematical tools (the symbolic method) that allow you to mechnically derive such counts from a description how the combinatorial objects are constructed. This involves generating functions.

An excellent reference is Analytic Combinatorics by the late Philipe Flajolet and Robert Sedgewick. It is available from the link above.

The late Herbert Wilf’s book generatingfunctionology is another free source.

And of course Concrete Mathematics by GKP is a treasure trove.

For binary trees it goes like this: First you need a clear definition of the tree.

A binary tree is a rooted tree in which every non-leaf node has degree 2 exactly.

Next we have to agree what we want to call the size of a tree.

enter image description here

On the left all nodes are equal. In the middle we distinguish the leaves and non-leaves. On the right we have a pruned binary tree where the leaves have been removed. Notice that it has unary branches of two types (left and right)!

Now we have to derive a description of how these combinatorial objects are built. In the case of binary trees a recursive decomposition is possible.

Let $\mathcal{A}$ be the set of all binary trees of the first type then symbolically we have: enter image description here

It reads as: “An object of the class of binary trees is either a node or a node followed by two binary trees.” This can be written as equation of sets:

$$\mathcal{A}=\{\bullet\}\cup\bigl(\{\bullet\}\times\mathcal{A}\times\mathcal{A}\bigr)$$

By introducing the generating function $A(z)$ that enumerates this class of combinatorial objects we can translate the set equation into an equation involving the generating function.

$$A(z)=z+zA^2(z)$$

Our choice of treating all nodes equally and taking the number of nodes in the tree as notion of its size is expressed by “marking” the nodes with the variable $z$.

We can now solve the quadratic equation $zA^2(z)-A(z)+z=0$ for $A(z)$ and get, as usual, two solutions, the explicit closed form of the generating function:

$$A(z)=\frac{1\pm\sqrt{1-4z^2}}{2z}$$

Now we simply need Newton’s (generalized) Binomial Theorem:

$$(1+x)^a=\sum_{k=0}^\infty\binom{a}{k}x^k$$

with $a=1/2$ and $x=-4z^2$ to expand the closed form of the generating function back into a power series. We do this because, the coefficient at $z^n$ is just the number of combinatorial objects of size $n$, typically written as $[z^n]A(z)$. But here our notion of “the size” of the tree forces us to look for the coefficient at $z^{2n+1}$. After a little bit of juggling with binomials and factorials we get:

$$[z^{2n+1}]A(z)=\frac{1}{n+1}\binom{2n}{n}.$$

If we start with the second notion of the size the recursive decomposition is:

enter image description here

We get a different class of combinatorial objects $\mathcal{B}$. It reads: “An object of the class of binary trees is either a leaf or a interal node followed by two binary trees.”

We can use the same approach and turn $\mathcal{B}=\{\square\}\cup\bigl(\{\bullet\}\times\mathcal{B}\times\mathcal{B}\bigr)$ into $\mathcal{B}=1+zB^2(z)$. Only this time the variable $z$ only marks the internal nodes, not the leaves, because the definition “the size“ is different here. We get a different generating function as well:

$$B(z)=\frac{1-\sqrt{1-4z}}{2z}$$

Extracting the coefficient yields

$$[z^n]B(z)=\frac{1}{n+1}\binom{2n}{n}.$$

Class $\mathcal{A}$ and $\mathcal{B}$ agree on the counts, because a binary tree with $n$ internal nodes has $n+1$ leaves, thus $2n+1$ nodes in total.

In the last case we have to work a little harder:

enter image description here

which is a description of non-empty pruned binary tries. We extend this to $$\begin{align}\mathcal{C}&=\{\bullet\}\cup\bigl(\{\bullet\}\times\mathcal{C}\bigr)\cup\bigl(\{\bullet\}\times\mathcal{C}\bigr)\cup\bigl(\{\bullet\}\times\mathcal{C}\times\mathcal{C}\bigr)\\\mathcal{D}&=\{\epsilon\}\cup\bigl(\{\bullet\}\times\mathcal{C}\times\mathcal{C}\bigr)\end{align}$$

and rewrite it with generating functions

$$\begin{align}C(z)&=z+2zC(z)+zC^2(z)\\D(z)&=1+zC^2(z)\end{align}$$

solve the quadratic equations

$$\begin{align}C(z)&=\frac{1-2z-\sqrt{1-4z}}{2z}\\D(z)&=\frac{1-\sqrt{1-4z}}{2z}\end{align}$$

and get yet again

$$[z^n]C(z)=\frac{1}{n+1}\binom{2n}{n}\quad n\ge1 \qquad [z^n]D(z)=\frac{1}{n+1}\binom{2n}{n} \quad n\ge0$$

Note that the Catalan generating function is

$$E(z)=\frac{1-\sqrt{1-4z}}{2}$$

it enumerates the class of general trees. That is the trees with no restriction on the node degree.

$$\mathcal{E}=\{\bullet\}\times\mathrm{SEQ}(\mathcal{E})$$

It reads as: “An object of the class of general trees is a node followed by a possible empty sequence of general trees.”

$$E(z)=\frac{z}{1-E(z)}$$

With the Lagrange-Bürmann Inversion Formula we get

$$[z^n]E(z)=\frac{1}{n+1}\binom{2n}{n}$$

So we proved that there are as many general trees as there are binary trees. No wonder there is a bijection between the general and binary trees. The bijection is known as the rotation correspondence (explained at the end of the linked article), that allows us two store every general tree as a binary tree.

Note that if we do not distinguish the left and right sibling in class $\mathcal{C}$ we get yet another class of trees $\mathcal{T}$:

enter image description here

the unary binary trees. $$\mathcal{T}=\{\bullet\}\times\mathrm{SEQ}_{\le2}(\mathcal{T})$$ They have a generating function too $$T(z)=\frac{1-z-\sqrt{1-2z-3z^2}}{2z}$$ however their coefficient is different. You get the Motzkin numbers $$[z^n]T(z)=\frac{1}{n}\sum_k\binom{n}{k}\binom{n-k}{k-1}.$$

Oh and if you don’t like generating functions there are plenty of other proofs too. See here, there is one where you could use the encoding of binary trees as Dyck words and and derive a recurrence from their recursive definition. Then solving that recurrence gives the answer too. However the symbolic method saves you from coming up with the recurrence in the first place, as it works directly with the blueprints of the combinatorial objects.

OTHER TIPS

Generating functions are a very powerful and very useful magic wand. The following solution to the first question (why are there $C_n$ trees) is somewhat less magical. Hence, cute.

Example. To produce a tree of $5$ nodes we start with a sequence in which $+1$ occurs $5+1$ times, and $-1$ occurs $5$ times. For example, $+-++-+--++-$. Among those prefixes with the smallest (and possibly negative) sum, pick the longest; in this case, $+-++-+--$. Take this prefix from the beginning and put it at the end; in this case, we get $++-+-++-+--$. Now change $+$ into $T$ and $-$ into $E$; in this case we get TTETETTETEE. Remove a $T$ from the beginning, add an $E$ at the end; in this case we get TETETTETEEE. This is a description of the tree T(E,T(E,T(T(E,T(E,E)),E))). Below there's some explanation of why this is a bijection. Once you're convinced of that, counting is easy. There are $\binom{5+6}{5}$ sequences of $\pm1$, then we divided by $5+6$ because we chose one of the possible cyclic permutations.

First bijection. A typical definition for trees in ML is type tree = T of tree * tree | E; that is, a tree either has two (ordered) subtrees, or it is empty. Here is how trees are constructed: T(T(E,E),T(T(E,E),T(E,E))). Dropping fluff, we can simply write TTEETTEETEE. All such descriptions will end with an E, so it is redundant: TTEETTEETE. (Note that the empty tree now corresponds to the empty string.) These strings have the property that each prefix has at least as many Ts as Es, and in total they have $n$ Ts and $n$ Es, where $n$ is the number of nodes of the tree.

Second bijection. We now replace T by +1 and E by -1. So, we are looking at sequences with $n$ values +1, with $n$ values -1, and with the sums of all prefixes $\ge0$.

Third bijection. We now change the requirement for prefixes a little: We ask that the sum of each non-empty prefix to be $>0$. For this to be possible we let $n+1$ values +1 and $n$ values -1. (Otherwise the sum of the whole string would be 0 and would fail to meet the condition for prefixes.) These sequences must begin with +1. So, they really are the same as before, except they have a +1 stuck in the beginning.

Raney property. Now forget our sequences for a moment and consider some finite sequence of integers $x_1$, $\ldots\,$, $x_m$ whose sum is 1. If all non-empty prefixes have positive sums, then no cyclic permutation of this sequence has the same property. Why? Well, suppose there is a $k\ne1$ such that $x_k,\ldots,x_m,x_1,\ldots,x_{k-1}$ also has all non-empty prefixes positive. Then $x_1+\cdots+x_{k-1}\ge1$ (property of sequence starting from $x_1$) and $x_k+\cdots+x_m\ge1$ (property of sequence starting from $x_k$); hence, $x_1+\cdots+x_m\ge2$, which contradicts the assumption that the sum for the whole sequence is 1.

Moreover, given some sequence with sum 1, there is always a cyclic permutation that makes all non-empty prefixes have positive sum. (This is true even for real numbers.)

Conclusion. Now let's count the sequences of +1 and -1 that are in a bijection with trees. Out of $2n+1$ numbers we must pick $n+1$ that equal +1, the others will be -1. There are ${2n+1\choose n+1}$ ways to do so. But, only $1$ in $2n+1$ sequences counted so far has positive prefixes. So, the number of rooted, ordered binary trees is:

$${1\over2n+1}{2n+1\choose n+1}={1\over2n+1}{2n+1\over n+1}{2n\choose n}={1\over n+1}{2n\choose n}$$

Licensed under: CC-BY-SA with attribution

Not affiliated with cs.stackexchange