Counting nodes in a Tree

Question

As the name implies, foldTree defines the fold function over the custom Tree type.

A naive way of defining a foldTree could be:

let rec foldTreeNaive accFun init = function
    | Empty -> init
    | Branch (x, left, right) ->
        let lacc = foldTreeNaive accFun init left
        let racc = foldTreeNaive accFun init right
        accFun x lacc racc

The problem with this function is that it could potential make very deep recursive calls if the tree being folded over is deep, since the recursive calls must complete for a node before the accumulator function can be called. For example the following causes a stack overflow exception:

let unbalanced = [1..100000] |> List.fold (fun t i -> Branch(i, t, Empty)) Empty
let nodeCount = foldTreeNaive (fun _ lc rc -> lc + rc + 1) 0 unbalanced

The usual way to avoid such stack overflows is to make the function tail recursive, however this seems impossible in this case since there are two recursive calls to make, instead of the one required when folding over lists.

foldTree is defined using the local loop function. This function is interesting in that it is defined using continuation passing style. In CPS, each function takes an additional 'continuation' function which is passed the result of the computation and is responsible for deciding what happens next. Note that loop is tail recursive and so avoids the overflow problem of foldTreeNaive.

The type of the loop function is:

Tree<'a> -> ('b -> 'c) -> 'c

where 'a is the type of nodes in the tree, 'b is the accumulator type, and 'c is the result of the continuation function.

In the case of a leaf node, the continuation is passed the empty accumulator value passed to the foldTree function.

When folding over a non-empty tree in the Branch case, the result of the fold depends on the results for the left and right subtrees. This is done recursively, first by folding over the left subtree, then the right. For the recursive call over the left subtree, loop must build a new continuation to receive the result, this is the

(fun lacc -> 
    loop right (fun racc ->
        cont (branchF x lacc racc))

function. What this continuation does is to make a recursive call over the right subtree, passing yet another continuation to receive the result of that fold. When that continuation is called, the results for the left and right subtrees are available in lacc and racc. At this point the accumulation function for the node can be called with the value for the current node and the results for the left and right subtrees. The result of this function is then passed to the original continuation passed to loop.

The loop function is then invoked by the foldTree function in the line:

loop t id

Here, id is the continuation which will receive the result of the fold for the root node of the tree. Since this is the value required, id just returns its argument without modification.

You might also find this description of fold for binary trees useful.