Question

While writing up examples of memoization and continuation-passing style (CPS) in a functional language, I ended up using the Fibonacci example for both. However, Fibonacci doesn't really benefit from CPS, as the recursion still runs exponentially often, while with memoization it's only O(n) the first time and O(1) every following time.

Combining CPS and memoization has a slight benefit for Fibonacci, but are there examples where CPS is the best approach, preventing you from running out of stack and improving performance, and where memoization is not a solution?

Or: is there any guideline for when to choose one over the other or both?


Solution

On CPS

While CPS is useful as an intermediate language in a compiler, at the source-language level it is mostly a device to (1) encode sophisticated control flow (not really performance-related) and (2) transform a non-tail call consuming stack space into a continuation-allocating tail call consuming heap space. For example, when you write (code untested):

(* direct style: the addition happens after both recursive calls return *)
let rec fib = function
  | 0 | 1 -> 1
  | n -> fib (n-1) + fib (n-2)

(* CPS: every recursive call is a tail call; the pending addition
   lives in the continuation k instead of on the stack *)
let rec fib_cps n k = match n with
  | 0 | 1 -> k 1
  | n -> fib_cps (n-1) (fun a -> fib_cps (n-2) (fun b -> k (a+b)))

The previous non-tail call fib (n-2), which allocated a new stack frame, is turned into the tail call fib_cps (n-2) (fun b -> k (a+b)), which instead allocates the closure fun b -> k (a+b) (on the heap) to pass it as an argument.
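
To actually compute a result with the CPS version, pass the identity function as the initial continuation (a small usage note of mine, not part of the original answer):

let fib' n = fib_cps n (fun x -> x)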

The CPS transformation does not asymptotically reduce the memory usage of your program (some further domain-specific optimizations might, but that's another story). You're just trading stack space for heap space, which is interesting on systems where stack space is severely limited by the OS (not the case with some implementations of ML, such as SML/NJ, which keep their call stack on the heap instead of using the system stack), and potentially performance-degrading because of the additional GC costs and the potential decrease in locality.

The CPS transformation is unlikely to improve performance much (though details of your implementation and runtime system might make it so); it is rather a generally applicable transformation that allows you to avoid the dreaded stack overflow of recursive functions with a deep call stack.

On Memoization

Memoization is useful for introducing sharing among the subcalls of recursive functions. A recursive function typically solves a "problem" ("compute fibonacci of n", etc.) by decomposing it into several strictly simpler "sub-problems" (the recursive subcalls), with some base cases for which the problem is solvable right away.

For any recursive function (or recursive formulation of a problem), you can observe the structure of the subproblem space. Which simpler instances of Fib(k) will Fib(n) need to return its result? Which simpler instances will those instances in turn need?

In the general case, this space of subproblems forms a graph (generally acyclic, for termination purposes): some nodes have several parents, that is, they are subproblems of several distinct problems. For example, Fib(n-2) is a subproblem of both Fib(n) and Fib(n-1). The amount of node sharing in this graph depends on the particular problem / recursive function. In the case of Fibonacci, nearly all nodes are shared between two parents, so there is a lot of sharing.

A direct recursive call without memoization cannot observe this sharing. The structure of the calls of a recursive function is a tree, not a graph, and shared subproblems such as Fib(n-2) will be fully visited several times (as many times as there are paths from the starting node to the subproblem node in the graph). Memoization introduces sharing by letting some subcalls return directly with "we've already computed this node and here is the result". For problems that have a lot of sharing, this can result in a dramatic reduction of (useless) computation: Fibonacci moves from exponential to linear complexity when memoization is introduced. Note that there are other ways to write the function, without memoization but with a different subcall structure, that also achieve linear complexity:

(* returns the pair (fib n, fib (n+1)); each subproblem is computed exactly once *)
let rec fib_pair = function
  | 0 -> (1,1)
  | n -> let (u,v) = fib_pair (n-1) in (v,u+v)
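
A wrapper projecting out the first component recovers the original interface (my addition, not in the original answer):

let fib n = fst (fib_pair n)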

The technique of using some form of sharing (usually through large tables storing the results) to avoid useless duplication of subcomputations is well known in the algorithmic community; it is called Dynamic Programming. When you recognize that a problem is amenable to this treatment (you notice the sharing among the subproblems), it can provide large performance benefits.
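
For concreteness, here is a minimal sketch of a memoized fib (my illustration, not part of the original answer), using a Hashtbl to share results across subcalls:

let fib_memo =
  let table = Hashtbl.create 16 in
  let rec fib n =
    if n <= 1 then 1
    else match Hashtbl.find_opt table n with
      | Some result -> result   (* subproblem already solved: reuse the shared result *)
      | None ->
        let result = fib (n-1) + fib (n-2) in
        Hashtbl.add table n result;   (* record the result for later callers *)
        result
  in
  fib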

Does a comparison make sense?

The two seem mostly independent of each other.

There are many problems where memoization is not applicable, because the subproblem graph structure does not have any sharing (it is a tree). By contrast, the CPS transformation is applicable to all recursive functions, but does not by itself lead to performance benefits (other than potential constant factors due to the particular implementation and runtime system you're using, and those are more likely to make the code slower than faster).

Improving performance by inspecting non-tail contexts

There is an optimization technique related to CPS that can improve the performance of recursive functions. It consists in looking at the computations "left to be done" after the recursive call (what would be turned into a function in straightforward CPS style) and finding an alternative, more efficient representation for them that does not result in systematic closure allocation. Consider for example:

(* direct style: the context 1 + _ is evaluated after the recursive call returns *)
let rec length = function
  | [] -> 0
  | _::t -> 1 + length t

(* CPS: each pending [_ + 1] context is captured as a heap-allocated closure *)
let rec length_cps li k = match li with
  | [] -> k 0
  | _::t -> length_cps t (fun a -> k (a + 1))

Notice that the context of the non-tail call, namely [_ + 1], has a simple structure: it adds an integer. Instead of representing this as a function fun a -> k (a+1), you can simply store the integer corresponding to several applications of this function, making k an integer instead of a function:

(* the continuation is now just an integer accumulator *)
let rec length_acc li k = match li with
  | [] -> k + 0
  | _::t -> length_acc t (k + 1)
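
As with the CPS version, a wrapper supplies the initial accumulator (my addition, not in the original answer):

let length li = length_acc li 0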

length_acc runs in constant, rather than linear, space. By turning the representation of the tail contexts from functions into integers, we have eliminated the allocation step that made memory usage linear.

Close inspection of the order in which the additions are performed reveals that they now happen in the opposite direction: we add the 1's corresponding to the beginning of the list first, while the CPS version added them last. This order reversal is valid because _ + 1 is an associative operation: given several nested contexts, foo + 1 + 1 + 1, it is valid to compute them either from the inside out, ((foo+1)+1)+1, or from the outside in, foo+(1+(1+1)). The optimization above can be used for all such "associative" contexts around a non-tail call.
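
To illustrate that last claim (my generalization, not from the original answer), the same accumulator trick works for any associative operation op, even a non-commutative one such as string concatenation:

(* the pending context [_ op x] is represented by the result accumulated so far *)
let rec fold_ctx op acc = function
  | [] -> acc
  | x :: t -> fold_ctx op (op acc x) t

(* concatenation is associative but not commutative; the reordering is still valid *)
let concat li = fold_ctx (^) "" li

(This is, of course, just List.fold_left.)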

There are certainly other optimizations available based on the same idea (I'm no expert on such optimizations): look at the structure of the continuations involved and represent them in a more efficient form than heap-allocated functions.

This is related to the "defunctionalization" transformation, which changes the representation of continuations from functions to data structures without changing the memory consumption (a defunctionalized program allocates a data node wherever the original program would have allocated a closure). It makes it possible to express the tail-recursive CPS version in a first-order language (without first-class functions), and it can be more efficient on systems where data structures and pattern matching are more efficient than closure allocation and indirect calls:

(* Linit is the initial (identity) continuation;
   each Lcons represents one pending [_ + 1] context *)
type length_cont =
  | Linit
  | Lcons of length_cont

let rec length_cps_defun li k = match li with
  | [] -> length_cont_eval 0 k
  | _::t -> length_cps_defun t (Lcons k)
and length_cont_eval acc = function
  | Linit -> acc
  | Lcons k -> length_cont_eval (acc+1) k

let length li = length_cps_defun li Linit

(* Finit is the initial continuation;
   Fminus1 (n, k) represents fun a -> fib_cps (n-2) (fun b -> k (a+b));
   Fminus2 (k, a) represents fun b -> k (a+b) *)
type fib_cont =
  | Finit
  | Fminus1 of int * fib_cont
  | Fminus2 of fib_cont * int

let rec fib_cps_defun n k = match n with
  | 0 | 1 -> fib_cont_eval 1 k
  | n -> fib_cps_defun (n-1) (Fminus1 (n, k))
and fib_cont_eval acc = function
  | Finit -> acc
  | Fminus1 (n, k) -> fib_cps_defun (n-2) (Fminus2 (k, acc))
  | Fminus2 (k, acc') -> fib_cont_eval (acc+acc') k

let fib n = fib_cps_defun n Finit

OTHER TIPS

One benefit of CPS is error handling: if you need to fail, you just call your failure continuation.
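
A minimal sketch of that idea (the names and signature are my own, not from the answer): pass both a success continuation and a failure continuation, and call whichever applies.

(* search a list with two continuations: ok on success, err on failure *)
let rec find_cps p li ~ok ~err = match li with
  | [] -> err ()   (* nothing matched: invoke the failure continuation *)
  | x :: t -> if p x then ok x else find_cps p t ~ok ~err

let first_even li =
  find_cps (fun x -> x mod 2 = 0) li ~ok:(fun x -> Some x) ~err:(fun () -> None)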

I think the biggest case is when you are not dealing with pure calculations, which is where memoization is great. If you are instead dealing with IO or other side-effecting operations, the benefits of CPS are still there, but memoization doesn't work.
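
For instance (a hedged sketch of mine, not from the answer), an effectful step can still be chained in CPS, but caching its result would be wrong, since each call is expected to perform the effect again:

(* read a line, then continue; memoizing this would skip the IO on later calls *)
let read_and_continue k = k (read_line ())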

As for an instance where CPS and memoization are both applicable and CPS is better, I am not sure, since I consider them different pieces of functionality.

Finally, CPS is a bit diminished in F#, since tail-call elimination already makes stack overflow a non-issue for tail-recursive code.

Licensed under: CC-BY-SA with attribution