Using continuation / CPS to implement tail-recursive MergeSort in OCaml

Question 1

Adding another answer to make a separate point: it seems that much of the confusion among answerers is caused by the fact that you don't use the standard OCaml compiler, but the TryOCaml website which runs a distinct OCaml backend, on top of javascript, and has therefore slightly different optimization and runtime characteristics.

I can reliably reproduce the fact that, on the TryOCaml website, the CPS-style function mergeSort_cps you show fails on lists of length 1_000_000 with the following error:

Exception: InternalError: too much recursion.

My analysis is that this is not due to a lack of tail-rec-ness, but by a lack of support, on the Javascript backend, of the non-obvious way in which the CPS-translated call is tailrec: recursion goes through a lambda-abstraction boundary (but still in tail position).

Turning the code in the direct, non-tail-rec version makes the problem go away:

let rec merge_sort compare = function
  | [] -> []
  | [hd] -> [hd]
  | l ->
    let (left, right) = split_list (List.length l / 2) l in
    merge compare (merge_sort compare left) (merge_sort compare right);;

As I said in my other answer, this code has a logarithmic stack depth, so no StackOverflow will arise from its use (tail-rec is not everything). It is simpler code that the Javascript backend handles better.

Note that you can make it noticeably faster by using a better implementation split (still with your definition of merge) that avoids the double traversal of List.length then splitting:

let split li =
  let rec split ls rs = function
    | [] -> (ls, rs)
    | x::xs -> split rs (x::ls) xs in
  split [] [] li;;

let rec merge_sort compare = function
  | [] -> []
  | [hd] -> [hd]
  | l ->
    let (left, right) = split l in
    merge compare (merge_sort compare left) (merge_sort compare right);;

Question 2

Reading the comments, it seems that your Stack_overflow error is hard to reproduce.

Nevertheless, your code is not entirely in CPS or tail-recursive: in merge_sort, the calls to split_list and merge are made in a non-tail-call position.

The question is: by making your CPS transform and generous use of accumulators, what will be the worst stack depth related to recursion? Saving stack depth on the sort calls is in fact not very interesting: as each split the list in two, the worst stack depth would be O(log n) for n the size the input list.

On the contrary, split and merge would have made a linear O(n) usage of the stack if they weren't written in accumulator-passing style, so they are important to make tail-rec. As your implementation of those routines is tail-rec, there should be no need to worry about stack usage, and neither to convert the sort routine itself in CPS form that makes the code harder to read.

(Note that this logarithmic-decrease argument is specific to mergesort. A quicksort can have linear stack usage in worst case, so it could be important to make it tail-rec.)