Tail call conversion in OCaml

https://stackoverflow.com/questions/11380832

19-06-2021
|

Question

I was told in class that the following function is not tail recursive due to the boolean operator being evaluated after the recursive call:

let rec exists p = function
    [] -> false
  | a::l -> p a || exists p l

But this doesn't blow the stack on a ten-million size list, and what's more, it is the implementation in the standard library. If it weren't tail recursive, there would be no reason to use this form instead of the seemingly equivalent and clearly tail recursive

let rec exists p = function
    [] -> false
  | a::l -> if p a then true else exists p l

so it seems like the OCaml compiler is capable of optimizing boolean ops in simple cases like this to take advantage of tail recursion. But I noticed that if I switch the order of operands like so

let rec exists p = function
    [] -> false
  | a::l -> exists p l || p a

then the stack is indeed blown on 10m elements. So it looks like OCaml is only able to take advantage of this when the recursive call appears on the right, which makes me suspect that all the compiler does is replace the boolean op with the equivalent if expression. Can someone confirm or refute this?

Solution

The person who told you this was wrong.

In fact, || is not translated into an if/then/else right away, but preserved a bit through the intermediate language of the compiler, to easily enable two different transformations:

as you said, a || b in expression position is translated into if a then true else b
but a || b in test position, that is, if a || b then c else d is translated differently, into something like if a then goto c else if b then goto c else d, when goto c is a jump to the computation of c (just translating into if a then c else if b then c would duplicate the code of c). This optimization is more arcane and the users don't need to be aware of it to reason about the performance of their programs.

You can see for yourself in the sources of the compiler. The || primitive is represented as Psequor, and the files of interest are asmcomp/cmmgen.ml for native compilation ((1), (2)]), and bytecomp/bytegen.ml for the bytecode compilation (both aspects are handled at the same time, by instruction of the bytecode produced to use the result).

A small point: you seem to say that OCaml is able to optimize a tail-call "on the right" because this case is "simple enough", but not "on the left" because the compiler is not clever enough. If the call appears on the left, it is not a tail call, so it must not be optimized. This is not a question of being a "simple" tail call or not.

Finally, if you want to check whether a tail is tail-call or not, you can use OCaml tools for that: compiling with the -annot option will produce an annotation file foo.annot (if your source was foo.ml) that has information about the types of program expressions and, for function calls, about whether they're tail-calls or not. With the caml-mode in Emacs for example, the command M-x caml-types-show-call pointed about the exists after the || will confirm me that this is a "tail call", while when called on p x it returns "stack call".

OTHER TIPS

If one wrote:

let rec add_result p = function
  [] -> 0
| a::l -> p a + add_result p l

This won't be tail recursive, because after the recursive call the function has to add both results.

But || is not a normal operator, and A || B is strictly equivalent to if A then true else B, so when you wrote

let rec exists p = function
  [] -> false
| a::l -> p a || exists p l

it's the same than

let rec exists p = function
  [] -> false
| a::l -> if p a then true else exists p l

and the function is tail recursive.

let rec exists p = function
  [] -> false
| a::l -> exists p l || p a

is equivalent to

let rec exists p = function
  [] -> false
| a::l -> if exist p l then true else p a

and this is not tail recursive.

Remi's answer is completely correct. Note that some languages with different typing systems than OCaml automatically coerce certain non-boolean values into booleans. These languages have a choice with operators like (||): either don't try to coerce the result of the rhs into a boolean, but just return whatever is given, or do coerce but then you have some work to do after the rhs is evaluated, so you give up on the tail-recursiveness of (||). You can't have both. Perhaps your informant was thinking along those lines and that's why they mistakenly said what they did about OCaml. Given OCaml's strict handling of types, you can't say things like true || succ 5 in the first place.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow