tail. prefix in ILAsm – any example of use?

Question

You are not going to find it in any code produced by the current MS C# compiler. You will find it in code produced from the F# compiler, but not as much as you might expect, for almost opposite reasons.

Now, to first correct one mistake in your statement:

ECMA-335, III.2.4 specifies tail. prefix that can be used in recursive functions.

This is not strictly true. The tail. prefix can be used in tail-called calls; not all recursive functions are tail-recursion, and not all tail-calls are part of recursion.

A tail call is any call to a function (including an OOP method) where the last operation in that code-path is to make that call and then return the value it returns, or just return if the function called doesn't return a value. Hence in:

int DoSomeCalls(int x)
{
  if(A(x))
    return B(x);
  if(DoSomeCalls(x * 2) > 3)
  {
    int ret = C(x);
    return ret;
  }
  return D(DoSomeCalls(x-1));
}

Here, the calls to B, and D are tail calls, because the only thing done after the call is to return the value they'd returned. The call to C isn't a tail call, but it can be easily converted to one by removing the redundant assignment to ret by just returning directly. The call to A is not a tail call, and the nor are the call to DoSomeCalls, though they are recursive.

Now, the normal function call mechanism is implementation-dependent, but generally involves saving enrigstered values that might be needed after the call onto the stack, putting parameters onto the stack and/or into registers along with the current instruction position (to return to), moving the instruction pointer, and then reading the return value from a register or the stack when the instruction pointer is moved back to after where the call was done. With a tail call it's possible to skip a lot of this, because the called-into function can use the current stack frame and then return straight to the earlier caller.

The tail. prefix requests that this be done with a call.

While this isn't necessarily related to recursion, you were correct in talking about recursion, because the benefits of eliminating tail calls is greater in recursive cases than otherwise; making calls that are O(n) in stack space when actually using the function-call mechanism become O(1) in stack-space, along with reducing the per-item constant time costs lower (so it's still O(n) in this regard, but O(n) time means it takes n×k seconds, and we have a smaller k). In many cases this can be the difference between a call that works, and a call that throws a StackOverflowException.

Now, in ECMA-335 there are a few cases stated about how tail. may not always be honoured. In particular there is the text in §III.2.4 that states:

There can also be implementation-specific restrictions that prevent the tail. prefix from being obeyed in certain cases.

At its loosest, we could interpret this as preventing it in all manner of cases.

Conversely, the jitter is allowed to apply all manner of optimisations, including performing tail call elimination even when it wasn't requested by tail.

Because of this, there are in fact four ways to do tail-call elimination in IL:

Use the tail. prefix just before the call, and have it honoured (not guaranteed).
Don't use the tail. prefix before the call, but have the jitter decide to apply it any way (even less guaranteed).
Use the jmp IL instruction which is effectively a special case of tail call elimination (never used by C# because it produces unverifiable code for a normally relatively small gain, though it can be the easiest approach sometimes when hand-coding due to its relative simplicity).
Re-write the whole method to use a different approach; in particular the sort of recursive code that most benefits from tail call elimination can be re-written to explicitly use the sort of iterative algorithm the tail-call elimination effectively turns the recursion into.* (In other words, the tail-call elimination happens before the jitting or even the compilation).

(There's also sort of the case where the call is inlined, since it doesn't require a new stack frame, and indeed has normally a stronger improvement overall, and then in turn often allows even further optimisations to be performed, but it isn't generally considered a case of tail-call elimination because it's a call elimination that doesn't depend on it being a tail call).

Now, the first implementations of the jitter tended not to do tail call elimination in a lot of cases, even if it was requested.

Meanwhile at the C# side of things, there was a decision not to emit tail. There is a general approach with C# of not heavily optimising the code produced. There are some optimisations done (in particular, dead code removal), but for the most part since the optimisation efforts could just duplicate those done by the jitter (or even get in their way) the downsides of optimisation (more complications means more possible bugs, and the IL would be more confusing to many developers) relatively outweigh the upsides. Use of tail. is a classic example of this, because sometimes insisting on tail calls actually costs more than it saves with .NET so if the jitter is already trying to work out when it's a good idea, then there's a bigger chance that the C# compiler would be just making things worse a lot of the time, and making no difference the rest.

It's also worth noting that with the styles of coding most common with a C-style language like C#:

Developers tend not to write code that would particularly benefit from tail-call elimination compared to the styles more common in other languages.
Developers tend to know how to optimise the sort of recursive calls that would most benefit from tail-call elimination by re-writing them to be iterative.
Developers tend to have written them in the iterative manner in the first place.

Now, along came F#.

With the sort of functional and declarative programming F# encourages, there are a lot of cases where what is most naturally done in an iterative way in C# is most naturally done with a recursive approach. Where people hacking in C-style languages learn to turn recursive cases into iterative code, people hacking in F#-style languages learn to turn iterative cases into recursive code, and non-tail-calling recursive code into tail-calling recursive code.

So F# used tail. a lot.

And it got StackOverflowException a lot, because the jitter wasn't honouring it.

This is one of the things that led the jitter people to increase the number of cases where they eliminated tail calls, both in general and even further if tail. is used.

Meanwhile, the F# people couldn't just depend on tail. so F#'s compiler will optimise much more heavily than C#'s; just as we can manually rewrite recursive calls to be iterative as in the footnote, so the F# compiler does the equivalent when producing IL.

And for this reason, a lot of the time when you write an F# method where you'd expect to see some IL that uses tail., what you'd actually get is IL that does the equivalent thing iteratively.

However, F# will still use tail. when a method calls another method in a mutually-recursive manner like:

let rec even n = 
  if n = 0 then 
    true 
  else
    odd (n-1)
and odd n =
  if n = 1 then 
    true 
  else
    even (n-1)

Which I totally stole from this answer because I've only played a tiny bit with F# so I'd rather depend upon someone more familiar than I am.

In this case, because the tail-calls aren't in a single function, it can't just be rewritten to eliminate it at the IL compilation point, so it has to hope the jitter will do the elimination, and uses tail. to increase the chances it will.

*An example of turning a recursive call into an iterative would be to start with a recursive call like:

void ClearAllNodes(Node node)
{
  if(node != null)
  {
    node.Value = null;
    ClearAllNodes(node.Next)
  }
}

The simplest change is to then manually add what a tail-call elimination does, by ourselves setting up the parameter, and jumping back to the start of the method:

void ClearAllNodes(Node node)
{
start:
  if(node != null)
  {
    node.Value = null;
    node = node.Next;
    goto start;
  }
}

Since there are good reasons to avoid goto if we can, we would generally change it to something that does the same through more strictly-defined looping mechanisms:

void ClearAllNodes(Node node)
{
  while(node != null)
  {
    node.Value = null;
    node = node.Next;
  }
}