Why does TCO require support from the VM?

Question 1

Inlining is not a solution to the general tail call elimination problem for a number of reasons. The list below is not meant to be exhaustive. It is, however, sorted -- it starts with an inconvenience and ends with a complete showstopper.

A function called in a tail position may be a large one, in which case inlining it may not be advisable from a performance standpoint.
Suppose there are multiple tail calls to g in f. Under the regular definition of inlining, you'd have to inline g at each call site, potentially making f huge. If instead you choose to goto the beginning of g and then jump back, then you need to remember where to jump to and all of a sudden you're maintaining your own call stack fragment (which will almost certainly exhibit poor performance when compared to the "real" call stack).
For mutually recursive functions f and g, you'd have to inline f in g and g in f. Clearly this is impossible under the usual definition of inlining. So, you're left with what's effectively a custom in-function calling convention (as in the goto-based approach to 2. above).
Real TCE works with arbitrary calls in tail position, including in higher-order contexts:
```
(defn say-foo-then-call [f]
  (println "Foo!")
  (f))
```
This can be used to great effect in certain scenarios and clearly cannot be emulated with inlining.

Question 2

It can be surprising, and may make debugging harder, since you can't see the call stack.
It only works in very special cases, rather than the general cases that are handled when the VM supports TCO.
Programmers often don't write their code tail-recursively, unless the language gives them incentive to do so. E.g. recursive factorial is usually written with the recursion step as n * fact(n-1), which is not tail-recursive.

Question 3

TCO per se doesn't require VM support. That is to say, not for local functions. Tail calls spanning external functions require VM support. Ideally, a complete implementation of tail recursion allows functions in separately compiled program units to mutually recurse in constant space, not only functions that are local to one parent function, or functions that are all visible to the compiler at once.

In a VM without tail call support, function calls are encapsulated, and allocate a new frame on exit. Tail calls require a special entry point which bypasses that. Functions can participate in tail calling as well as non-tail calling, so they require both entry points.

Tail call elimination can be simulated without VM support using non-local returns and dispatch. That is to say, when a tail call takes place syntactically, it is translated to a non-local return which abandons the current function via dynamic control transfer, passing the arguments (packaged as an object, perhaps) to a hidden dispatch loop, which transfers control to the target function, passing it those arguments. This will achieve the requirement that recursion takes place in constant space and will "look and feel" like tail calling. However, it is slow, and perhaps not entirely transparent.