It might be a good idea to consult a particular processor specification, but from the general point of view tail call elimination improves performance because
- no new stack frame is allocated
- the nested call retuns directly to the caller of the current call rather than to the current call and only then to the original caller
- less memory allocation means better use of CPU cache
- less memory allocation means better use of main memory that may need to be paged or loaded when the stack is too large
The modern processors can reduce the overhead caused by these operations though.