I'll try to answer the first part of the question:
> Could somebody please comment on whether this seems sensible? To me it doesn't: how can there be no branch misprediction or instruction cache miss statistics for a line of polymorphic code where the branch target will constantly be changing per message?
> This cannot be due to compiler optimizations/inlining because the compiler wouldn't know the subtype of the object to optimize.
There is actually a way for a compiler to inline calls to virtual functions; it's an interesting trick, and I was surprised when I learned about it.
You can watch Eric Brumer's talk for more details; starting at the 22:30 mark, he covers indirect-call optimization.
Basically, instead of issuing a plain indirect jump through the virtual function pointer, the compiler first emits some comparisons: for a few known pointer values it guesses which specific virtual function will be called, and that call can then be inlined inside the matching branch (this is sometimes called speculative devirtualization). The unpredictable indirect jump turns into an ordinary conditional branch, and modern CPUs predict those very well. So if most of the calls go to the same virtual function implementation, you may see good prediction numbers and low instruction cache miss numbers.
I'd recommend looking at the disassembly for that call site. Does it actually jump through the vtable indirection, or has the compiler optimized the vtable jump away?
If the call is not optimized by the compiler, the CPU still has a way to speculate; dig into the Branch Target Buffer (BTB). For example, if this function is called in a tight loop on objects of the same type, it may not matter whether it's virtual or not: the target address may be predicted...
HTH.