In 2005 the international C++ standardization committee published a Technical Report on C++ Performance, which I believe qualifies as both an article and a benchmark on this topic.
The short answer is that cache misses can influence run time considerably, and a call of a virtual function (usually) consults the vtable, which is one more memory access that can miss.
But in practice (as opposed to what the standard formally guarantees) the overhead per call, in terms of executed machine code, is fixed, because all extant compiled C++ implementations use vtables. You can derive classes to your heart's content without affecting the call overhead. Any call still performs (1) look up the vtable pointer at a known place in the object, (2) look up the function address at a known place in the vtable, and (3) call that function. The exception is when the compiler knows that the function pointer is already available, e.g. from an earlier call, which, if anything, just makes the call go a wee bit faster.
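To make the three steps concrete, here is a minimal sketch, not what any particular compiler literally emits. The first half is ordinary C++ with a hierarchy three levels deep; the second half is a hand-rolled emulation of a typical vtable scheme. All names (`Base`, `Derived`, `MoreDerived`, `ManualObject`, `ManualVTable`, `manual_call_value`, and so on) are made up for illustration, and the exact layout of a real vtable is an implementation detail that the standard does not specify.

```cpp
#include <cassert>

// Ordinary C++: a hierarchy three levels deep.
struct Base {
    virtual ~Base() = default;
    virtual int value() const { return 1; }
};
struct Derived : Base {
    int value() const override { return 2; }
};
struct MoreDerived : Derived {
    int value() const override { return 3; }
};

// The call site is the same no matter how deep the dynamic type is.
int call_value(const Base& b) { return b.value(); }

// Hand-rolled emulation (hypothetical layout, for illustration only).
struct ManualObject;
struct ManualVTable {
    int (*value)(const ManualObject*);  // function address at a known slot
};
struct ManualObject {
    const ManualVTable* vptr;           // vtable pointer at a known place
};

int base_value(const ManualObject*) { return 1; }
int more_derived_value(const ManualObject*) { return 3; }

// One table per "class"; depth of derivation only changes the table's
// contents, not the work done at the call site.
const ManualVTable base_vtable = { base_value };
const ManualVTable more_derived_vtable = { more_derived_value };

int manual_call_value(const ManualObject* obj) {
    const ManualVTable* vt = obj->vptr;          // (1) load vtable pointer
    int (*fn)(const ManualObject*) = vt->value;  // (2) load function address
    return fn(obj);                              // (3) indirect call
}

// Small self-check exercising both versions.
void check_examples() {
    MoreDerived m;
    assert(call_value(m) == 3);
    ManualObject mo{ &more_derived_vtable };
    assert(manual_call_value(&mo) == 3);
}
```

Note that `manual_call_value` does exactly the same two loads and one indirect call for a "base" object as for a "most derived" one, which is the point about fixed per-call overhead.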