Assuming a good implementation, the only "extra cost" of `memmove` is the initial check (an add and a compare-and-branch) to decide whether to copy front-to-back or back-to-front. This cost is so completely negligible (the add and compare will be hidden by ILP, and the branch is perfectly predictable under normal circumstances) that on some platforms, `memcpy` is just an alias of `memmove`.
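To make the direction check concrete, here is a minimal sketch of what a `memmove` implementation does (a hypothetical `my_memmove`, copying byte-by-byte for clarity; real implementations copy in word-sized or vectorized chunks, and the raw pointer comparison is something an implementation can do but strictly portable code cannot):

```c
#include <stddef.h>

/* Sketch of memmove's direction check: one compare up front, then a
 * straight copy loop in whichever direction is safe for overlap. */
void *my_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (d < s) {
        /* Destination starts below source: front-to-back is safe. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else if (d > s) {
        /* Destination may overlap the tail of source: go back-to-front. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    return dst;
}
```

Everything after the initial `if` is exactly what a naive `memcpy` loop would do; the branch itself is the entire overhead.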
In anticipation of your next question ("if `memcpy` isn't significantly faster than `memmove`, why does it exist?"), there are a few good reasons to keep `memcpy` around. The best one, to my mind, is that some CPUs essentially implement `memcpy` as a single instruction (`rep/movs` on x86, for example). These hardware implementations often have a preferred (fast) direction of operation (or they may only support copying in one direction). A compiler may freely replace `memcpy` with the fastest instruction sequence without worrying about these details; it cannot do the same for `memmove`.