Assuming a good implementation, the only "extra cost" of `memmove` is the initial check (an add and a compare-and-branch) to decide whether to copy front-to-back or back-to-front. This cost is so completely negligible (the add and compare will be hidden by ILP, and the branch is perfectly predictable under normal circumstances) that on some platforms, `memcpy` is just an alias of `memmove`.
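To make the direction check concrete, here is a minimal sketch of what a `memmove` implementation does (a hypothetical `my_memmove`, copying byte-by-byte for clarity; real implementations copy in word-sized or vectorized chunks, and the raw pointer comparison is something an implementation can do but strictly portable code cannot):

```c
#include <stddef.h>

/* Sketch of memmove's direction check: one compare up front, then a
 * straight copy loop in whichever direction is safe for overlap. */
void *my_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (d < s) {
        /* Destination starts below source: front-to-back is safe. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else if (d > s) {
        /* Destination may overlap the tail of source: go back-to-front. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    return dst;
}
```

Everything after the initial `if` is exactly what a naive `memcpy` loop would do; the branch itself is the entire overhead.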
In anticipation of your next question ("if `memcpy` isn't significantly faster than `memmove`, why does it exist?"), there are a few good reasons to keep `memcpy` around. The best one, to my mind, is that some CPUs essentially implement `memcpy` as a single instruction (`rep/movs` on x86, for example). These hardware implementations often have a preferred (fast) direction of operation (or they may only support copying in one direction). A compiler may freely replace `memcpy` with the fastest instruction sequence without worrying about these details; it cannot do the same for `memmove`.