Question

I'm currently making an application using vectors with C++.

I know that premature optimization is the root of all evil.

But I really can't help being curious.

I'm appending parts of other vectors to one destination vector.
We'll say the destination vector has a fixed size of 300 that never changes.

Since I always append to the end of the vector, is it faster to do:

a.reserve(300);
a.insert(a.end(), b.begin(), b.end());

or would it be faster to loop through the vector I want to append and add each item individually (while still reserving beforehand) with push_back or emplace_back? (I'm unsure which of those two is faster, either.)

Can anyone help me with this?


Solution

Here's a general principle: when a library provides both do_x_once and do_x_in_batch, then the latter should be at least as fast as calling do_x_once in a simple loop. If it isn't, then the library is very badly implemented since a simple loop is enough to get a faster version. Often, such batch functions/methods can perform additional optimizations because they have knowledge of data structure internals.

So, insert should be at least as fast as push_back in a loop. In this particular case, a smart implementation of insert can do a single reserve for all the elements you want to insert. push_back would have to check the vector's capacity every time. Don't try to outsmart the library :)

OTHER TIPS

I guess it really depends on the compiler (library implementation), compiler options and architecture. Here's a quick benchmark in VS2005 without optimization (/Od) on an Intel Xeon:

std::vector<int> a;
std::vector<int> b;

// fill 'a' with random values for giggles

timer.start()
// copy values from 'a' to 'b'
timer.stop()

I get these results for 10 000 000 items using these different methods of "copy values...":

  1. Reserve space for 'b', then for-loop using b.push_back(a[i]);: 0.808 sec
  2. Resize 'b', then for-loop using indices assignment b[i] = a[i];: 0.264 sec
  3. No re-sizing 'b', just b.insert(b.end(), a.begin(), a.end());: 0.021 sec (no significant difference with reserve first)
  4. std::copy(a.begin(), a.end(), std::back_inserter(b));: 0.944 sec (0.871 with reserve first)
  5. Resize 'b', then memcpy on the base pointers memcpy(&(b[0]), &(a[0]), 10000000*sizeof(int));: 0.061 sec

With optimizations turned on (/Ox) however, it's a different story. I had to increase the size to 100 000 000 to get more differentiation:

  1. push_back loop: 0.659 sec
  2. index loop: 0.482 sec
  3. insert: 0.210 sec (no significant difference with reserve first)
  4. std::copy: 0.422 sec with reserve first. Got a bad_alloc without it.
  5. memcpy: 0.329 sec

What's interesting to note is that with or without optimizations, the insert method scaled linearly. The other methods were clearly inefficient without optimizations and still couldn't quite catch up with optimizations enabled. As James Kanze noted, results differ on g++. Run a test on your own platform to validate.

As larsmans says, the more you do in a single library call, the more likely it is to be efficient. In the case of insert into a vector, the library will normally do at most a single re-allocation, and copy each shifted element at most once. If you use a loop and push_back, it could reallocate several times, which could be significantly slower (like an order of magnitude).

Depending on the type, however, it may also be faster to do something like:

a.resize( 300 );
std::copy( b.begin(), b.end(), a.end() - 300 );

I've found this to be faster for simple scalar types (like int) using g++ on an Intel machine.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow