You are creating and destroying many containers, and each one uses operator new to allocate memory. On many systems, the allocator has to synchronize between threads to manage the free memory it hands out, even for typical, small allocations like yours. So you are probably incurring quite a lot of inter-thread contention there.
You might try a different allocator, such as tcmalloc (http://goog-perftools.sourceforge.net/doc/tcmalloc.html). It is specifically designed to deal with this: it keeps a per-thread cache, so most small allocations are served without taking a lock.
Another approach would be to use an object pool or other allocation strategy to avoid using the standard allocation mechanism completely. That would require some code changes, whereas using tcmalloc does not.
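
To illustrate the pooling idea, here is a minimal sketch, assuming your containers are std::vector and that each thread can keep its own pool. The names VectorPool and sum_squares are hypothetical, made up for this example and not taken from your code. The pool recycles vectors along with their already-allocated buffers, so after warm-up the hot path rarely reaches operator new.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: recycles vectors so their heap buffers are reused
// instead of being freed and reallocated on every use.
// Not thread-safe by itself; intended to be used one instance per thread.
template <typename T>
class VectorPool {
public:
    // Hand out a recycled vector if one is available, otherwise a fresh one.
    std::vector<T> acquire() {
        if (free_.empty()) {
            return {};
        }
        std::vector<T> v = std::move(free_.back());  // steals the buffer
        free_.pop_back();
        return v;
    }

    // Take a vector back. clear() destroys the elements but keeps the
    // allocated capacity, which is what makes the reuse worthwhile.
    void release(std::vector<T> v) {
        v.clear();
        free_.push_back(std::move(v));
    }

    std::size_t available() const { return free_.size(); }

private:
    std::vector<std::vector<T>> free_;
};

// Hypothetical worker function: the pool is thread_local, so each thread
// has its own and no locking is needed.
int sum_squares(int n) {
    thread_local VectorPool<int> pool;
    std::vector<int> scratch = pool.acquire();
    for (int i = 0; i < n; ++i) {
        scratch.push_back(i * i);
    }
    int total = 0;
    for (int x : scratch) {
        total += x;
    }
    pool.release(std::move(scratch));
    return total;
}
```

The first call on each thread still allocates, but later calls on that thread reuse the same buffer. If your containers are node-based (std::map, std::list), this trick does not help, and a custom allocator or tcmalloc is the better fit.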