Using the task_group::wait method should be faster (you don't have to lock and unlock every time), and it may work as you expect.
This method blocks the current task until the tasks of another task group have completed their work.
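To illustrate the wait-then-read pattern, here is a minimal sketch. It does not use PPL itself: SimpleTaskGroup is a hypothetical stand-in I wrote so the example builds without <ppl.h>, and it only mirrors the run()/wait() shape of concurrency::task_group. parallel_sum and its parameters are made up for illustration too.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for concurrency::task_group (not the PPL class).
// It only mirrors the run()/wait() shape, using one std::thread per task.
class SimpleTaskGroup {
public:
    // Start a task on its own thread.
    void run(std::function<void()> task) {
        workers_.emplace_back(std::move(task));
    }

    // Block the caller until every task started so far has finished.
    void wait() {
        for (auto& w : workers_) {
            w.join();
        }
        workers_.clear();
    }

    // Join any tasks that are still running so no thread is left dangling.
    ~SimpleTaskGroup() { wait(); }

private:
    std::vector<std::thread> workers_;
};

// Sum 1..n by splitting the work across `parts` tasks.
// Each task writes only to its own slot, so no lock is needed;
// the slots are read only after wait() has returned.
long long parallel_sum(int n, int parts) {
    std::vector<long long> partial(static_cast<std::size_t>(parts), 0LL);
    SimpleTaskGroup group;

    for (int p = 0; p < parts; ++p) {
        group.run([&partial, p, n, parts] {
            long long local = 0;
            for (int i = p + 1; i <= n; i += parts) {
                local += i;
            }
            partial[static_cast<std::size_t>(p)] = local;
        });
    }

    group.wait();  // all tasks are done past this point

    long long total = 0;
    for (long long v : partial) {
        total += v;
    }
    return total;
}
```

With the real concurrency::task_group the calling code would have the same shape: run() each piece of work, call wait() once, and only then read the results.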
See MSDN: Parallel Tasks.
Update: I have run some timing tests, and it seems that this is not a solution (besides, both approaches fail on large data inputs on my dual-core machine). This may be a bug, or "by design" behavior, in concurrent_vector, as in Intel's TBB; see "tbb::concurrent_vector returns wrong size".