You should use a thread pool instead. That is, spawn just enough threads to get the work done without undue contention: for example, something like N-2 threads on an N-core machine, or perhaps more if some of the work may block on I/O.
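As a rough sketch of the N-2 rule of thumb for CPU-bound work, the worker count could be derived from std::thread::hardware_concurrency(), which is allowed to return 0 when the core count is unknown. The helper names pick_worker_count and default_worker_count are hypothetical, and the reserve of 2 is just the heuristic above, not a fixed rule.

```cpp
#include <thread>

// Hypothetical helper: pick a worker count for a CPU-bound pool.
// 'reserve' is how many hardware threads to leave free for the main
// thread and the OS (the "N-2" rule of thumb). Never returns 0.
unsigned pick_worker_count(unsigned hw_threads, unsigned reserve = 2) {
    if (hw_threads == 0) hw_threads = 1;  // 0 means "unknown"; assume a single core
    return hw_threads > reserve ? hw_threads - reserve : 1u;
}

// Hypothetical convenience wrapper using the standard library's core-count query.
unsigned default_worker_count() {
    return pick_worker_count(std::thread::hardware_concurrency());
}
```

For work that spends much of its time blocked on I/O you would pass a smaller (or zero) reserve, or scale the result up, since blocked threads are not using a core.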
Boost does not include a thread pool as such, but it does have the parts you need to build one. See here for some ideas: boost::threadpool::pool vs. boost::thread_group
Or you can use a more ready-made solution such as this one (though it is somewhat dated and may no longer be maintained): http://threadpool.sourceforge.net/
The idea is then to spawn the worker threads once, up front, and in your loop simply "post" each task to the pool, where the next available worker thread will pick it up.
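Here is a minimal sketch of that spawn-once, post-per-task pattern, built only from standard-library parts (std::thread, std::mutex, std::condition_variable) rather than Boost. The ThreadPool class and its post() method are hypothetical names for illustration, not a Boost API; tasks are assumed not to throw.

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool: workers are created once and pull tasks from a shared queue.
class ThreadPool {
public:
    explicit ThreadPool(unsigned num_threads) {
        for (unsigned i = 0; i < num_threads; ++i)
            workers_.emplace_back([this] { worker_loop(); });
    }

    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();                       // wake every worker so each can exit
        for (auto& t : workers_) t.join();      // workers finish queued tasks first
    }

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    // Queue a task; the next idle worker will pick it up.
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();                       // wake one worker, not all of them
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (stopping_ && tasks_.empty()) return;  // queue drained, shut down
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();                             // run the task outside the lock
        }
    }

    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stopping_ = false;
};

// Usage sketch (hypothetical): post one task per loop iteration, then wait for all of them.
long long sum_of_squares(int n, unsigned threads) {
    std::atomic<long long> total{0};
    {
        ThreadPool pool(threads);
        for (int i = 1; i <= n; ++i)
            pool.post([i, &total] { total += static_cast<long long>(i) * i; });
    }   // ThreadPool destructor blocks until every queued task has run
    return total.load();
}
```

Using notify_one() in post() means that each new task wakes a single idle worker, instead of waking every waiting thread only to have all but one go back to sleep after contending for the lock.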
Doing this avoids many problems, such as running out of thread stack space and inefficient resource contention (look up the "thundering herd problem"), and it lets you easily tune how aggressively you use multiple cores on any given system.