Question

I have a serial application that I parallelized using OpenMP. I simply added the following to my main loop:

#pragma omp parallel for default(shared)
for (int i = 0; i < numberOfEmitters; ++i)
{
    computeTrajectoryParams* params = new computeTrajectoryParams;
    // defining params...
    outputs[i] = (int*) ComputeTrajectory(params);
    delete params;
}

It seems to work well: at the beginning, all my worker threads execute an iteration of the loop, everything goes fast, and I get 100% CPU load (on a quad-core machine). However, after a while, one of the worker threads stops and stays in a function called _vcomp::PersistentThreadFunc from vcomp90.dll (the file is vctools\openmprt\src\ttpool.cpp), then another does the same, and so on, until only the main thread is still working.

Does anybody have an idea why this happens? It starts happening after about half of the iterations have been executed.


Solution

This likely depends on the scheduling scheme and on how much computation each iteration does. With static scheduling, the iterations are divided among the threads before the loop starts running: on a quad-core machine, each thread gets roughly a quarter of the index range. Some threads can finish before others, either because their share of the work happens to be cheaper or because they are more loaded by other things on the machine. Once a thread has finished its pre-assigned share, it has nothing left to do and goes idle, which matches the behavior you describe.

Try dynamic scheduling instead, and see whether the load balances better.

OTHER TIPS

A small comment on your code: if ComputeTrajectory's execution time is measured in milliseconds and you have more than a few iterations, make sure you are using a memory allocator that is optimized for multithreading. You allocate on the heap in every iteration, and (still today) most allocators share a global pool protected by a global lock, so concurrent allocations from many threads end up serializing on that lock.

You could also look into hoisting the allocation out of the loop entirely, but there is not enough information here to know whether that is possible.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow