Question

I'm trying to come up with an equivalent replacement of an Intel TBB parallel_for loop that uses a tbb::blocked_range, using OpenMP. Digging around online, I've only managed to find mention of one other person doing something similar: a patch submitted to the Open Cascade project, in which the TBB loop appeared as follows (but did not use a tbb::blocked_range):

tbb::parallel_for_each (aFaces.begin(), aFaces.end(), *this);

and the OpenMP equivalent was:

int i, n = aFaces.size();
#pragma omp parallel for private(i)
for (i = 0; i < n; ++i)
    Process (aFaces[i]);

Here is the TBB loop I'm trying to replace:

tbb::parallel_for( tbb::blocked_range<size_t>( 0, targetList.size() ), DoStuff( targetList, data, vec, ptr ) );

It uses the DoStuff class to carry out the work:

class DoStuff
{
private:
    List& targetList;
    Data* data;
    vector<things>& vec;
    Worker* ptr;

public:
    DoStuff( List& pass_targetList, 
             Data* pass_data, 
             vector<things>& pass_vec, 
             Worker* pass_worker) 
        : targetList(pass_targetList), data(pass_data), vec(pass_vec), ptr(pass_worker)
    {
    }

    void operator() ( const tbb::blocked_range<size_t>& range ) const
    {
        for ( size_t idx = range.begin(); idx != range.end(); ++idx )
        {
            ptr->PerformWork(&targetList[idx], data->getData(), &vec);
        }
    }
};

My understanding, based on this reference, is that TBB will divide the blocked range into smaller subranges and give each thread one of those subranges to loop through. Each thread gets its own copy of the DoStuff functor, but since the functor holds only references and pointers, the threads end up sharing those underlying resources.
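To make that splitting concrete, here is a small stdlib-only sketch of the recursive halving that a blocked_range undergoes. Range and split_all are hypothetical stand-ins, not TBB types, and real TBB splitting also depends on the partitioner; this just illustrates the "halve until each piece is at most grainsize elements" idea:

#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical stand-in for tbb::blocked_range<size_t>: a half-open
// [begin, end) interval that is divisible while it holds more than
// `grainsize` elements.
struct Range {
    std::size_t begin, end, grainsize;
    bool is_divisible() const { return end - begin > grainsize; }
};

// Recursively halve a divisible range, collecting the leaf subranges
// that would each be handed to a worker thread.
std::vector<Range> split_all(const Range& r) {
    if (!r.is_divisible()) return {r};
    std::size_t mid = r.begin + (r.end - r.begin) / 2;
    std::vector<Range> leaves = split_all({r.begin, mid, r.grainsize});
    std::vector<Range> right  = split_all({mid, r.end, r.grainsize});
    leaves.insert(leaves.end(), right.begin(), right.end());
    return leaves;
}

int main() {
    // 100 iterations with grainsize 16 split into eight leaf subranges
    for (const Range& l : split_all({0, 100, 16}))
        std::cout << "[" << l.begin << ", " << l.end << ")\n";
}

Each leaf would then be passed to the functor's operator(), which loops from begin() to end().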

Here's what I've come up with as an equivalent replacement in OpenMP:

int index = 0;
#pragma omp parallel for private(index)
for (index = 0; index < targetList.size(); ++index)
{
    ptr->PerformWork(&targetList[index], data->getData(), &vec);
}

Because of circumstances outside of my control (this is merely one component in a much larger system that spans 5+ computers), stepping through the code with a debugger to see exactly what's happening is... unlikely. I'm working on getting remote debugging going, but it's not looking promising. All I know for sure is that the OpenMP code above is somehow doing something different than the TBB code did, and the expected results after calling PerformWork for each index are not obtained.

Given the information above, does anyone have any ideas on why the OpenMP and TBB code are not functionally equivalent?

Solution

Following Ben and Rick's advice, I tested the following loop without the omp pragma (serially) and obtained my expected results (very slowly). After adding the pragma back in, the parallel code also performed as expected. It looks like the problem was either declaring the index outside the loop and marking it private, or not hoisting targetList.size() into a local numTargets before the loop. Or both.

int numTargets = targetList.size();
#pragma omp parallel for
for (int index = 0; index < numTargets; ++index)
{
    ptr->PerformWork(&targetList[index], data->getData(), &vec);
}
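For what it's worth, the fix can be checked in a self-contained sketch (names are hypothetical, and doubling each element stands in for PerformWork). Declaring the index inside the for statement means OpenMP privatizes it automatically, with no private() clause needed, and hoisting size() into a signed int avoids mixing int with the unsigned size_t in the loop condition:

#include <vector>

// Hypothetical stand-in for the real loop body: PerformWork is
// replaced by doubling each element so the result is checkable.
std::vector<int> process_all(const std::vector<int>& targets) {
    std::vector<int> results(targets.size());
    // Hoist size() into a signed int once, outside the loop.
    const int numTargets = static_cast<int>(targets.size());
    // The loop index is declared in the for statement, so each
    // thread gets its own private copy automatically.
    #pragma omp parallel for
    for (int index = 0; index < numTargets; ++index)
        results[index] = targets[index] * 2;
    return results;
}

If OpenMP is not enabled at compile time, the pragma is simply ignored and the loop runs serially with the same result, which is also a handy way to test the loop body in isolation.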
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow