Question

I want to use TBB (the parallel_for pattern) to concurrently convolve a large number of images - each processor core convolves a single image. However, image depth varies: monograyscale (1 channel), stereograyscale (2 channels), monorgb (3 channels), stereorgb (6 channels), etc.

As a result, the workload on each thread (core) keeps changing. How do I use parallel_for correctly for this task, or should I consider other parallel patterns?


Solution

The tbb::parallel_for of the form parallel_for(first, last, lambda) does some load balancing, so you might try it first. However, it uses a heuristic to guess a good grainsize, and that heuristic can be fooled on occasion.

For best load balancing, possibly at the expense of extra per-iteration overhead, use a range-based tbb::parallel_for with a grainsize of 1 and a simple_partitioner. That forces each iteration to run as a separate task, thus giving the TBB runtime maximum flexibility to rebalance load. Below is a sample that executes 100 iterations, each with a random delay.

#include <tbb/parallel_for.h>
#include <tbb/blocked_range.h>
#include <cstdio>
#include <cstdlib>
#include <unistd.h>

int main() {
    tbb::parallel_for(
        tbb::blocked_range<int>(0,100,1),  // interval [0,100) with grainsize == 1
        [&](const tbb::blocked_range<int>& r) {
            for( int i=r.begin(); i!=r.end(); ++i ) {
                std::printf("%d\n", i);
                usleep(random()%1000000);  // random delay of up to 1 second
            }
        },
        tbb::simple_partitioner());
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow