Question

I'm implementing a bitonic sort using intel TBB. When using the parallel_invoke method all goes well. But when using task_group(without calling the wait method), the output is not sorted. When using task_group as below the program does not terminate.

void bitonic_merge(bool up, int array[],int size){
  if(size==1){
      return;
  }
  int m = greatestPowerOfTwoLessThan(size);
  bitonic_compare(up, array, size - m, m);
  g->run(Bitonic_Merge(up, array , m));
  g->run(Bitonic_Merge(up, &array[m],size - m));
  g->wait();
  return;    
}

Can someone see what is wrong? What is the difference between parallel_invoke and using task_group? Also what is better to use in such a situation? The parallel_invoke or task_group? Or should I use some other method?

Thanks in advance

Was it helpful?

Solution

The program doesn't terminate because it is deadlocked. Your code is very close to correct, but the problem is that 'g' is a global pointer to a task_group and your doing a recursive task decomposition and this is not a combination that mixes well.

If you broke into the debugger, I expect that you would see lots of threads in task_group::wait, waiting for the tasks to complete.

The tasks aren't completing because you're sharing your task_group amongst the threads and tasks and they're all effectively waiting for each other.

To fix this, declare a task_group (or structured_task_group) on the stack inside the bitonic_merge function, this will still allow tasks to be scheduled and executed during the calls to wait, just like using parallel_invoke will, but because the task_group isn't shared amongst tasks, the call to wait will complete after all the child tasks have completed and avoid the deadlock.

Note that I answered a similar question with a performance slant on the msdn forums for the PPL and remember that syntax and semantics of task_group, structured_task_group, parallel_invoke and parallel_for / parallel_for_each are consistent between the PPL and TBB; use what makes sense for you or your platform.

OTHER TIPS

Use tbb::parallel_invoke if the number of sub problems is constant. otherwise use recursion and task_group. Since the number of sub problems is 2 parallel_invoke is suitable and easier to implement.

Refer Intel TBB design patterns for more details

Waiting on the task group is important here. Without wait(), the function will return before the recursive "calls" done with task_group::run() complete, and obviously it breaks the algorithm. parallel_invoke is indeed applicable, and it automatically waits for the "invoked" functions to complete, so easier to use. What makes me (as TBB developer) worry is why the given program snippet does not terminate. It should work well as far as I can tell. Would you mind submitting a full source of the program (either here or at the TBB forum?)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top