Question

I am trying to write an iterative process while keeping the while loop inside the parallel region, to minimize the parallelization overhead.

The code is something like the following. The problem is that it never exits, so if possible I would like your thoughts on this.

#include <stdio.h>
#include <omp.h>
int main(int argc, char **argv)
{
    float error = 20;
#pragma omp parallel shared(error)
    {
        while (error > 5)
        {
#pragma omp for reduction(-:error)
            for (int i=0; i<10; ++i)
            {
                error -= 1;
            }
        }

    }
    fprintf(stderr, "Program terminated\n");
    return 0;
}

Solution

This is a funny little problem. I have no overwhelming OpenMP experience, but after some experiments with your code, I think the problem is caused by a lack of synchronisation when entering the parallel for loop (insert write statements to "watch" your code).

You can get the code to work by inserting a barrier just before the parallel for loop:

#pragma omp barrier
#pragma omp for reduction(-:error)
    for (int i=0; i<10; ++i)

Without that barrier, running on 2 threads, one thread may enter the for loop a second time and reduce error to 5, while the other thread never enters the second for loop at all. This leaves the system in the strange state that a parallel for loop has been executed by one thread while the other thread refuses to join it. It is surely a warning against writing to shared variables inside parallel loops and using them as control variables elsewhere.
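
For reference, here is the complete program with the barrier in place; it is simply the code from the question with that one line added:

#include <stdio.h>
#include <omp.h>
int main(int argc, char **argv)
{
    float error = 20;
#pragma omp parallel shared(error)
    {
        while (error > 5)
        {
            // make sure every thread has read the while condition
            // before any thread writes to error in the next reduction
#pragma omp barrier
#pragma omp for reduction(-:error)
            for (int i=0; i<10; ++i)
            {
                error -= 1;
            }
            // implicit barrier (and flush) here: all threads see the
            // reduced value of error before testing it again
        }
    }
    fprintf(stderr, "Program terminated\n");
    return 0;
}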

OTHER TIPS

Your program has unspecified behavior. See section 2.8 in the OpenMP 5.0 specification¹:

Each worksharing region must be encountered by all threads in a team or by none at all

This means that wrapping a #pragma omp for (or any other worksharing construct) in any kind of branch (if, while, etc.) whose condition may be different for different threads is illegal:

#pragma omp parallel
{
  if (...true for some threads, false for others...) // ILLEGAL!
  {
    #pragma omp for
    for (...) ...
  }

  while (...true for some threads, false for others...) // ILLEGAL!
  {
    #pragma omp for
    for (...) ...
  }
}
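
By contrast, a branch whose condition is guaranteed to evaluate identically on every thread is fine, because then the whole team encounters each worksharing region together. A minimal sketch (the function and its parameters are made up for illustration):

#include <omp.h>

void add_in_passes(int iters, int n, float *a)
{
#pragma omp parallel
    {
        // OK: iters is the same value for every thread, so all
        // threads execute the same number of passes and each
        // worksharing region is encountered by the whole team.
        for (int pass = 0; pass < iters; ++pass)
        {
#pragma omp for
            for (int i = 0; i < n; ++i)
                a[i] += 1.0f;
        }
    }
}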

In your case, this unspecified behavior probably leads to the following sequence of events:

  • Each thread checks the condition, but it may not be the same for all threads - some enter the while loop, some don't.
  • If they enter the while loop:
    • They encounter the #pragma omp for.
    • In the for loop, they update error.
    • They wait at the implicit barrier at the end of #pragma omp for.
  • If they don't enter the while loop:
    • They wait at the implicit barrier at the end of #pragma omp parallel.

When an OpenMP thread reaches a barrier, it waits until all threads in its team have reached the barrier. The implicit barrier of #pragma omp for does not adapt to the number of threads that encountered the construct. In your case, some threads will never reach the barrier at the end of the for loop (because for them the while condition was false). They've skipped the while loop and now wait at the implicit barrier at the end of #pragma omp parallel.

The result is a deadlock: Some threads wait at the end of #pragma omp for, others at the end of #pragma omp parallel, and the two groups will never get together again...


The explicit barrier before #pragma omp for suggested in Walter's answer fixes this by separating the reads and writes of the shared variable error. More specifically:

  • Each thread checks the condition, and it's the same for all - either all or none enter the body of the while loop.
  • If they enter the while loop:
    • They all wait at the explicit barrier.
    • They all encounter the #pragma omp for.
    • In the for loop, they update error.
    • They all wait at the implicit barrier at the end of #pragma omp for. (The barrier does an implicit flush, which means that all threads see the final value of error.)
    • Go back to start.
  • After the while loop:
    • They all wait at the implicit barrier at the end of #pragma omp parallel.
    • Done.

Of course, now all threads execute the for loop, and this doesn't "minimize the parallelization overhead", which is what you want. I guess you'll have to restructure your code some more to reach that goal. Using #pragma omp task instead of #pragma omp for may be a good approach, but that depends on the details of your actual data structures and algorithms; a sketch of one possible shape follows.
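
A task-based variant could look roughly like this (a sketch only, assuming the per-iteration work items are independent; it replaces the reduction with atomic updates, and a single thread evaluates the condition, so no worksharing construct can be missed by part of the team):

#include <stdio.h>
#include <omp.h>
int main(void)
{
    float error = 20;
#pragma omp parallel shared(error)
#pragma omp single
    {
        // Only this thread tests the condition; the other threads
        // in the team just execute the tasks it creates.
        while (error > 5)
        {
            for (int i = 0; i < 10; ++i)
            {
#pragma omp task shared(error)
                {
#pragma omp atomic
                    error -= 1;
                }
            }
            // Wait for all tasks so the next test of error
            // sees the fully updated value.
#pragma omp taskwait
        }
    }
    fprintf(stderr, "Program terminated\n");
    return 0;
}

The single construct means only one thread walks the while loop while the rest of the team stays busy executing tasks; whether this actually beats the barrier version depends on how much work each task carries.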


Note: You may be able to get rid of the deadlock by adding a nowait clause to #pragma omp for, but that would be a hack, and your program would still have unspecified behavior.
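
For reference, that clause goes on the worksharing directive itself:

#pragma omp for reduction(-:error) nowait // removes the implicit barrier (and its flush) at the end of the loop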


¹ ...or the respective section in other OpenMP versions.

Licensed under: CC-BY-SA with attribution