OpenMP iteration never exits
Question
I am trying to write an iterative process in which the while loop stays inside the parallel region, to minimize the parallelization overhead.
The code is something like the following. The problem is that it never exits, so I would like your thoughts on this.
#include <stdio.h>
#include <omp.h>

int main(int argc, char **argv)
{
    float error = 20;
    #pragma omp parallel shared(error)
    {
        while (error > 5)
        {
            #pragma omp for reduction(-:error)
            for (int i=0; i<10; ++i)
            {
                error -= 1;
            }
        }
    }
    fprintf(stderr, "Program terminated\n");
    return 0;
}
Solution
This is a funny little problem. I don't have much OpenMP experience, but after some experiments with your code I think the problem is caused by the lack of synchronisation when entering the parallel for loop (insert print statements to "watch" your code).
You can get the code to work by inserting a barrier just before the parallel for loop:
#pragma omp barrier
#pragma omp for reduction(-:error)
for(int i=0; i<10; ++i)
Without that barrier and running on 2 threads, one thread will enter the for loop for the second time and reduce error to 5, while the other thread will not enter the second for loop at all. This leaves the system in the strange state where a parallel for loop has been executed by one thread but the other thread refuses to join. It is surely a warning against writing to shared variables inside parallel loops and using them as control variables elsewhere.
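To make the fix concrete, here is a minimal sketch of the question's loop with the barrier in place. The function name iterate_with_barrier and its tolerance parameter are hypothetical, introduced only so the loop can be called and checked; the body otherwise follows the code from the question.

```c
#include <stdio.h>

/* Hypothetical helper (not from the question): runs the iterative loop
   with the extra barrier and returns the final value of error. */
float iterate_with_barrier(float error, float tolerance)
{
    #pragma omp parallel shared(error)
    {
        while (error > tolerance)
        {
            /* Every thread has evaluated the while condition before any
               thread can write its reduction result back to error. */
            #pragma omp barrier

            #pragma omp for reduction(-:error)
            for (int i = 0; i < 10; ++i)
            {
                error -= 1;
            }
            /* Implicit barrier at the end of the omp for: the reduction
               is complete before any thread re-checks the condition. */
        }
    }
    return error;
}
```

Called as iterate_with_barrier(20, 5), this should subtract 10 per pass and stop once error is no longer greater than 5. Compile with your compiler's OpenMP flag (for example -fopenmp with GCC); without it the pragmas are ignored and the loop simply runs serially.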
OTHER TIPS
Your program has unspecified behavior. See section 2.8 of the OpenMP 5.0 specification [1]:
Each worksharing region must be encountered by all threads in a team or by none at all
This means that any kind of branch (if, while etc.) whose condition may be different for different threads around a #pragma omp for (or any other worksharing construct) is illegal:
#pragma omp parallel
{
    if (...true for some threads, false for others...) // ILLEGAL!
    {
        #pragma omp for
        for (...) ...
    }
    while (...true for some threads, false for others...) // ILLEGAL!
    {
        #pragma omp for
        for (...) ...
    }
}
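For contrast, here is a small sketch of a pattern that should satisfy the rule: the outer loop is controlled by a counter that every thread steps through the same values, and the variable updated by the reduction is never used in a branch condition. The function name count_worksharing_passes and its parameters are hypothetical and chosen only for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: runs `passes` worksharing loops of `n` iterations
   each and returns the total number of loop iterations executed. */
int count_worksharing_passes(int passes, int n)
{
    int total = 0;

    #pragma omp parallel shared(total)
    {
        /* p is private to each thread, but passes is never modified, so
           every thread takes the same number of trips through this loop
           and therefore encounters the omp for the same number of times. */
        for (int p = 0; p < passes; ++p)
        {
            #pragma omp for reduction(+:total)
            for (int i = 0; i < n; ++i)
            {
                total += 1;
            }
        }
    }
    return total;
}
```

Because the loop condition depends only on values that no thread modifies inside the parallel region, either all threads enter each worksharing region or none do.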
In your case, this unspecified behavior probably leads to the following sequence of events:
- Each thread checks the condition, but it may not be the same for all threads - some enter the while loop, some don't.
- If they enter the while loop:
  - They encounter the #pragma omp for.
  - In the for loop, they update error.
  - They wait at the implicit barrier at the end of #pragma omp for.
- If they don't enter the while loop:
  - They wait at the implicit barrier at the end of #pragma omp parallel.
When an OpenMP thread reaches a barrier, it waits until all threads in its team have reached the barrier. The implicit barrier of #pragma omp for does not adapt to the number of threads that encountered the construct. In your case, some threads will never reach the barrier at the end of the for loop (because for them the while condition was false). They've skipped the while loop and now wait at the implicit barrier at the end of #pragma omp parallel.
The result is a deadlock: some threads wait at the end of #pragma omp for, others at the end of #pragma omp parallel, and the two groups will never get together again...
The explicit barrier before #pragma omp for suggested in Walter's answer fixes this by separating the reads and writes of the shared variable error. More specifically:
- Each thread checks the condition, and it's the same for all - either all or none enter the body of the while loop.
- If they enter the while loop:
  - They all wait at the explicit barrier.
  - They all encounter the #pragma omp for.
  - In the for loop, they update error.
  - They all wait at the implicit barrier at the end of #pragma omp for. (The barrier does an implicit flush, which means that all threads see the final value of error.)
  - Go back to start.
- After the while loop:
  - They all wait at the implicit barrier at the end of #pragma omp parallel.
  - Done.
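One way to check the sequence above is to count how many times each thread executes the body of the while loop. The sketch below is only an illustration: the function name common_pass_count, the MAX_THREADS bound, and the choice of num_threads(4) are assumptions, not part of the original code.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAX_THREADS 64 /* assumed upper bound on team size for this sketch */

/* Hypothetical helper: returns the number of while-loop passes if every
   thread made the same number of passes, or -1 if the threads disagree. */
int common_pass_count(float error, float tolerance)
{
    int passes[MAX_THREADS] = {0};
    int team_size = 1;

    #pragma omp parallel shared(error, passes, team_size) num_threads(4)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
        /* One thread records the team size; the implicit barrier at the
           end of single is reached by all threads. */
        #pragma omp single
        team_size = omp_get_num_threads();
#endif
        while (error > tolerance)
        {
            /* Separates the reads of error above from the writes below. */
            #pragma omp barrier
            passes[tid] += 1; /* each thread writes only its own slot */

            #pragma omp for reduction(-:error)
            for (int i = 0; i < 10; ++i)
            {
                error -= 1;
            }
        }
    }

    for (int t = 1; t < team_size; ++t)
    {
        if (passes[t] != passes[0])
            return -1;
    }
    return passes[0];
}
```

With the explicit barrier in place, all threads should report the same count; starting from error = 20 with a tolerance of 5, that is two passes.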
Of course, now all threads execute the for loop, and this doesn't "minimize the parallelization overhead", which is what you want. I guess you'll have to restructure your code some more to reach that goal. Using #pragma omp task instead of #pragma omp for may be a good approach, but that depends on the details of your actual data structures and algorithms.
Note: You may be able to get rid of the deadlock by adding a nowait clause to #pragma omp for, but that would be a hack, and your program would still have unspecified behavior.
[1] ...or the respective section in other OpenMP versions.