Small OpenMP program sometimes freezes (gcc, c, linux)

https://stackoverflow.com/questions/4491063

11-10-2019

Question

I wrote a small OpenMP test, and it does not always work correctly:

#include <omp.h>
int main() {
  int i,j=0;
#pragma omp parallel
  for(i=0;i<1000;i++)
  {
#pragma omp barrier
    j+= j^i;
  }
  return j;
}

Writing to j from all threads is incorrect in this example, BUT

that should only make the value of j nondeterministic.
Instead, the program freezes.

Compiled with gcc-4.3.1 -fopenmp a.c -o gcc -static

Run on a 4-core x86 Core2 Linux server: $ ./gcc sometimes freezes (roughly 1 freeze per 4-5 quick runs).

Strace:

[pid 13118] futex(0x80d3014, FUTEX_WAKE, 1) = 1
[pid 13119] <... futex resumed> )       = 0
[pid 13118] futex(0x80d3020, FUTEX_WAIT, 251, NULL <unfinished ...>
[pid 13119] futex(0x80d3014, FUTEX_WAKE, 1) = 0
[pid 13119] futex(0x80d3020, FUTEX_WAIT, 251, NULL                       
                        <freeze>

Why do I have a freeze (deadlock)?

Solution

Try making i private so each thread has its own copy.

Now that I have more time, I will try to explain. By default, variables in OpenMP are shared. There are a couple of cases where the defaults make variables private, but a parallel region is not one of them (so High Performance Mark's response is wrong).

In your original program, you have two race conditions: one on i and one on j. The problem is the one on i. Each thread executes the loop some number of times, but since i is being changed by every thread, the number of times any given thread executes the loop is indeterminate. All threads have to reach the barrier for the barrier to be satisfied, so when the threads do not all execute it the same number of times, some thread ends up waiting at a barrier that the others never reach, and the program hangs.

Since the OpenMP spec clearly states (OMP spec V3.0, section 2.8.3 barrier Construct) that "the sequence of worksharing regions and barrier regions encountered must be the same for every thread in a team", your program is non-compliant and as such can have indeterminate behavior.

OTHER TIPS

You're trying to add to the same location from multiple threads. You can't do what you're trying to do in parallel. If you want to do a sum in parallel, you need to divide it into smaller pieces and collect them afterwards.

Update by a5b: right idea, but the wrong part of the code was spotted. The i variable is changed by both threads.

@ejd, If I mark i as private, will my program be compliant?

Sorry - I just saw this question. Technically if you mark variable "i" as private your program will be OpenMP compliant. HOWEVER, there is still a race condition on "j" and while your program is compliant (because there are valid cases to have race conditions), the value of "j" is unspecified (according to the OpenMP spec).

In one of your previous answers you said that you were trying to measure the speed of the barrier implementation. There are several "benchmarks" that you might want to look at that have published results for a variety of OpenMP constructs. One was written by Mark Bull (EPCC, University of Edinburgh), another (Sphinx) comes from Lawrence Livermore National Labs (LLNL), and the third (Parkbench) comes from a Japanese Computing Partnership. They may offer you some guidance.

Licensed under: CC-BY-SA with attribution
