You have a data race in your loop counters:
#pragma omp for
for (i = 0; i < nx; i++) {
    for (j = 0; j < ny; j++) {      // <--- data race
        for (k = 0; k < nz; k++) {  // <--- data race
            arr_par[i][j][k] = i*j + k;
        }
    }
}
Since neither j nor k is given the private data-sharing attribute, all threads share a single copy of each. When several threads increment them concurrently, their values become unpredictable and can exceed the corresponding bounds, resulting in out-of-bounds accesses to arr_par. The chance of several threads incrementing j or k at the same time grows with the number of iterations.
The best way to handle such cases is to simply declare the loop variables inside the loop statements themselves:
#pragma omp for
for (int i = 0; i < nx; i++) {
    for (int j = 0; j < ny; j++) {
        for (int k = 0; k < nz; k++) {
            arr_par[i][j][k] = i*j + k;
        }
    }
}
The other way is to add a private(j,k) clause to the directive that opens the parallel region:
#pragma omp parallel default(shared) private(threadid) private(j,k)
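For context, a minimal sketch of how the whole region might then look (assuming threadid, nx, ny, nz, and arr_par are declared as in your code; how you actually use threadid is up to you):

#pragma omp parallel default(shared) private(threadid) private(j,k)
{
    threadid = omp_get_thread_num();   /* each thread writes its own private copy */
    #pragma omp for
    for (i = 0; i < nx; i++) {         /* i is implicitly private, see below */
        for (j = 0; j < ny; j++) {
            for (k = 0; k < nz; k++) {
                arr_par[i][j][k] = i*j + k;
            }
        }
    }
}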
It is not strictly necessary to make i private in your case, since the loop variables of parallel loops are implicitly made private. Still, if i is used somewhere else in the code, it might make sense to make it private to prevent other data races.
Also, don't use clock() to measure the time of parallel applications, since on most Unix OSes it returns the total CPU time accumulated by all threads. Use omp_get_wtime() instead.
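A minimal timing sketch (the body of the parallel region is just a placeholder for your own work):

#include <stdio.h>
#include <omp.h>

int main(void) {
    double start = omp_get_wtime();   /* wall-clock time, not summed CPU time */

    #pragma omp parallel
    {
        /* ... parallel work goes here ... */
    }

    double end = omp_get_wtime();
    printf("elapsed: %f seconds\n", end - start);
    return 0;
}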