OpenMP parallel for reduction delivers wrong results

https://stackoverflow.com/questions/18013345

04-06-2022

Question

I am working with a signal matrix and my goal is to calculate the sum of all elements of a row. The matrix is represented by the following struct:

typedef struct matrix {
  float *data;
  int rows;
  int cols;
  int leading_dim;
} matrix;

Note that the matrix is stored in column-major order (http://en.wikipedia.org/wiki/Row-major_order#Column-major_order), which explains the index formula column * tan_hd.rows + row.

for(int row = 0; row < tan_hd.rows; row++) {
    float sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for(int column = 0; column < tan_hd.cols; column++) {
        sum += tan_hd.data[column * tan_hd.rows + row];
    }
    printf("row %d: %f", row, sum);
}

Without the OpenMP pragma, the delivered result is correct and looks like this:

row 0: 8172539.500000 row 1: 8194582.000000

As soon as I add the #pragma omp... as described above, a different (wrong) result is returned:

row 0: 8085544.000000 row 1: 8107186.000000

In my understanding, reduction(+:sum) creates a private copy of sum for each thread, and after the loop completes these partial results are summed up and written back to the shared variable sum. What am I doing wrong?

I appreciate your suggestions!

Solution

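The reduction clause works as you describe. The difference most likely comes from float rounding: floating-point addition is not associative, and the parallel loop adds the elements in a different order than the serial loop, so the accumulated rounding error differs.

Use the Kahan summation algorithm:

It has the same algorithmic complexity as a naive summation.
It greatly increases the accuracy of a summation, without requiring you to switch data types to double.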

By rewriting your code to implement it:

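for(int row = 0; row < tan_hd.rows; row++) {
    float sum = 0.0f, c = 0.0f;  // c tracks the rounding error of sum
    // One operator, one variable list: reduction(+:sum, +:c) does not compile.
    #pragma omp parallel for reduction(+:sum, c)
    for(int column = 0; column < tan_hd.cols; column++) {
        float y = tan_hd.data[column * tan_hd.rows + row] - c;  // corrected term
        float t = sum + y;   // low-order bits of y may be lost here
        c = (t - sum) - y;   // rounding error of this addition
        sum = t;
    }
    sum = sum - c;  // apply the combined per-thread corrections
    printf("row %d: %f", row, sum);
}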
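To see the effect of the compensation in isolation, here is a minimal serial sketch comparing a naive float sum with a Kahan sum. The function names naive_sum and kahan_sum are made up for this illustration and are not part of your code. It assumes IEEE 754 single precision and that the compiler is not allowed to reorder floating-point operations (for example, no -ffast-math), since such options can optimize the correction term away.

```c
#include <stddef.h>

/* Plain left-to-right accumulation in single precision. */
float naive_sum(const float *x, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += x[i];
    }
    return sum;
}

/* Kahan (compensated) summation in single precision. */
float kahan_sum(const float *x, size_t n) {
    float sum = 0.0f;
    float c = 0.0f;            /* rounding error carried over from the last step */
    for (size_t i = 0; i < n; i++) {
        float y = x[i] - c;    /* next term, corrected by the carried error */
        float t = sum + y;     /* low-order bits of y may be rounded away */
        c = (t - sum) - y;     /* (t - sum) is what was really added; the rest is error */
        sum = t;
    }
    return sum;
}
```

For an array holding 16777216.0f followed by 1000 values of 1.0f, naive_sum stays at 16777216 because 2^24 + 1 is not representable in single precision and rounds back, while kahan_sum returns 16778216.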

You can additionally switch all float to double to achieve higher precision, but since your array is a float array, the difference should only show up in the last significant digits of the result.
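As a rough sketch of the double variant, the row sum could be wrapped in a helper that keeps the float data but accumulates in double. The helper name row_sum_double is hypothetical, and the code assumes columns are stored rows elements apart, as in the index formula from the question.

```c
#include <stddef.h>

/* Same layout as the struct in the question. */
typedef struct matrix {
    float *data;
    int rows;
    int cols;
    int leading_dim;
} matrix;

/* Sum one row of a column-major matrix, accumulating in double.
 * Hypothetical helper; assumes element (row, column) lives at
 * data[column * rows + row]. */
double row_sum_double(const matrix *m, int row) {
    double sum = 0.0;
    /* Each thread gets a private double accumulator; OpenMP adds them up. */
    #pragma omp parallel for reduction(+:sum)
    for (int column = 0; column < m->cols; column++) {
        sum += (double)m->data[(size_t)column * m->rows + row];
    }
    return sum;
}
```

It would be called as printf("row %d: %f", row, row_sum_double(&tan_hd, row)); inside the outer loop. With GCC or Clang, OpenMP is enabled with -fopenmp; without that flag the pragma is ignored and the loop runs serially. The result can still vary in its last bits between thread counts, because the order of additions still changes.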

Licensed under: CC-BY-SA with attribution
