How to parallelize updating sums with OpenMP

https://stackoverflow.com/questions/13857316

07-12-2021
|

Question

The following loop iterates over all edges of a graph, determines if the end nodes belong to the same group, and then adds the edge weight to the total edge weight of that group.

// TODO: parallel
FORALL_EDGES_BEGIN((*G)) {
    node u = EDGE_SOURCE;
    node v = EDGE_DEST;
    cluster c = zeta[u];
    cluster d = zeta[v];
    if (c == d) {
        // TODO: critical section
        intraEdgeWeight[c] += G->weight(u, v);
    } // else ignore edge
} FORALL_EDGES_END();

I would like to parallelize it with OpenMP. I think that the code in the if statement is a critical section that might lead to a race condition and wrong results if threads are interrupted in the middle of it (correct?).

If the += operation could be made atomically, I believe that the problem is solved (correct?). However, I've looked up the atomic directive and there it states that:

Unfortunately, the atomicity setting can only be specified to simple expressions that usually can be compiled into a single processor opcode, such as increments, decrements, xors and the such. For example, it cannot include function calls, array indexing, overloaded operators, non-POD types, or multiple statements.

What should I use to parallelize this loop correctly?

Solution

Actually the accepted syntax for atomic update is:

x++;

x--;

++x;

--x;

x binop= expr;

x = x binop expr;

where x is a scalar l-value expression and expr is any expression, including function calls, with the only restriction being that it must be of scalar type. The compiler would take care to store the intermediate result in a temporary variable.

Sometimes it's better to consult the standards documents instead of reading tutorials on the Internet. Observe the OpenMP 3.1 standard Expample A.22.1c:

float work1(int i)
{
  return 1.0 * i;
}
...
#pragma omp atomic update
x[index[i]] += work1(i);

OTHER TIPS

I also think your if block is a critical section, you should not write a vector from multiple threads without serializing the writes. You can use #pragma omp critical to restrict the execution of the += in your if block to a single thread at a time.

You can split the expression in two parts: the function call that assigns the result into a temporary, and the addition of that temporary to the accumulator. In that case, the second expression will be simple enough to use omp atomic, assuming that the addition itself is not a complex overloaded operator for a custom type. Of course you can only do that in case G->weight(u,v) is a thread-safe call, otherwise you have to use omp critical or a mutex.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow