Game of Life with OpenMP

Question

I think the outer loop can not be parallelized, because input to each generation is the previous generation, so it has a sequential formula (at least you can not do it with minor changes!)

In case of nested loops which traverse a matrix or something like that, I prefer to run a single loop from 0 to ncol*nrow (in your case) and find i and j from loop index.

like this:

// because you are running a parallel codes multiple times in a loop,
// it would be better to make the thread swarm first and schedule the
// tasks in each loop iteration, to avoid multiple creation and destruction
// of working threads
#pragma omp parallel
for (curgen = 0; curgen < gens_max; curgen++)
{
    #pragma omp for
    for (t = 0; t < nrows*ncols; t++)
    {
        int i = t / ncols;
        int j = t % ncols;
        const int inorth = mod (i-1, nrows);
        const int isouth = mod (i+1, nrows);
        const int jwest = mod (j-1, ncols);
        const int jeast = mod (j+1, ncols);

        const char neighbor_count = 
            BOARD (ingrid, inorth, jwest) + 
            BOARD (ingrid, inorth, j) + 
            BOARD (ingrid, inorth, jeast) + 
            BOARD (ingrid, i, jwest) +
            BOARD (ingrid, i, jeast) + 
            BOARD (ingrid, isouth, jwest) +
            BOARD (ingrid, isouth, j) + 
            BOARD (ingrid, isouth, jeast);

        BOARD(outgrid, i, j) = alivep (neighbor_count, BOARD (ingrid, i, j));
    }
    SWAP_BOARDS( outgrid, ingrid );
}

I ran this code on my laptop with Dual Core 2.53 GHz CPU on a 1000x1000 matrix over 1000 generations, and got 69% speed up.