Ordered 'for' loop efficiency in OpenMP

https://stackoverflow.com/questions/8999904

openmp
mcmc

19-04-2021
|

Pergunta

I am trying to parallelise a single MCMC chain which is sequential in nature and hence, I need to preserve the order of iterations being executed. For this purpose, I was thinking of using an 'ordered for' loop via OpenMP. I wanted to know how does the execution of an ordered for loop in OpenMP really work, does it really provide any speed-up in terms of parallelisation of the code?

Thanks!

Solução

As long as you're having just a single Markov chain, the easiest way to parallelize it is to use the 'embarassing' parallelism: run a bunch of independent chains and collect the results when they all are done [or gather the results once in a while.]

This way you do not incur any communication overhead whatsoever.

The main caveat here is that you need to make sure different chains get different random number generator seeds.

UPD: practicalities of collecting the results.

In a nutshell, you just mix together the results generated by all the chains. For the sake of simplicity, suppose you have three independent chains:

x1, x2, x3,...
y1, y2, y3,...
z1, z2, z3,...

From these, you make a chain x1,y1,z1,x2,y2,z2,x3,y3,z3,... This is a perfectly valid MC chain and it samples the correct distribution.

Writing out all the chain history is almost always impractical. Typically, each chain saves the binning statistics, which you then mix together and analysize by a separate program. For binning analysis see, e.g. [boulder.research.yale.edu/Boulder-2010/ReadingMaterial-2010/Troyer/Article.pdf][1]

Outras dicas

If your loop contains only one block with an ordered construct, then the execution will be serial, and you will not obtain any speedup from parallel execution. In the example below there is one block that can be executed in parallel and one that will be serialized:

void example(int b, int e, float* data) 
{
    #pragma omp for schedule(static) ordered
    for (int i = b; i < e; ++i) {
        // This block can be executed in parallel
        data[i] = SomeThing(data[i]);
        if (data[i] == 0.0f) 
        {
            // This block will be serialized 
            #pragma omp ordered
            printf("Element %d resulted in zero\n", i);
        }
    }
}

The openMP ordered directive can be listed only in a dynamic perspective.
The specifications suggest that while writing for we must mention the ordered keyword. However, where in the loop would be the ordered block is your choice.

My guess is that even if we mention the ordered keyword in for, each thread will start its work in parallel. Any thread that encounters a ordered keyword must enter this block only if all the previous iterations are completed. Please focus on the keyword all previous iterations must be completed.

The intuition for the above reasoning is that an "ordered for" if executes serially does not make any sense at all.

Licenciado em: CC-BY-SA com atribuição

Não afiliado a StackOverflow