Question

I have an application that's written in Fortran and there is one particular subroutine call that takes a long time to execute. I was wondering if it's possible to distribute the tasks for computation over multiple nodes. The current serial flow of the code is as follows:

D = Some computations that give me D and it is in memory
subroutine call

<within the subroutine>

iteration from 1 .. n
{

  independent operations on D
}

I wish to distribute the iterations over n/4 machines. Can someone please guide me with this? Do let me know if something's not very clear!


Solution 2

When one has existing code and wants to parallelize incrementally (or just one routine), shared memory approaches are the "quick hit". Especially when it is known that the iterations are independent, I'd first recommend looking at compiler flags for auto-parallelization, language constructs such as DO CONCURRENT (thanks to @IanH for reminding me of that), and OpenMP compiler directives.
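
As a minimal sketch of the OpenMP route, assuming the loop body writes only to its own element of a result array: the array names `d` and `r`, the size `n`, and the squaring operation are placeholders rather than anything from the question. The directive is an ordinary comment to a compiler that is not asked for OpenMP, so the serial build keeps working.

```fortran
program omp_sketch
  implicit none
  integer, parameter :: n = 1000        ! hypothetical problem size
  double precision :: d(n), r(n)
  integer :: i

  ! Placeholder for the real computation that produces D.
  d = [(dble(i), i = 1, n)]

  ! Iterations are independent, so they can be shared among threads.
  ! The loop index i is private to each thread by default.
  !$omp parallel do
  do i = 1, n
     r(i) = d(i) * d(i)                 ! placeholder for the real work on D
  end do
  !$omp end parallel do

  print *, 'sum of results =', sum(r)
end program omp_sketch
```

With gfortran, for example, the directive takes effect when compiling with `-fopenmp`. This only uses the cores of a single node.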

My extended comment was about distributed memory, however, so I'll turn to that.

I'll assume you don't have access to some advanced process-spawning setup on all of your potential machines. That is, you'll have processes running on various machines, each being charged for its time regardless of what work is being done. The workflow then looks like:

  • Serial outer loop
    • Calculate D
    • Distribute D to the parallel environment
      • Inner parallel loop on subsets of D
    • Gather D on the master

If the processors/processes in the parallel environment are doing nothing else - or you're being charged regardless - then this is the same to you as

  • Outer loop
    • All processes calculate D
    • Each process works on its subset of D
    • Synchronize D

The communication side, whether MPI or coarrays, is then just the synchronization step. I'd recommend coarrays in this case (again, see @IanH's answer): image synchronization and the like amounts to little more than a few loops with [..].
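
For comparison, here is a rough MPI sketch of the second workflow, assuming each iteration produces one element of a result array. The names `d`, `r`, `n` and the doubling operation are placeholders. Every process computes D redundantly and fills only its own block of `r`, then a single `MPI_Allreduce` with `MPI_SUM` over the zero-initialized array serves as the synchronization step.

```fortran
program mpi_sketch
  use mpi
  implicit none
  integer, parameter :: n = 16          ! hypothetical number of iterations
  double precision :: d(n), r(n)
  integer :: i, ierr, rank, nprocs, chunk, lo, hi

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! Every process calculates D itself (placeholder computation).
  d = [(dble(i), i = 1, n)]

  ! Block partition of the iteration space 1..n over the processes.
  chunk = (n + nprocs - 1) / nprocs
  lo = rank * chunk + 1
  hi = min(n, (rank + 1) * chunk)

  ! Each process fills only its own slice; the rest stays zero.
  r = 0.0d0
  do i = lo, hi
     r(i) = 2.0d0 * d(i)                ! placeholder for the real work on D
  end do

  ! Synchronize: summing the zero-padded slices gives every process
  ! the complete result array.
  call MPI_Allreduce(MPI_IN_PLACE, r, n, MPI_DOUBLE_PRECISION, MPI_SUM, &
                     MPI_COMM_WORLD, ierr)

  if (rank == 0) print *, 'sum of results =', sum(r)
  call MPI_Finalize(ierr)
end program mpi_sketch
```

Summing zero-padded slices is chosen here for brevity. A gather-style collective such as `MPI_Allgatherv` would move less data but needs per-process counts and displacements.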

As an endnote: multi-machine coarray support is very limited. ifort, as I understand it, requires a licence beyond the basic one; g95 has some support; the Cray compiler may well support it. That's a separate question, however. MPI would be well supported.

OTHER TIPS

Depending on the underlying implementation, coarrays (F2008) may allow processing to be distributed over multiple nodes. Partitioning the iteration space across the images is relatively straightforward; communicating the results back to one image (or to all images) is where some complexity might arise. Some introductory material on coarrays can be found here.
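
A minimal coarray sketch of that partitioning, assuming one result element per iteration: the names `d`, `r`, `n` and the doubling operation are placeholders. Each image works on its own block of the iteration space, and image 1 then pulls the other images' blocks with coindexed references.

```fortran
program coarray_sketch
  implicit none
  integer, parameter :: n = 16          ! hypothetical number of iterations
  real :: d(n)                          ! input, computed on every image
  real :: r(n)[*]                       ! results; each image fills its slice
  integer :: i, k, me, np, chunk, lo, hi

  me = this_image()
  np = num_images()

  ! Placeholder for the real computation that produces D.
  d = [(real(i), i = 1, n)]

  ! Block partition of the iteration space 1..n over the images.
  chunk = (n + np - 1) / np
  lo = (me - 1) * chunk + 1
  hi = min(n, me * chunk)

  r = 0.0
  do i = lo, hi
     r(i) = 2.0 * d(i)                  ! placeholder for the real work on D
  end do

  ! Make sure every image has finished its slice before image 1 reads it.
  sync all

  if (me == 1) then
     do k = 2, np
        lo = (k - 1) * chunk + 1
        hi = min(n, k * chunk)
        r(lo:hi) = r(lo:hi)[k]          ! fetch image k's slice
     end do
     print *, 'sum of results =', sum(r)
  end if
end program coarray_sketch
```

How this is compiled and launched depends on the compiler. Whether the images can actually run on different nodes is a property of the implementation, not of the source code.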

Again, depending on the underlying implementation, DO CONCURRENT (F2008) may allow parallel processing of iterations (though this is unlikely to be across nodes). Restrictions on what can be done in the scope of a DO CONCURRENT construct mean that its iterations can be executed in any order; appropriately capable compilers may then be able to transform that into concurrent execution.
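
To illustrate the construct, here is a small sketch assuming each iteration touches only its own array element. The names `d`, `r`, `n` and the operation itself are placeholders.

```fortran
program do_concurrent_sketch
  implicit none
  integer, parameter :: n = 8           ! hypothetical number of iterations
  real :: d(n), r(n)
  integer :: i

  ! Placeholder for the real computation that produces D.
  d = [(real(i), i = 1, n)]

  ! Each iteration reads d(i) and writes r(i) only, so the iterations
  ! may be executed in any order.
  do concurrent (i = 1:n)
     r(i) = sqrt(d(i)) + 1.0
  end do

  print *, r
end program do_concurrent_sketch
```

Whether the loop actually runs in parallel is up to the compiler and its options. The construct only asserts that doing so is safe.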

Licensed under: CC-BY-SA with attribution