Question

I do computations on the Amazon EC2 platform, using multiple machines which are connected through Open MPI. To reduce the cost of the computation, spot instances are used, which are automatically shut down when the spot price rises above a preset maximum price: http://aws.amazon.com/ec2/spot-instances/ . A weird behaviour occurs: when a machine is shut down, the other processes in the MPI communicator continue to run. I think that the network interfaces are silenced before the process has time to notify the other processes that it has received a kill signal.

I have read in multiple posts that MPI does not provide many high-level facilities for fault tolerance. On the other hand, the structure of my program is very simple: a master process is queried by slave processes for permission to execute a portion of code. The master process only keeps track of the number of queries it has replied to, and tells the slaves to stop when an upper limit is reached. There is no coupling between the slaves.
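For reference, a minimal sketch of this master/worker permission protocol is shown below. The tag names, the dummy request payload, and the limit value are assumptions made for illustration, not taken from the actual program:

    #include <mpi.h>
    #include <stdio.h>

    #define TAG_REQUEST 1   /* worker asks permission for one unit of work */
    #define TAG_REPLY   2   /* master replies: 1 = go ahead, 0 = stop      */

    int main(int argc, char **argv)
    {
        int rank, size, limit = 1000;      /* assumed upper limit on granted units */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {                   /* master */
            int granted = 0, stopped = 0, dummy, answer;
            MPI_Status st;
            while (stopped < size - 1) {
                MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE, TAG_REQUEST,
                         MPI_COMM_WORLD, &st);
                answer = (granted < limit) ? 1 : 0;
                if (answer) granted++; else stopped++;
                MPI_Send(&answer, 1, MPI_INT, st.MPI_SOURCE, TAG_REPLY,
                         MPI_COMM_WORLD);
            }
        } else {                           /* worker */
            int dummy = 0, answer = 1;
            while (answer) {
                MPI_Send(&dummy, 1, MPI_INT, 0, TAG_REQUEST, MPI_COMM_WORLD);
                MPI_Recv(&answer, 1, MPI_INT, 0, TAG_REPLY,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                if (answer) { /* execute the portion of work here */ }
            }
        }
        MPI_Finalize();
        return 0;
    }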

I would like to be able to detect when a process has silently died as described above. In that case I would re-assign the work it was doing to a slave that is still alive. Is there a simple way to check whether a process has died? I have thought of using threads and sockets to do that independently of the MPI layer, but that seems cumbersome. I have also thought of maintaining on the master process (which is launched on a non-spot instance) a list of the time of last communication with each process, together with a timeout, but that would not guarantee that a slave process is actually dead. There is also the problem that the "barrier" and "finalize" functions will not see all the processes, and will potentially hang.
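As a sketch of the timeout idea, the master could poll for requests without blocking and keep a per-rank timestamp of the last message it received; ranks silent for too long get flagged as suspected dead. The timeout value, the tag, and the bookkeeping arrays below are hypothetical:

    #include <mpi.h>

    #define TAG_REQUEST     1
    #define TIMEOUT_SECONDS 120.0   /* assumed: tune to the longest work unit */

    /* Poll for worker requests without blocking, record the time of the last
     * message seen from each rank, and mark ranks that have been silent for
     * more than TIMEOUT_SECONDS so their work can be handed to someone else. */
    void poll_and_track(double *last_seen, int *suspected, int size)
    {
        int flag;
        MPI_Status st;

        MPI_Iprobe(MPI_ANY_SOURCE, TAG_REQUEST, MPI_COMM_WORLD, &flag, &st);
        while (flag) {
            int dummy;
            MPI_Recv(&dummy, 1, MPI_INT, st.MPI_SOURCE, TAG_REQUEST,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            last_seen[st.MPI_SOURCE] = MPI_Wtime();
            /* ... reply to the request and record what was handed out ... */
            MPI_Iprobe(MPI_ANY_SOURCE, TAG_REQUEST, MPI_COMM_WORLD, &flag, &st);
        }

        for (int r = 1; r < size; ++r) {
            if (!suspected[r] && MPI_Wtime() - last_seen[r] > TIMEOUT_SECONDS) {
                suspected[r] = 1;
                /* re-queue the work unit that rank r was holding */
            }
        }
    }

As noted, this only tells you that a rank has been silent, not that it is dead, so the timeout has to be chosen well above the longest expected work unit.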

My question, then, is: what kind of solution would you implement to detect that processes have silently died? And how would you modify the remainder of the code so that it keeps working with a reduced number of processes?


Solution

Which version of Open MPI are you using?

I'm not sure exactly what Open MPI is doing (or not doing) such that it fails to detect that a process is gone. The usual behavior of Open MPI after a failure is that the runtime aborts the entire job.

Unfortunately, there is no mechanism in Open MPI for discovering failed processes (especially in a case like yours, where it sounds like Open MPI doesn't even know they have failed). However, there is a lot of ongoing work to add this to future versions of all MPI libraries. One of the example implementations that supports this behavior is a branch of Open MPI called ULFM (www.fault-tolerance.org). There's lots of documentation there to see exactly what's going on, but essentially, it is a proposed new chapter for the MPI standard that adds fault tolerance.
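To give a feel for the ULFM style of recovery, here is a hedged sketch based on the function names in the ULFM documentation (MPIX_Comm_revoke, MPIX_Comm_shrink, MPIX_ERR_PROC_FAILED); exact names and headers may differ between versions of the branch, and the recovery logic shown is only illustrative:

    #include <mpi.h>
    #include <mpi-ext.h>   /* MPIX_ fault-tolerance extensions in the ULFM branch */

    /* Run with MPI_ERRORS_RETURN; when a receive fails because a peer died,
     * revoke the communicator and shrink it so the survivors can carry on
     * with fewer ranks. */
    void recv_or_recover(MPI_Comm *comm, int *buf, int source, int tag)
    {
        MPI_Comm_set_errhandler(*comm, MPI_ERRORS_RETURN);

        int rc = MPI_Recv(buf, 1, MPI_INT, source, tag, *comm, MPI_STATUS_IGNORE);
        if (rc != MPI_SUCCESS) {
            int eclass;
            MPI_Error_class(rc, &eclass);
            if (eclass == MPIX_ERR_PROC_FAILED || eclass == MPIX_ERR_REVOKED) {
                MPI_Comm shrunk;
                MPIX_Comm_revoke(*comm);          /* interrupt pending operations everywhere */
                MPIX_Comm_shrink(*comm, &shrunk); /* build a communicator of the survivors   */
                MPI_Comm_free(comm);
                *comm = shrunk;
                /* redistribute the failed worker's task using the new communicator */
            }
        }
    }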

There is an older effort that's available in MPICH 3.0.3 (unfortunately, it's broken in 3.0.4, but it should be back for 3.1) (www.mpich.org). The documentation for using that work is in the README.

The problem with both of these efforts is that they aren't compliant with the MPI Standard. Eventually, there will be a chapter describing fault tolerance in MPI and all of the MPI implementations will become compatible, but in the meantime, there is no good solution for everyone.

OTHER TIPS

PVM might be a reasonable alternative to MPI in your case. Although it is no longer developed, having lost out to MPI years ago, PVM still comes pre-packaged with most Linux distributions and provides built-in fault tolerance. Its API is conceptually very similar to that of MPI, but its execution model differs a bit. One could say that it allows for one degree less coupling between the tasks in the parallel program than MPI does.
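The relevant PVM feature is pvm_notify(), which asks the PVM daemon to send the master a normal message whenever a worker task exits (or its host dies). A rough sketch follows; the worker executable name, the tag, and the bookkeeping are assumptions for illustration:

    #include <pvm3.h>
    #include <stdio.h>

    #define NWORKERS 8
    #define TAG_DIED 99   /* tag of the notification message from pvm_notify */

    int main(void)
    {
        int tids[NWORKERS];
        pvm_mytid();   /* enroll this process in the PVM */

        /* "worker" is a hypothetical executable name */
        pvm_spawn("worker", NULL, PvmTaskDefault, "", NWORKERS, tids);
        pvm_notify(PvmTaskExit, TAG_DIED, NWORKERS, tids);

        int alive = NWORKERS;
        while (alive > 0) {
            int bufid = pvm_recv(-1, -1);        /* any source, any tag */
            int bytes, tag, tid;
            pvm_bufinfo(bufid, &bytes, &tag, &tid);

            if (tag == TAG_DIED) {               /* a worker exited, cleanly or not */
                int dead_tid;
                pvm_upkint(&dead_tid, 1, 1);     /* tid of the task that exited */
                printf("worker t%x is gone, re-queueing its work\n", dead_tid);
                alive--;
                /* hand the dead worker's unit to a surviving task here */
            } else {
                /* normal worker request/reply traffic goes here */
            }
        }

        pvm_exit();
        return 0;
    }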

There is an example implementation of a fault-tolerant master-worker PVM application in Beowulf Cluster Computing with Linux; see the relevant chapter of the book for the details.

As for fault tolerance in MPI, the proposed addition to the standard was rejected when the MPI Forum voted for inclusion of new features in MPI-3.0. It might take much longer than anticipated before FT becomes a standard feature of MPI.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow