Option 1: Use non-blocking probes to check if a message is waiting and sleep a bit if not:
do {
    int flag;

    /* Non-blocking check for a pending result message from any worker. */
    MPI_Iprobe(MPI_ANY_SOURCE, RES, &flag, world, &status);
    if (flag) {
        MPI_Recv(res, 4, MPI_DOUBLE, status.MPI_SOURCE, RES, world, &status);
        ...
        ndone++;
    }
    else
        usleep(10000);  /* nothing waiting - sleep for 10 ms */

    /* Update the elapsed wall-clock time in seconds. */
    gettimeofday(&end, NULL);
    countTime = (end.tv_sec + end.tv_usec*1.e-6) - (start.tv_sec + start.tv_usec*1.e-6);
} while (ndone < (nprocs - 1) && countTime < WallTime);
You could skip the usleep() call, but then the master process will run a tight loop and keep CPU utilisation at almost 100%. This is usually not a problem on HPC systems, where each MPI rank is bound to a separate CPU core anyway.
Option 2: Most resource managers can be configured to deliver a Unix signal some time before the job is about to be killed. For example, both Sun/Oracle Grid Engine and LSF deliver SIGUSR2 some time before the job gets killed with SIGKILL. For SGE, one should add the -notify option to qsub to make it send SIGUSR2; the amount of time between SIGUSR2 and the following SIGKILL is configurable by the SGE admin on a per-queue basis. LSF sends SIGUSR2 when the job end time is reached and, if the job does not terminate within 10 minutes after that, follows up with SIGKILL.
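To react to the warning, you install a handler that does nothing but set a flag, and poll that flag from the master loop of option 1. Here is a minimal sketch; the names time_to_quit and install_sigusr2_handler are mine, not from your code:

#include <signal.h>

/* Set asynchronously when SIGUSR2 arrives; polled by the master loop.
   (time_to_quit is an illustrative name, not part of any API.) */
static volatile sig_atomic_t time_to_quit = 0;

static void on_sigusr2(int sig)
{
    (void)sig;
    time_to_quit = 1;   /* only async-signal-safe operations belong here */
}

/* Call once at start-up, e.g. right after MPI_Init(). */
static void install_sigusr2_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = on_sigusr2;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGUSR2, &sa, NULL);
}

The do/while condition from option 1 would then also test !time_to_quit, so the loop ends as soon as the warning arrives.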
Option 3: If your resource manager is uncooperative and does not send warning signals before killing your job, you could simply send yourself SIGALRM. The usual sequence is:
- create a timer using timer_create();
- (re-)arm the timer using timer_settime();
- destroy the timer using timer_delete() at the end.
You could either program the timer to expire shortly before the total wall-clock time runs out (bad practice, since you would have to keep that value in sync with the wall-clock limit requested from the resource manager), or you could have the timer fire at short intervals, e.g. every 5 minutes, and rearm it each time, as in the sketch below.
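Here is a minimal sketch of the periodic variant, assuming a POSIX system (on older glibc you need to link with -lrt). Passing NULL as the second argument of timer_create() makes the timer deliver SIGALRM on expiry by default, and a non-zero it_interval makes the kernel rearm the timer automatically, so the handler never has to call timer_settime() itself. The names on_sigalrm, alarm_fired and arm_periodic_alarm are illustrative:

#include <signal.h>
#include <time.h>

static volatile sig_atomic_t alarm_fired = 0;

static void on_sigalrm(int sig)
{
    (void)sig;
    alarm_fired = 1;    /* just raise a flag; the main loop reacts to it */
}

/* Fires SIGALRM 5 minutes from now and then every 5 minutes. */
static timer_t arm_periodic_alarm(void)
{
    struct sigaction sa;
    struct itimerspec its;
    timer_t tid;

    sa.sa_handler = on_sigalrm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGALRM, &sa, NULL);

    timer_create(CLOCK_REALTIME, NULL, &tid);   /* NULL sigevent -> SIGALRM */

    its.it_value.tv_sec     = 300;  /* first expiry after 5 minutes */
    its.it_value.tv_nsec    = 0;
    its.it_interval.tv_sec  = 300;  /* then every 5 minutes */
    its.it_interval.tv_nsec = 0;
    timer_settime(tid, 0, &its, NULL);

    return tid;     /* hand to timer_delete() before exiting */
}

The master loop would then check alarm_fired in the same way as the SIGUSR2 flag above and decide whether enough time is left for another round of work.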
Options 2 and 3 require that you write and install a signal handler for the corresponding signal(s). The nice thing about signals is that they are usually delivered asynchronously, even if your code is stuck inside a blocking MPI call like MPI_Recv. I would consider this an advanced topic and recommend that you stick to option 1 for now, just keeping in mind that options 2 and 3 exist.
Option 4: Some MPI libraries support checkpoint/restart of running jobs. Checkpointing creates a snapshot of your MPI job's running state, which can later be restored with special command-line flags to mpiexec (or whatever your MPI launcher is called). This method requires zero changes to your program's source code, but such support is not widely available, especially on typical cluster setups.