Question

How does one use MPI_Comm_spawn to start worker processes on remote nodes?

Using OpenMPI 1.4.3, I've tried this code:

MPI_Info info;
MPI_Info_create(&info);
MPI_Info_set(info, "host", "node2");
MPI_Comm intercom;
MPI_Comm_spawn("worker",
        MPI_ARGV_NULL,
        nprocs,
        info,
        0,
        MPI_COMM_SELF,
        &intercom,
        MPI_ERRCODES_IGNORE);

But that fails with this error message:

--------------------------------------------------------------------------
There are no allocated resources for the application 
  worker
that match the requested mapping:


Verify that you have mapped the allocated resources properly using the 
--host or --hostfile specification.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
A daemon (pid unknown) died unexpectedly on signal 1  while attempting to
launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------

If I replace the "node2" with the name of my local machine, then it works fine. If I ssh into node2 and run the same thing there (with "node2" in the info dictionary) then it also works fine.

I don't want to start the parent process with mpirun, so I'm just looking for a way to dynamically spawn processes on remote nodes. Is this possible?


Solution

I don't want to start the parent process with mpirun, so I'm just looking for a way to dynamically spawn processes on remote nodes. Is this possible?

I'm not sure why you don't want to start it with mpirun. You're implicitly starting up the whole MPI machinery anyway as soon as you call MPI_Init(); launching with mpirun just lets you pass it options rather than relying on the defaults.

The issue here is simply that when the MPI library starts up (in MPI_Init()), it doesn't see any other hosts available, because you haven't given it any with mpirun's --host or --hostfile options. It won't launch processes elsewhere just on your say-so (indeed, MPI_Comm_spawn doesn't require the "host" info key at all, so in general it wouldn't even know where to go otherwise), so the spawn fails.

So you'll need to run

mpirun --host myhost,host2 -np 1 ./parentjob

or, more generally, provide a hostfile (here mpihosts.txt), preferably with a number of slots available on each host:

myhost slots=1
host2 slots=8
host3 slots=8

and launch the job this way:

mpirun --hostfile mpihosts.txt -np 1 ./parentjob

This is a feature, not a bug: now it's MPI's job to figure out where the workers go, and if you don't specify a host explicitly in the info, it'll try to put them in the most underutilized place. It also means you don't have to recompile to change the hosts you'll spawn to.

Licensed under: CC-BY-SA with attribution