Question

I set up a cluster on EC2 using StarCluster and have the SciPy stack installed (including mpi4py). I've been told to launch an MPI job with a number of processes that matches the number of processors in the cluster. For example, if I create a 4-node cluster of an instance type where each node has 2 vCPUs, do I issue:

mpiexec -n 4 python mytestfile.py

or

mpiexec -np 8 python mytestfile.py

Also, how can I be sure that each node is in fact handling 2 instances of the problem as opposed to one node handling all 8? In other words, does mpiexec automatically figure all that out?


Solution

mpiexec needs a list of nodes on which to start the MPI processes. In general, there are two ways that it could obtain the list.

The first way is through integration with a distributed resource manager, e.g. SGE, Torque, LSF, etc. These usually provide each job with the list of nodes granted to it, and the MPI launcher can parse that list and extract the necessary information. In that case it usually suffices to simply run mpiexec -n #procs ./executable.

The second way is to provide the list manually. That is usually done either through command-line options (like -H in Open MPI) or via a so-called hostfile. Hostfiles usually list the hostnames of the nodes where the job should execute and the number of slots on each node. An example hostfile for Open MPI would be:

node0001 slots=2
node0002 slots=2
node0003 slots=2
node0004 slots=2

slots=2 in that case tells the library that each node provides two slots, which means that up to two MPI processes could be started on that node. The number of slots is an arbitrary value and is not required to match the number of CPUs/cores. In most cases it does match the number of cores, but hybrid MPI+threads jobs are a common exception.
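The total number of slots in the hostfile is the natural value to pass to -n; for the four-node, two-slot hostfile above that is 8. A minimal sketch of that arithmetic follows. The function name total_slots is hypothetical, and counting a host that has no slots= entry as one slot is an assumption of this sketch, not necessarily what a given Open MPI version does.

```python
def total_slots(hostfile_text):
    """Sum the slots=N values in Open MPI style hostfile text."""
    total = 0
    for line in hostfile_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        slots = 1  # assumption of this sketch: no slots= entry counts as one slot
        for field in line.split()[1:]:  # fields after the hostname
            key, _, value = field.partition("=")
            if key == "slots":
                slots = int(value)
        total += slots
    return total


# Same contents as the example hostfile above.
hostfile = """\
node0001 slots=2
node0002 slots=2
node0003 slots=2
node0004 slots=2
"""

print(total_slots(hostfile))  # 8
```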

Given the above hostfile, one would start an MPI job like this:

mpiexec --hostfile /path/to/hostfile -n 8 ...
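To check that the processes really are spread over the nodes rather than piled onto one, each rank can report the host it runs on. The following is a sketch using mpi4py, which the question already has installed; the helper name count_ranks_per_host and the file name used below are hypothetical.

```python
from collections import Counter


def count_ranks_per_host(hostnames):
    """Map each hostname to the number of ranks that reported it."""
    return dict(Counter(hostnames))


def main():
    # Imported lazily so the helper above can be used without MPI installed.
    try:
        from mpi4py import MPI
    except ImportError:
        print("mpi4py is not installed; nothing to report")
        return

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    host = MPI.Get_processor_name()  # name of the node this rank runs on

    # Collect one hostname per rank on rank 0 and print a per-node summary.
    hostnames = comm.gather(host, root=0)
    if rank == 0:
        for name, n in sorted(count_ranks_per_host(hostnames).items()):
            print(f"{name}: {n} rank(s)")


if __name__ == "__main__":
    main()
```

Saved as, say, check_placement.py, it would be launched like any other job: mpiexec --hostfile /path/to/hostfile -n 8 python check_placement.py. With the hostfile above, one would expect every node to be listed with two ranks; a single node listed with all eight would suggest the hostfile was not picked up.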

Starting jobs on remote nodes usually requires that the current user is able to log in to those nodes using some passwordless mechanism, e.g. public key authentication.

Licensed under: CC-BY-SA with attribution