mpiexec needs a list of nodes on which to start the MPI processes. In general, there are two ways for it to obtain that list.
The first way is through integration with some kind of distributed resource manager, e.g. SGE, Torque, LSF, etc. These usually provide the list of granted nodes to each job; the MPI launcher then parses that list and extracts the necessary information. In that case it usually suffices to simply run mpiexec -n #procs ./executable.
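As a sketch of what such integration does under the hood: Torque/PBS, for example, exposes the granted nodes via the PBS_NODEFILE environment variable, which points to a file listing one hostname per granted slot. The snippet below fabricates such a file for illustration and collapses it into an Open MPI-style hostfile; the file names are placeholders, and the final mpiexec invocation is shown only as a comment since it requires a real cluster:

```shell
# Fabricate a node list of the kind a scheduler would provide via
# $PBS_NODEFILE: one hostname per granted slot (2 nodes x 2 slots here).
printf 'node0001\nnode0001\nnode0002\nnode0002\n' > nodefile

# Collapse duplicate hostnames into "hostname slots=N" lines,
# i.e. the Open MPI hostfile format.
sort nodefile | uniq -c | awk '{ print $2 " slots=" $1 }' > hostfile
cat hostfile

# On a real system one would then launch with, e.g.:
#   mpiexec --hostfile hostfile -n 4 ./executable
```

A resource-manager-aware MPI build performs essentially this translation internally, which is why no explicit hostfile is needed inside a batch job.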
The second way is to provide the list manually. That is usually done either through command-line options (like -H in Open MPI) or via a so-called hostfile. A hostfile lists the hostnames of the nodes on which the job should execute and the number of slots on each node. An example hostfile for Open MPI would be:
node0001 slots=2
node0002 slots=2
node0003 slots=2
node0004 slots=2
slots=2 in that case tells the library that each node provides two slots, i.e. that up to two MPI processes may be started on that node. The number of slots is arbitrary and is not required to match the number of CPUs/cores, although in most cases it does (a notable exception being hybrid MPI+threads jobs, where one often wants fewer processes per node than cores).
Given the above hostfile, one would start an MPI job like this:
mpiexec --hostfile /path/to/hostfile -n 8 ...
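To illustrate why slots need not equal cores: a hybrid MPI+OpenMP job on (hypothetical) 8-core nodes might reserve a single slot per node and let each rank spawn eight threads instead. The hostfile and launch line below are a sketch; -x is an Open MPI-specific option that exports an environment variable to the launched processes:

```
node0001 slots=1
node0002 slots=1
node0003 slots=1
node0004 slots=1
```

```shell
mpiexec --hostfile /path/to/hostfile -n 4 -x OMP_NUM_THREADS=8 ./executable
```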
Starting jobs on remote nodes usually requires that the current user can log in to those nodes via some passwordless mechanism, e.g. public key authentication.
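Setting that up typically amounts to generating an SSH key pair without a passphrase and distributing the public key to every node in the hostfile. The key path and the user@host in the comment below are placeholders, not anything mandated by MPI:

```shell
# Generate an ed25519 key pair with an empty passphrase (-N '')
# at an illustrative path; adjust the path to taste.
ssh-keygen -t ed25519 -N '' -f ./mpi_key

# Then install the public key on each node from the hostfile, e.g.:
#   ssh-copy-id -i ./mpi_key.pub user@node0001
```

On clusters with a shared home directory it is often enough to append the public key to ~/.ssh/authorized_keys once, since all nodes see the same file.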