Question

I'm having trouble running an Open MPI program across just two nodes (one node is the same machine that is executing the mpiexec command; the other is a separate machine).

I'll call the machine that is running mpiexec, master, and the other node slave.

On both master and slave, I've installed Open MPI in my home directory under ~/mpi.

I have a file called ~/machines.txt on master.

Ideally, ~/machines.txt should contain:

master
slave

However, when I run the following on master:

mpiexec -n 2 --hostfile ~/machines.txt hostname

I get the following error:

bash: orted: command not found

But if ~/machines.txt contains only the name of the node that the command is running on, it works. ~/machines.txt:

master

Command:

mpiexec -n 2 --hostfile ~/machines.txt hostname

OUTPUT:

master
master

I've tried running the same command on slave (after changing machines.txt to contain only slave), and it worked there too. I've made sure that my .bashrc file contains the proper paths for Open MPI.

What am I doing wrong? In short, there is only a problem when I try to execute a program on a remote machine, but I can run mpiexec perfectly fine on the machine that is executing the command. This makes me believe that it's not a path issue. Am I missing a step in connecting both machines? I have passwordless ssh login capability from master to slave.


Solution

This error message means that you either do not have Open MPI installed on the remote machine, or you do not have your PATH set properly on the remote machine for non-interactive logins (i.e., such that it can't find the installation of Open MPI on the remote machine). "orted" is one of the helper executables that Open MPI uses to launch processes on remote nodes -- so if "orted" was not found, then it didn't even get to the point of trying to launch "hostname" on the remote node.

Note that there might be a difference between interactive and non-interactive logins in your shell startup files (e.g., in your .bashrc).
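A rough local check of this difference, as a sketch: `bash -c` starts a non-interactive shell much like the one ssh spawns for remote commands, so it sees the same kind of environment orted would be launched from (on the real cluster, the equivalent remote check would be something like `ssh slave 'which orted'`, using the hostname from the question).

```shell
# bash -c starts a non-interactive shell, similar to the shell ssh
# spawns for a remote command. If your PATH additions live only in the
# interactive-only part of .bashrc, they will be missing here.
bash -c 'echo "$PATH"'

# 'command -v' prints a program's resolved path only when it is on
# PATH; on the remote node you would check orted the same way.
bash -c 'command -v ls'
```

If the second command prints nothing for orted on the remote node, the launcher cannot find it either, regardless of what an interactive login shows.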

Also note that it is considerably simpler to have Open MPI installed in the same path location on all nodes; that way, the prefix method described below will automatically add the right PATH and LD_LIBRARY_PATH when executing on the remote nodes, and you don't have to muck with your shell startup files.

Note that there are a bunch of FAQ items about these kinds of topics on the main Open MPI web site.

OTHER TIPS

Either explicitly set the absolute OpenMPI prefix with the --prefix option:

prompt> mpiexec --prefix=$HOME/mpi ...

or invoke mpiexec with the absolute path to it:

prompt> $HOME/mpi/bin/mpiexec ...

The latter option sets the prefix automatically. The prefix is then used to set PATH and LD_LIBRARY_PATH on the remote machines.

This answer comes very late, but for Linux users: it is a bad habit to add environment variables at the end of the ~/.bashrc file. If you look carefully at the top of that file, you will notice a guard that returns early when the shell is non-interactive, which is precisely the kind of shell you get when running your program through an ssh host. So put your environment variables at the TOP of the file, before that early return.
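As a sketch of that layout (assuming your .bashrc carries the stock Debian/Ubuntu-style guard; the exact wording varies by distribution), demonstrated here with a temporary file so the effect can be seen from a non-interactive shell:

```shell
# Write a sketch .bashrc to a temp file. The exports come BEFORE the
# interactive guard, so non-interactive shells still pick them up.
cat > /tmp/bashrc_sketch <<'EOF'
# PATH exports first, before the guard:
export PATH="$HOME/mpi/bin:$PATH"
export LD_LIBRARY_PATH="$HOME/mpi/lib:$LD_LIBRARY_PATH"

# Stock guard: non-interactive shells stop reading the file here.
case $- in
    *i*) ;;
      *) return;;
esac

# Anything below this point is skipped for non-interactive shells.
alias ll='ls -l'
EOF

# Source it the way a non-interactive shell would; the PATH export
# still took effect even though sourcing stopped at the guard:
bash -c 'source /tmp/bashrc_sketch; echo "$PATH"'
```

With the exports below the guard instead, the last command would print a PATH without ~/mpi/bin, which is exactly the "orted: command not found" situation.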

Try editing the file

/etc/environment

PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/home/hadoop/openmpi_install/bin"
LD_LIBRARY_PATH=/home/hadoop/openmpi_install/lib

(substituting your own installation paths; note that /etc/environment is not processed by a shell, so variables and ~ are not expanded, and you must use absolute paths.)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow