Question

I am trying to run an MPI program on two different PCs. However, when I run this command on pc1:

mpirun -hosts user@host -n 4 bin/Demo_01.exe 

I'm getting this error:

[proxy:0:0@pc2] HYDU_sock_connect (./utils/sock/sock.c:203): unable to connect from "pc2" to "pc1" (Connection refused)

[proxy:0:0@pc2] main (./pm/pmiserv/pmip.c:209): unable to connect to server ubuntu at port 57395 (check for firewalls!)

Although I configured passwordless SSH connections and disabled the firewall on each machine, the error is still there. My operating system is Ubuntu 12.04 and the MPI implementation is MPICH2.

Can anyone help?


Solution 3

Fixed. After I followed these steps, the error disappeared:

  1. Create administrator user accounts on both machines with the same username and password.
  2. Define the hostnames of both machines by editing /etc/hosts.
  3. Make a clean install of ssh on both machines.
  4. Configure ssh to connect without a password. To do this, follow these links: http://www.thegeekstuff.com/2008/11/3-steps-to-perform-ssh-login-without-password-using-ssh-keygen-ssh-copy-id/ and http://dustymabe.com/2012/08/18/exchanging-ssh-keys-using-ssh-copy-id/
  5. Place the MPI executable at the same path on both machines.
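
Steps 4 and 5 can be checked from the master before launching anything. The following is only a sketch under stated assumptions: the host names pc1 and pc2 are taken from the error message in the question, the executable path is a made-up placeholder, and the helper names are hypothetical. It relies on ssh's BatchMode option, which makes ssh fail instead of prompting for a password.

```python
import subprocess


def ssh_batch_command(host, remote_command):
    """Build an ssh command line that fails instead of prompting."""
    # BatchMode=yes makes ssh exit with an error rather than ask for a
    # password or passphrase, so a broken key setup shows up immediately.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host] + list(remote_command)


def check_host(host, executable):
    """Return True if key-based login works and `executable` exists on `host`."""
    # `test -x` succeeds only if the path exists and is executable.
    command = ssh_batch_command(host, ["test", "-x", executable])
    return subprocess.run(command).returncode == 0


# Example use (hypothetical path, adjust to your own layout):
#   for host in ["pc1", "pc2"]:
#       print(host, check_host(host, "/home/mpiuser/bin/Demo_01.exe"))
```

A False result for a host means either the passwordless login or the executable path still needs attention on that machine.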

OTHER TIPS

The error is caused by the client not connecting back to the server because it does not know the IP address of the server, as this line shows: "main (./pm/pmiserv/pmip.c:209): unable to connect to server ubuntu at..."

The fix is to add each hostname and its IP address to /etc/hosts, e.g.:

172.17.0.2  master
172.17.0.3  node1
172.17.0.4  node2

This should allow bidirectional communication between the master and the node clients.
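
To confirm that a given machine's /etc/hosts really contains every entry, a small check along these lines might help. It is a sketch only: the function names are hypothetical, the sample text stands in for the real file, and the expected names and addresses are the example values from this answer.

```python
def parse_hosts(text):
    """Map each hostname in /etc/hosts-style text to the first IP listed for it."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name, ip)
    return table


def missing_entries(hosts_text, expected):
    """Return the expected hostnames that are absent or mapped to another IP."""
    table = parse_hosts(hosts_text)
    return sorted(name for name, ip in expected.items() if table.get(name) != ip)


# Sample content standing in for one machine's /etc/hosts (node2 is missing).
SAMPLE = """\
127.0.0.1   localhost
172.17.0.2  master
172.17.0.3  node1
"""
EXPECTED = {"master": "172.17.0.2", "node1": "172.17.0.3", "node2": "172.17.0.4"}

print(missing_entries(SAMPLE, EXPECTED))  # prints ['node2']
```

On a real machine the text would come from reading /etc/hosts, and the check would need to pass on the master and on every node.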

I had the same error, but the accepted answer did not help me.

For me in the hosts file I had:

localhost:8

CPUX:2

I should have had:

CPUZ:8

CPUX:2

I.e. the name of the node instead of localhost. Maybe this will help someone.
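
A quick way to catch this mistake is to scan the host list for loopback names before launching. The sketch below assumes the file uses the host:count format shown above and that a missing count means one process; the function names are hypothetical.

```python
LOOPBACK_NAMES = {"localhost", "127.0.0.1"}


def parse_machinefile(text):
    """Parse lines of the form host:count into (host, count) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        host, _, count = line.partition(":")
        # Assumption: an entry without ":count" stands for one process.
        entries.append((host, int(count) if count else 1))
    return entries


def loopback_entries(text):
    """Return the hosts in the list that only make sense on the local machine."""
    return [host for host, _ in parse_machinefile(text) if host.lower() in LOOPBACK_NAMES]


BAD = "localhost:8\n\nCPUX:2\n"
GOOD = "CPUZ:8\n\nCPUX:2\n"

print(loopback_entries(BAD))   # prints ['localhost']
print(loopback_entries(GOOD))  # prints []
```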

montekristo_07's answer is mostly correct but not minimal; steps #2 and #3 are not strictly necessary.

You do not need to edit all your hosts' /etc/hosts files, and, if your LAN uses DHCP and you have any local DNS service running, you should not edit all your hosts' /etc/hosts files.

Ensure that:

  1. only externally-resolvable hostnames are referenced in your mpiexec command line (i.e. not "localhost"), and
  2. the /etc/hosts file on the master (the machine on which you run mpiexec) does not have a line associating the public name of the master with the loopback address (127.0.0.1)

A simple test is to use literal IP addresses in your mpiexec command line. If this fixes your problem, then it's a hostname resolution problem...somewhere.

What is essential to remember is that whatever is passed on your mpiexec command line, in particular host names, is going to be sent to and resolved on the remote hosts.
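
Because resolution happens separately on each machine, one way to narrow the problem down is to run a small resolver check on the master and on every node. The following is a sketch, not a definitive diagnostic: the function names are hypothetical, and the host names in the usage comment are the example names from the earlier answer.

```python
import ipaddress
import socket


def is_loopback(address):
    """True for any address in the loopback range, e.g. 127.0.0.1 or 127.0.1.1."""
    return ipaddress.ip_address(address).is_loopback


def resolve_ipv4(name):
    """Return the IPv4 addresses the local resolver gives for `name`."""
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET)
    return sorted({info[4][0] for info in infos})


def loopback_names(names, resolve=resolve_ipv4):
    """Return the names that resolve to a loopback address on this machine."""
    return [name for name in names if any(is_loopback(a) for a in resolve(name))]


# Example use, to be run on every machine involved:
#   print(loopback_names(["master", "node1", "node2"]))
```

If the master's public name shows up in the result when the check is run on the master itself, that matches the /etc/hosts condition described in point 2 above.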

Licensed under: CC-BY-SA with attribution