Question

On Ubuntu 9.10, using PVM 3.4.5-12 (the package installed by apt-get), the pvm console terminates after I add a host.

laptop> pvm
pvm> add bowtie-slave
add bowtie-slave
terminated
laptop>

Current configuration: only $PVM_RSH is set, to /usr/bin/ssh.
I can ssh perfectly fine into the slave without a password, and run commands on it.

Any ideas?
Thanks in advance!

Here are the sample logs:

Laptop log

[t80040000] 02/11 10:23:32 laptop (127.0.1.1:xxxxx) LINUX 3.4.5
[t80040000] 02/11 10:23:32 ready Thu Feb 11 10:23:32 2010
[t80040000] 02/11 10:23:32 netoutput() sendto: errno=22
[t80040000] 02/11 10:23:32 em=0x2c24f0
[t80040000] 02/11 10:23:32 [49/à][6e/à][76/à][61/à][6c/à][69/à][64/à][20/à][61/à][72/à]
[t80040000] 02/11 10:23:32 netoutput() sendto: Invalid argument
[t80040000] 02/11 10:23:32 pvmbailout(0)

bowtie-log

[t80080000] 02/11 10:23:25 bowtie-slave (xxx.x.x.xxx:xxxxx) LINUX64 3.4.5
[t80080000] 02/11 10:23:25 ready Thu Feb 11 10:23:25 2010
[t80080000] 02/11 10:28:26 work() run = STARTUP, timed out waiting for master
[t80080000] 02/11 10:28:26 pvmbailout(0)


Solution

I've also been struggling with this problem. I just found a couple of the things that were failing for me.

First, my master host was starting with a node-name that was not recognized by the slave host. That is, it was calling itself "foobar" but it really should have been "foobar.example.com" so that the slave knew how to talk to it. You specify this by starting the master console like this:

pvm -nfoobar.example.com

I also specified the full name of the slave. So in the console:

add baz.mumble.example.com

Then I had a problem where the console would hang when I added the slave. Hey, at least it's not just stopping! It turned out to be the firewall on the slave host: the communications were getting dropped (the pvmds don't communicate over ssh after setup; they talk over a separate port). Unfortunately, running without a firewall is not an option for that host. By default, pvmd picks a random port number, which is not what I want. Apparently there's an undocumented environment variable, PVMNETSOCKPORT, that controls which ports it uses. Right now I'm working on getting that set correctly so that I can poke the right hole in my firewall.
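To see which port would need opening, one option is to look at the sockets pvmd actually holds. This is only a rough sketch, not something from the PVM documentation: it assumes the daemons talk over UDP (as the sendto errors in the question's log suggest), that netstat from net-tools is installed, and that the daemon shows up with "pvmd" in its program name. The function name pvmd_udp_ports is made up for illustration.

```shell
#!/bin/sh
# Read `netstat -ulnp` style output on stdin and print the local UDP port
# of every socket whose owning program name contains "pvmd".
pvmd_udp_ports() {
    awk '$1 ~ /^udp/ && $NF ~ /pvmd/ {
        n = split($4, parts, ":")   # local address looks like 0.0.0.0:PORT
        print parts[n]
    }'
}

# Example (process names of other users are only shown to root):
#   netstat -ulnp | pvmd_udp_ports
```

Because the port is random by default, the number printed here changes each time pvmd starts; it is useful for confirming what the firewall is dropping, not as a permanent rule.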

Good luck! I'll try to update this answer if I get any further.

OTHER TIPS

Ahh... the joys of starting up PVM! I use PVM via an external library, InterComm. Getting PVM to start nicely on any platform is always a fun exercise. Here are some things you can try:

If you can rsh to your compute nodes, set $PVM_RSH=/path/to/rsh. Otherwise, to configure via ssh:

Set up passwordless SSH and manually verify that it works.
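One way to do that check without being fooled by a password prompt is ssh's BatchMode option, which makes the login fail instead of asking. A minimal sketch; the function name check_passwordless_ssh and the SSH_CMD override are hypothetical helpers, not part of PVM or OpenSSH:

```shell
#!/bin/sh
# Return 0 if a non-interactive SSH login to the given host succeeds.
# SSH_CMD is a hypothetical override hook (handy for testing); default is ssh.
check_passwordless_ssh() {
    host=$1
    # BatchMode=yes: fail rather than prompt for a password or passphrase.
    # ConnectTimeout=5: give up after five seconds if the host is unreachable.
    ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$host" true
}

# Example:
#   check_passwordless_ssh bowtie-slave && echo "passwordless login works"
```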

Then create an executable script, $PVM_ROOT/ssh, containing something like:

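#!/bin/sh
# Load the PVM environment on the remote host, then run the requested command.

host=$1
shift
# "$*" joins the remaining arguments into one remote command string.
exec /usr/bin/ssh "$host" ". ~/.pvmprofile; $*"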

Once that's taken care of:

Set some environment variables (this is machine-dependent):

setenv PVM_ARCH LINUX64
setenv PVM_ROOT /users/ps14/opt-intel/pvm3
setenv PVM_BIN ${PVM_ROOT}/bin

# Set the following accordingly:    
setenv PVM_RSH ${PVM_ROOT}/ssh
#setenv PVM_RSH rsh

Now, create a ".pvmprofile" file containing these variables:

rm -f ~/.pvmprofile
env | grep PVM_ > ~/.pvmprofile

Create a hostfile containing unique hostnames:

sort -k 1,1 -u ${PBS_NODEFILE} >!  pvm_hostfile

Now, start PVM & add nodes. I like to do this as a one-liner:

printf "%s\n%s\n" conf quit|${PVM_ROOT}/lib/pvm pvm_hostfile

I didn't realize I could answer my own question until now. It failed because of my /etc/hosts file.

Ubuntu maps localhost to 127.0.0.1 in /etc/hosts, but PVM needs the machine name to resolve to a real IP address (the laptop log above shows the master coming up as 127.0.1.1). So I put a line with the actual IP address followed by my machine name above the localhost line, so that PVM reads it first. Then everything worked. I don't know why it never gave me the loopback error message, though.

As rescdsk pointed out, specifying the host name when starting the master console would also work, but I wanted to be lazy and just type pvm.
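For anyone wanting to check their own machine for the same /etc/hosts problem, here is a small sketch. It assumes a glibc system where getent is available, and it only handles IPv4; the helper names is_loopback and resolve_host and the address 192.168.1.10 are placeholders of mine, not anything PVM provides.

```shell
#!/bin/sh
# /etc/hosts layout that worked (192.168.1.10 is a placeholder address):
#   192.168.1.10  laptop
#   127.0.0.1     localhost

# Return 0 if the given IPv4 address is in the loopback range 127.0.0.0/8.
is_loopback() {
    case $1 in
        127.*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Print the first address the system resolver returns for a host name.
resolve_host() {
    getent hosts "$1" | awk '{ print $1; exit }'
}

# Example:
#   addr=$(resolve_host "$(hostname)")
#   is_loopback "$addr" && echo "warning: $(hostname) resolves to $addr"
```

If the warning fires, the master would advertise a loopback address that the slave cannot reach, which matches the 127.0.1.1 address in the laptop log.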

I haven't addressed the security issues yet... maybe rescdsk or Pete will have some nice suggestions about security holes. My hosts/cluster will not be connected to the internet, though. Are there still any concerns?

Licensed under: CC-BY-SA with attribution