You could start by setting the "outfile" option to an empty string when creating the cluster object:
makePSOCKcluster("192.168.1.1",user="username",outfile="")
This allows you to see error messages from the workers in your terminal, which will hopefully provide a clue to the problem. If that doesn't help, I recommend using manual mode:
makePSOCKcluster("192.168.1.1",user="username",outfile="",manual=TRUE)
This bypasses ssh, and displays commands for you to execute in order to manually start each of the workers in separate terminals. This can uncover problems such as R packages that are not installed. It also allows you to debug the workers using whatever debugging tools you choose, although that takes a bit of work.
If makePSOCKcluster
doesn't respond after you execute the specified command, it means that the worker wasn't able to connect to the master process. If the worker doesn't display any error message, it may indicate a networking problem, possibly due to a firewall blocking the connection. Since makePSOCKcluster
uses a random port by default in R 3.X, you should specify an explicit value for port and configure your firewall to allow connections to that port.
To test for networking or firewall problems, you could try connecting to the master process using "netcat". Execute makePSOCKcluster
in manual mode, specifying the hostname of the desired worker host and the port on local machine that should allow incoming connections:
> library(parallel)
> makePSOCKcluster("node03", port=11234, manual=TRUE)
Manually start worker on node03 with
'/usr/lib/R/bin/Rscript' -e 'parallel:::.slaveRSOCK()' MASTER=node01
PORT=11234 OUT=/dev/null TIMEOUT=2592000 METHODS=TRUE XDR=TRUE
Now start a terminal session on "node03" and execute "nc" using the indicated values of "MASTER" and "PORT" as arguments:
node03$ nc node01 11234
The master process should immediately return with the message:
socket cluster with 1 nodes on host ‘node03’
while netcat should display no message, since it is quietly reading from the socket connection.
However, if netcat displays the message:
nc: getaddrinfo: Name or service not known
then you have a hostname resolution problem. If you can find a hostname that does work with netcat, you may be able to get makePSOCKcluster
to work by specifying that name via the "master" option: makePSOCKcluster("node03", master="node01", port=11234)
.
If netcat returns immediately, that may indicate that it wasn't able to connect to the specified port. If it returns after a minute or two, that may indicate that it wasn't able to communicate with specified host at all. In either case, check netcat's return value to verify that it was an error:
node03$ echo $?
1
Hopefully that will give you enough information about the problem that you can get help from a network administrator.