Question

When I run a UPC program over a network of 2 nodes with the -v option enabled to get detailed execution information, I notice that the master node (glitch.rutgers.edu) tries to connect to itself instead of connecting to its neighbouring nodes.

    /usr/bin/rsh glitch.rutgers.edu -l sharatds -n '/usr/bin/env'
    'GASNET_MAX_SEGSIZE='74344KB'' 'GASNET_VERBOSEENV='1'' '/cac/u01/sharatds/UPC_Tests/./upcMatrxMultplction_mpi' glitch.rutgers.edu 41449 -p4amslave -p4yourname glitch.rutgers.edu -p4rmrank 1
    glitch.rutgers.edu: Connection refused
    p0_5078:  p4_error: Child process exited while making connection to remote process on glitch.rutgers.edu: 0
    p0_5078: (45.046875) net_send: could not write to fd=4, errno = 32
    gasnetrun: unlinking gasnetrun_mpi-temp-4813/rsh gasnetrun_mpi-temp-4813/ssh gasnetrun_mpi-temp-4813/mpirun-rsh gasnetrun_mpi-temp-4813/mpirun-tmp

Why is this happening? Would a change to the configuration fix it?

Thanks for your help


Solution

This error is most likely coming from rsh. You can confirm this by running an rsh command from the master node back to itself, such as "rsh glitch pwd" (my guess is that this will prompt you for a password).
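
If you want to repeat that check for every node in the job, something like the following might help. It is only a rough sketch: the function name check_remote_shell is made up for illustration, and it assumes that a working setup lets the remote shell run a trivial command (pwd) without any interaction. The remote shell command is passed in as an argument so the same check can be tried with ssh as well.

```shell
#!/bin/sh
# check_remote_shell is a hypothetical helper, not part of UPC or GASNet.
# $1 = remote shell command (e.g. /usr/bin/rsh), $2 = host to test.
# It runs "pwd" on the host and reports whether the command succeeded.
check_remote_shell() {
    rsh_cmd=$1
    host=$2
    # stdin comes from /dev/null so the remote shell does not read from
    # the terminal; a password prompt on the tty may still block.
    if "$rsh_cmd" "$host" pwd </dev/null >/dev/null 2>&1; then
        echo "ok: $host"
    else
        echo "FAILED: $host"
        return 1
    fi
}

# Example use from the master node (the host name is taken from the question):
#   check_remote_shell /usr/bin/rsh glitch.rutgers.edu
```

If this reports FAILED for glitch.rutgers.edu itself, the problem is in the rsh setup on that machine rather than in the UPC program. A "Connection refused" message usually means that nothing is accepting rsh connections on that host.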

Licensed under: CC-BY-SA with attribution