Question

I'm trying to run a NAS-UPC benchmark to study its profile. UPC uses MPI to communicate with remote processes.

When I run the benchmark with 64 processes, I get the following error:

upcrun -n 64 bt.C.64
"Timeout in making connection to remote process on <<machine name>>" 

Can anybody tell me why this error occurs?


Solution

This probably means that you're failing to spawn the remote processes. The upcrun launcher delegates that to a per-conduit mechanism, which may involve your scheduler (if any). My guess is that you're depending on ssh-type remote access, and that's failing, probably because you don't have keys, an agent, or host-based trust set up. Can you ssh to your remote nodes without a password? Is the environment on the remote nodes sane (paths, etc.)?
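One way to test the passwordless-ssh question across all nodes is a small loop like the one below. This is only a sketch: the `check_node` helper and the node names passed as arguments are hypothetical, and it assumes the job is actually launched over OpenSSH, which may not be true for your conduit or scheduler.

```shell
#!/bin/sh
# Hypothetical helper: try a non-interactive ssh login to one node.
# Node names come from the command line; none are taken from the original post.
check_node() {
    # BatchMode=yes makes ssh fail instead of prompting for a password;
    # ConnectTimeout=5 keeps an unreachable host from blocking for long.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true 2>/dev/null; then
        echo "$1: ok"
    else
        echo "$1: FAILED"
        return 1
    fi
}

# Check every node given as an argument and remember any failure.
status=0
for node in "$@"; do
    check_node "$node" || status=1
done

# The script's exit status is 0 only if every node passed.
[ "$status" -eq 0 ]
```

Saved as, say, `check_nodes.sh`, it could be run as `sh check_nodes.sh node01 node02` (placeholder host names). For any node that reports ok, `ssh node01 'echo $PATH'` shows the PATH that a non-interactive login sees, which is often different from the PATH of an interactive shell.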

"upcrun -v" may illuminate the problem, even without resorting to the man page ;)

Licensed under: CC-BY-SA with attribution