Question

I connected three servers to form an HPC cluster, using Condor as the middleware. When I run condor_status on the central manager, it does not show the other nodes. I can run jobs on the central manager and connect to the other nodes via SSH, so it seems that something is missing in the Condor configuration files. In them, I set the central manager as the Condor host and allowed read and write access for everyone. On the worker nodes, I keep the MASTER and STARTD daemons in the daemon list.

When I run condor_status on the central manager, it shows only the central manager. When I run it on a compute node, it gives me the error "CEDAR:6001:Failed to connect to" followed by the central manager's IP address and port number.


Solution

I managed to solve it. The problem was the firewall running on the central manager (iptables, in my case). When I stopped the firewall (su -c "service iptables stop"), all nodes appeared normally in the output of condor_status.

The firewall status can be checked using "service iptables status".
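
If you would rather keep the firewall running, an alternative to stopping it is to allow inbound traffic to the collector port on the central manager. The sketch below assumes the collector listens on Condor's default port 9618; accept_rule is a hypothetical helper that only prints the iptables commands so you can review them before running them as root. Whether you need the UDP rule, or further ports for other daemons, depends on your Condor version and configuration, so treat this as a starting point rather than a complete rule set.

```shell
# Hypothetical helper: print (not run) an iptables rule that accepts
# inbound traffic on a given port.
accept_rule() {
    # $1 = protocol (tcp or udp), $2 = port number
    echo "iptables -I INPUT -p $1 --dport $2 -j ACCEPT"
}

# Assumption: the collector uses the default port 9618.
accept_rule tcp 9618
accept_rule udp 9618
```

Run the printed commands as root on the central manager, then check condor_status again. On Red Hat style systems, "service iptables save" is the usual way to keep the rules across restarts.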

OTHER TIPS

There are a number of things that could be going on here. I'd suggest you follow this tutorial and see if it resolves your problem:

http://spinningmatt.wordpress.com/2011/06/12/getting-started-creating-a-multiple-node-condor-pool/
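
Since the error in the question is a failed connection to the central manager, a quick check from a compute node can help separate a network or firewall problem from a Condor configuration problem. The sketch below is a minimal example, assuming bash and the coreutils timeout command are available; check_port is a hypothetical helper, 192.0.2.10 is a placeholder address, and 9618 is Condor's default collector port (use the port shown in your own error message if it differs).

```shell
# Hypothetical helper: report whether a TCP port on a host accepts connections.
check_port() {
    # $1 = host, $2 = port. Uses bash's /dev/tcp pseudo-device and gives up
    # after 3 seconds so a silently dropped connection does not hang.
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null; then
        echo "reachable"
    else
        echo "unreachable"
    fi
}

# Example (placeholder address, replace with your central manager):
# check_port 192.0.2.10 9618
```

If this prints "unreachable" while SSH to the same host works, something between the nodes (often a firewall on the central manager) is probably blocking the port, or no collector is listening on it.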

In my case, the "condor.exe" service was not running on the server because I had stopped it manually. I just started it and everything went fine.

Licensed under: CC-BY-SA with attribution