Question

I lost my connection to a cluster, and when I logged back in, I noticed that my calculations were still running on the node I had been working on. How can I log back into that specific node? I tried:

   $ qlogin -l h=node27

I get the following:

   Your job 33551 ("QLOGIN") has been submitted
   waiting for interactive job to be scheduled ...timeout (5 s) expired while waiting on socket fd 4

   Your "qlogin" request could not be scheduled, try again later.

What can I do?


Solution

I figured it out. Connecting to the node directly with ssh worked for me:

   $ ssh node27
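Before reconnecting, it can help to confirm that your job really is still running on that node. The sketch below filters `qstat -u` output by host. It assumes a Grid Engine-style `qstat` whose default output has two header lines followed by one line per job, with running jobs showing their queue instance as `queue@host` (for example `all.q@node27`). The helper name `jobs_on_node` and the `NODE` environment variable are hypothetical, chosen for illustration only.

```shell
#!/usr/bin/env bash
# Sketch: show which of your Grid Engine jobs are running on a given node.
# Assumes SGE-style `qstat` output: two header lines, then one line per job
# whose "queue" column reads queue@host (e.g. all.q@node27 or all.q@node27.local).

# jobs_on_node HOST  (hypothetical helper name)
# Reads qstat output on stdin and prints the two header lines plus only the
# jobs whose queue instance is on HOST. The ([. \t]|$) suffix check keeps
# "node27" from also matching "node270".
jobs_on_node() {
  awk -v n="$1" 'NR <= 2 || $0 ~ ("@" n "([. \t]|$)")'
}

# Only query the scheduler when qstat is actually available on this machine.
# NODE is a hypothetical override; the default mirrors the node in the question.
if command -v qstat >/dev/null 2>&1; then
  qstat -u "${USER:-$(id -un)}" | jobs_on_node "${NODE:-node27}"
fi
```

If your job is listed, `ssh node27` gets you onto the node, and `ps -f -u "$USER"` there lists your processes. Note that this ssh session is not tracked by the scheduler.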

Other answers

This is most likely happening because the node you requested is already fully in use, or because the scheduler does not consider you eligible to run jobs on it.

While you may be able to ssh to the node, doing so is not the same as requesting resources with qlogin: it bypasses the job scheduler and can overcommit the node.

If you have confirmed with the cluster administrator that you should be able to run jobs on this node with qlogin, you can wait for sufficient resources to become available on that node with:

   $ qlogin -l h=node27 -now n

The -now n option tells qlogin not to give up when the requested resources aren't immediately available; instead, the request stays queued until they free up.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow