Question

I am trying to submit a Python job with qsub, which in turn submits several other jobs using subprocess and qsub.

I submit these jobs using the two bash scripts shown below. run_test is submitted first, and run_script is submitted through subprocess.

$ cat run_test
#$ -cwd
#$ -V
#$ -pe openmpi 1
mpirun -n 1 python test_multiple_submit.py

$ cat run_script
#$ -cwd
#$ -V
#$ -pe openmpi 1
mpirun -n 1 python $1

The problem is with the second script, which seems to hang at the mpirun call. Earlier I was also getting a bash error about 'module' not being found, but that has since disappeared.

A simplified version of the python script is shown below

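import subprocess

# cmd (defined elsewhere) runs a case on the current node
subprocess.Popen(cmd)

# Pass qsub its arguments as a list; a single command string would need shell=True
subprocess.Popen(['qsub', 'run_script', input])

<Some checks to see if jobs are still running>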

The first subprocess runs a case on the current node and the second one should outsource the job to another node, then there are some checks to see if the jobs are still running. There are also some other bits to get other jobs submitted as well but I'm pretty sure this isn't a problem with the script.

Can anyone shed any light on why the second script is failing?


Solution

I found that the compute nodes on the cluster were not submit hosts, so the qsub call made from a compute node failed with an error. The only submit host was the head node.
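A failure like this is easier to spot if the Python script captures what qsub prints instead of launching it and moving on. The following is a minimal sketch, not the original script: run_and_report and submit_with_qsub are hypothetical helper names, and it assumes Python 3.7 or later for the capture_output argument.

```python
import subprocess


def run_and_report(argv):
    """Run a command and return its stdout; raise with its stderr text on failure."""
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            "%s failed (exit %d): %s"
            % (argv[0], result.returncode, result.stderr.strip())
        )
    return result.stdout.strip()


def submit_with_qsub(script, *args):
    """Submit a job script with qsub (hypothetical wrapper around run_and_report)."""
    return run_and_report(["qsub", script, *args])
```

Calling submit_with_qsub('run_script', input) from a node that is not a submit host should then raise a RuntimeError carrying qsub's own message, rather than failing silently.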

qconf -ss

The above lists the submit hosts. To add a node to the submit host list, run the following as an administrator:

qconf -as <hostname>
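The same check can be made from inside the Python script before it tries to call qsub. This is only a sketch under some assumptions: that qconf -ss prints one host name per line, and that comparing on the short name (the part before the first dot) is acceptable on your cluster. The function names are hypothetical.

```python
import socket
import subprocess


def parse_submit_hosts(qconf_output):
    """Return the host names found in `qconf -ss` output, lowercased."""
    return {
        line.strip().lower()
        for line in qconf_output.splitlines()
        if line.strip()
    }


def is_submit_host(hostname, submit_hosts):
    """Match on the full name or on the short name before the first dot."""
    hostname = hostname.lower()
    short = hostname.split(".")[0]
    return any(
        host == hostname or host.split(".")[0] == short
        for host in submit_hosts
    )


def current_host_can_submit():
    """Ask qconf for the submit hosts and check the machine we are running on."""
    output = subprocess.run(
        ["qconf", "-ss"], capture_output=True, text=True, check=True
    ).stdout
    return is_submit_host(socket.gethostname(), parse_submit_hosts(output))
```

If current_host_can_submit() returns False on a compute node, the nested qsub call will be rejected until an administrator adds that node with qconf -as.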
Licensed under: CC-BY-SA with attribution