Question

I have a jobscript compile.pbs which runs on a single CPU and compiles source code to create an executable. I then have a 2nd jobscript, jobscript.pbs, which I call using 32 CPUs to run that newly created executable with MPI. They both work perfectly when I manually call them in succession, but I would like to automate the process by having the first script call the 2nd jobscript just before it ends. Is there a way to properly nest qsub calls or have them be called in succession?

Currently my attempt is to have the first script call the 2nd script right before it ends, but when I try that I get a strange error message from the 2nd (nested) qsub:

qsub: Bad UID for job execution MSG=ruserok failed validating masterhd/masterhd from s59-16.local

I think the 2nd script is being called properly, but maybe the permissions are not the same as when I called the original one. Obviously my user name masterhd is allowed to run the jobscripts because it works fine when I call the jobscript manually. Is there a way to accomplish what I am trying to do?

Here is a more detailed example of the procedure. First I call the first jobscript and specify a variable with -v:

qsub -v outpath='/home/dest_folder/' compile.pbs

That outpath variable just specifies where to copy the new executable. The first jobscript then changes to that output directory and attempts to submit jobscript.pbs.

compile.pbs:

#!/bin/bash
#PBS -N compile
#PBS -l walltime=0:05:00
#PBS -j oe
#PBS -o ocompile.txt

#Perform compiling stuff:
module load gcc-openmpi-1.2.7
rm *.o
make -f Makefile
#Copy the executable to the destination:
cp visct ${outpath}/visct
#Change to the output path before calling the next jobscript:
cd ${outpath}
qsub jobscript

jobscript.pbs:

#!/bin/bash
#PBS -N run_exe
#PBS -l nodes=32
#PBS -l walltime=96:00:00
#PBS -j oe
#PBS -o results.txt

cd $PBS_O_WORKDIR
module load gcc-openmpi-1.2.7
time mpiexec visct

Solution

You could make a submitting script that qsubs both jobs, but makes the second execute only if, and after, the first was completed without errors:

# By default qsub prints the new job's identifier (e.g. 12345.server) on stdout
JOB1OUT=$(qsub -v outpath='/home/dest_folder/' compile.pbs)
JOB1ID=${JOB1OUT%%.*}  # parse to get job id, change accordingly

# Options go before the script name
qsub -W depend=afterok:$JOB1ID jobscript.pbs
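A slightly fuller version of the same idea, as a minimal sketch rather than a tested recipe for your cluster. It assumes the wrapper is run on the login node, that the nested qsub lines have been removed from compile.pbs, and that jobscript.pbs lives in the output directory (as in the question). The function names submit_chain and job_number are hypothetical helpers, not part of PBS.

```shell
#!/bin/bash
# Sketch of a submit wrapper, meant to be run on the login node (not inside a job).
# submit_chain and job_number are hypothetical helper names.

# Strip the server suffix from an identifier such as "12345.master".
job_number() {
    printf '%s\n' "${1%%.*}"
}

# Submit compile.pbs, then queue jobscript.pbs so it starts only if the
# compile job finishes without errors.
#   $1: directory holding compile.pbs and the Makefile
#   $2: destination directory for the executable (passed to the job as outpath)
submit_chain() {
    local src_dir=$1 outpath=$2 first_id second_id

    first_id=$(cd "$src_dir" && qsub -v outpath="$outpath" compile.pbs) || return 1
    [ -n "$first_id" ] || return 1

    # jobscript.pbs does "cd $PBS_O_WORKDIR", so submit it from the output directory.
    second_id=$(cd "$outpath" &&
        qsub -W depend=afterok:"$(job_number "$first_id")" jobscript.pbs) || return 1

    printf 'compile job: %s\nrun job: %s\n' "$first_id" "$second_id"
}
```

After sourcing the file, it could be called as `submit_chain "$PWD" /home/dest_folder`. If the compile job fails, the dependent job never becomes eligible to run; how the scheduler then disposes of it depends on the site configuration.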

OTHER TIPS

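It's possible that your system restricts submitting jobs from inside a running job. The error names s59-16.local, presumably the compute node on which the first job ran, as the host the nested qsub came from, so that node may not be accepted as a submit host. Also note that your first job is limited to 5 minutes while the second needs 96 hours: if the second job ran inside the first, rather than merely being queued by it, it would exceed the first job's time limit.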

Why can't you just put the compile part at the beginning of the second script?
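For illustration, a combined jobscript might look like the sketch below. It reuses the directives and commands from the question; the job name compile_and_run and the helper build_and_run are made up for this example, and outpath is still assumed to be passed with -v at submission. One trade-off to be aware of: the 32-node allocation sits idle while the serial compile runs.

```shell
#!/bin/bash
#PBS -N compile_and_run
#PBS -l nodes=32
#PBS -l walltime=96:00:00
#PBS -j oe
#PBS -o results.txt

# Hypothetical helper: compile in the source directory, then run from outpath.
#   $1: directory holding the sources and Makefile
#   $2: destination directory for the executable
build_and_run() {
    local src_dir=$1 outpath=$2

    cd "$src_dir" || return 1
    module load gcc-openmpi-1.2.7
    rm -f ./*.o
    make -f Makefile || return 1          # stop here if the build fails
    cp visct "$outpath/visct" || return 1

    cd "$outpath" || return 1
    time mpiexec visct
}

# PBS sets PBS_JOBID inside a job, so nothing runs if the file is merely sourced.
if [ -n "${PBS_JOBID:-}" ]; then
    build_and_run "$PBS_O_WORKDIR" "$outpath"
fi
```

It would be submitted from the source directory the same way as before, for example `qsub -v outpath='/home/dest_folder/' combined.pbs`, where combined.pbs is a placeholder filename.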

Licensed under: CC-BY-SA with attribution