Question

I want to execute foo.sh on 2 different nodes. Therefore, I wrote the following script:

#!/home/farago/bin/dash    
qsub -N dist -o P -e P-err -V -v \
  "EXECSCRIPT=foo.sh" \
  -l walltime=12:00:00,nodes=2:ppn=1 Cluster_ExecExp_pbsdsh.sh

with Cluster_ExecExp_pbsdsh.sh:

#!/home/farago/bin/dash
#PBS -l nodes=2:ppn=1 
#PBS -l walltime=12:00:00          
/usr/bin/pbsdsh -v dash $EXECSCRIPT

Strangely, foo.sh is always executed on two CPUs of the same node :(

So: Why does pbs(dsh) schedule my task onto one node, even though I have specified nodes=2:ppn=1? (And do I have to give these parameters in both of my scripts?)


Update: if foo.sh consists of

#!/bin/bash

echo "foostart" >> /home/farago/output.txt
cat $PBS_NODEFILE >> /home/farago/output.txt
echo "fooend" >> /home/farago/output.txt

then I get output.txt:

foostart
cn11
cn11
fooend
foostart
cn11
cn11
fooend

So it seems that giving the parameter -l nodes=2:ppn=1 twice results in both qsub and pbsdsh distributing the job twice. But I still do not understand why the jobs are not scheduled on different machines.


Solution

It is only being launched on one node because your job is only running on one node. I'm not sure why your scheduler is launching you on only cn11, but the $PBS_NODEFILE tells you what hosts your job is using.

Some schedulers pack your request onto a single node if possible, even if the value for nodes is greater than 1. This part isn't strange.
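
To confirm how many distinct hosts a job was actually given, you can count the unique lines in $PBS_NODEFILE from inside the job script. The following is only a minimal sketch: the function name count_unique_nodes is made up for illustration, and it assumes the node file has one host name per line, as in the output.txt shown in the question.

```shell
#!/bin/sh
# Hypothetical helper (not part of PBS): count the distinct hosts in a node file.
# A host can appear on several lines, so duplicates are removed first.
count_unique_nodes() {
    # $1: path to a node file (inside a job, normally "$PBS_NODEFILE")
    sort -u "$1" | wc -l | tr -d ' '
}

# Only report when running inside a PBS job, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "distinct hosts: $(count_unique_nodes "$PBS_NODEFILE")"
fi
```

For the node file from the question (cn11 listed twice), this would report 1 distinct host, which matches the observation that both copies of foo.sh ran on the same machine.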

Licensed under: CC-BY-SA with attribution