Question

I have to run code on a particular node of a 20-node cluster. The cluster runs CentOS, and I connect to it over SSH from Ubuntu 12.04. I need to run a Python script called training.py. Multiple versions of Python are installed (2.4, 2.7, 3.2). Being a Linux newbie, I have the following questions:

For qrsh commands:

  1. How do I select a particular node to run my code on?
  2. How do I select Python 2.7 if the system uses 2.4 by default?

For the qsub command:

  1. How do I submit a job using a script? I am new to scripting, so please suggest a tutorial. For now, a simple script that puts training.py in a queue would be very helpful.
  2. In the script, I have to specify which version of Python to run.
  3. I want to design an experiment that calls parameters.py and training.py sequentially, multiple times, with different values passed to training.py each time. How can I do that?

Thanks in advance.


Solution

There are several implementations of qsub (PBS Pro, TORQUE, OpenPBS), each with its own syntax.

If you are using the TORQUE variant, check out chapter 2 of the documentation: http://docs.adaptivecomputing.com/torque/help.htm

Basically, you submit a job like this, where foo is the name of the node you want to run on:

qsub -l nodes=foo:ppn=2 -l walltime=300 training.py

Alternatively, you can put these options in the job script itself as #PBS directives.

cat training.py
#!/usr/bin/python
#PBS -l nodes=foo
#PBS -l walltime=300
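
As a rough sketch of the job-script approach, the snippet below generates a small shell wrapper that requests a node and then runs training.py with an explicit interpreter. The interpreter path /usr/bin/python2.7, the job name, and the file name run_training.pbs are assumptions for illustration, and foo is the placeholder node name used in this answer; check the real values on your cluster.

```python
# Sketch only: build the text of a TORQUE/PBS job script that runs
# training.py with an explicit Python interpreter.
# Works on both Python 2.7 and Python 3.


def build_job_script(node="foo", ppn=2, walltime=300,
                     python="/usr/bin/python2.7",  # assumed path, verify on your cluster
                     script="training.py"):
    """Return the job script as a single string."""
    lines = [
        "#!/bin/bash",
        "#PBS -N training",                             # job name (hypothetical)
        "#PBS -l nodes={0}:ppn={1}".format(node, ppn),  # specific node and cores per node
        "#PBS -l walltime={0}".format(walltime),        # run-time limit in seconds
        'cd "$PBS_O_WORKDIR"',                          # directory qsub was called from
        "{0} {1}".format(python, script),               # run with the chosen interpreter
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the script; save it as e.g. run_training.pbs and submit it
    # with: qsub run_training.pbs
    print(build_job_script())
```

Naming the interpreter inside the job script means the job does not depend on whichever python is first in PATH on the compute node.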

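To specify which version of Python to use, point the job at the interpreter you want. If Python 2.7 is already installed on the compute nodes, use its full path in the shebang line instead of /usr/bin/python (for example /usr/bin/python2.7; the exact path depends on your cluster, so check it with "which python2.7"). If it is not installed on the nodes, you will either need to install Python yourself (assuming you have root), or ask your sysadmin to install Python 2.7 on the nodes for you.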
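
For the repeated experiment in the question, one possible approach is a small driver script that runs parameters.py and then training.py once per value, and is itself what the job executes. This is a minimal sketch under assumptions: the interpreter path is a guess, and passing each value to training.py as a single command-line argument is hypothetical, since the question does not say how training.py reads its input.

```python
# Sketch only: run parameters.py and training.py in sequence for each value.
# Works on both Python 2.7 and Python 3.
import subprocess

PYTHON = "/usr/bin/python2.7"  # assumed interpreter path, verify on your cluster


def build_commands(values, python=PYTHON):
    """Return the command lines for the sweep, in execution order."""
    commands = []
    for value in values:
        commands.append([python, "parameters.py"])
        # Hypothetical interface: the value is passed as one positional argument.
        commands.append([python, "training.py", str(value)])
    return commands


def run_sweep(values, python=PYTHON):
    """Run every command in order, stopping at the first failure."""
    for command in build_commands(values, python):
        # check_call raises CalledProcessError on a non-zero exit status.
        subprocess.check_call(command)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for command in build_commands([1, 2, 3]):
        print(" ".join(command))
```

Calling run_sweep([1, 2, 3]) instead of printing would execute the commands one after another on the node where the job runs.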

Licensed under: CC-BY-SA with attribution