Question

I have written code that uses only 1-4 CPUs. But when I submit a job on the cluster, I have to take at least one node with 16 cores per job, so I want to run several simulations on each node with each job I submit. Is there a way to run the simulations in parallel within one job?

Here's an example: My code takes 4 cpus. I submit a job for one node, and I want the node to run 4 instances of my code (each instance has different parameters) to take all the 16 cores.


Solution

Yes, of course; generally such systems will have their own instructions for how to do this, so check your cluster's documentation first.

If you have (say) 4x 4-cpu jobs that you know will each take the same amount of time, and (say) you want them to run in 4 different directories (so the output files are easier to keep track of), use the shell ampersand to run them each in the background and then wait for all background tasks to finish:

(cd jobdir1 && myexecutable argument1 argument2) &
(cd jobdir2 && myexecutable argument1 argument2) &
(cd jobdir3 && myexecutable argument1 argument2) &
(cd jobdir4 && myexecutable argument1 argument2) &
wait

(where myexecutable argument1 argument2 is just a placeholder for however you usually run your program; if you use mpiexec or something similar, that goes in there just as you'd normally use it. If you're using OpenMP, you can export the environment variable OMP_NUM_THREADS before the first line above, set to the number of threads each instance should use, e.g. 4 for four instances on 16 cores.)
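Put together, the body of such a job script might look like the sketch below. It is only an illustration: the myexecutable function is a hypothetical stand-in for your real program, the job directories are created in a temporary location so the script can be run anywhere, and any scheduler directives your cluster needs are left out. It records each background PID so that a failed instance is noticed rather than silently ignored.

```shell
#!/bin/sh
# Hypothetical stand-in for the real program: it records the thread
# count and its arguments so the result can be inspected afterwards.
myexecutable() {
    echo "threads=${OMP_NUM_THREADS} args=$*" > output.txt
}

# 4 instances x 4 threads each = 16 cores on the node.
export OMP_NUM_THREADS=4

# Assumption: a temporary directory stands in for your real job directories.
workdir=$(mktemp -d)
pids=""
for i in 1 2 3 4; do
    mkdir -p "$workdir/jobdir$i"
    # The subshell keeps the cd local; && skips the run if cd fails.
    (cd "$workdir/jobdir$i" && myexecutable "argument$i" argument2) &
    pids="$pids $!"
done

# Wait on each PID in turn and count the instances that exited non-zero.
failed=0
for pid in $pids; do
    wait "$pid" || failed=$((failed + 1))
done
echo "instances failed: $failed"
```

A bare wait, as in the snippet above, is enough if you do not care about exit codes; waiting on each PID is only needed when the job should report failures.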

If you have a number of tasks that won't all take the same length of time, it's easiest to queue up well more than the (say) 4 tasks above and let a tool like GNU parallel start the next task whenever one finishes, so that no cores sit idle waiting for the slowest run.
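As a rough sketch of that pattern, the example below uses xargs -P, which is a swapped-in stand-in for GNU parallel chosen only because it is more commonly preinstalled. The myexecutable script and the eight job directories are hypothetical placeholders created in a temporary location.

```shell
#!/bin/sh
# Assumption: a temporary directory stands in for your real working area.
workdir=$(mktemp -d)

# Hypothetical stand-in for the real program. It has to be a file here,
# because xargs starts new processes that cannot see shell functions.
cat > "$workdir/myexecutable" <<'EOF'
#!/bin/sh
echo "done $*" > output.txt
EOF
chmod +x "$workdir/myexecutable"

# Eight tasks, one directory each: more tasks than the four slots.
for i in 1 2 3 4 5 6 7 8; do
    mkdir -p "$workdir/jobdir$i"
done

cd "$workdir" || exit 1
# -P 4 keeps at most four tasks running and starts the next one as soon
# as a slot frees up; -I{} passes one directory name per invocation.
printf '%s\n' jobdir* |
    xargs -P 4 -I{} sh -c 'cd "$1" && ../myexecutable argument1 argument2' _ {}
status=$?
echo "xargs exit status: $status"
```

Where GNU parallel is installed, the equivalent call should be along the lines of `parallel -j 4 'cd {} && ../myexecutable argument1 argument2' ::: jobdir*`. For 4-CPU instances on a 16-core node, four slots is the natural choice; for single-CPU runs you would raise it to 16.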

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow