Question

I'm running Torque with Open MPI on a single machine with 24 cores. Why is it possible to specify, for instance, nodes=1:ppn=2 in my job.sh and still run the job with mpirun -np 12 WhatEverCommand? In that case the job runs on 12 cores, even though the nodes line requests only 2. Doesn't the nodes specification place any restriction on the resources a submitted job may use? And if it doesn't, how can I prevent users from violating the server rules by exceeding the resources they declared?
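For context, a stripped-down sketch of what the job script looks like (the job name and walltime are arbitrary placeholders, and WhatEverCommand stands in for the real executable):

    #!/bin/bash
    #PBS -N example_job
    #PBS -l nodes=1:ppn=2
    #PBS -l walltime=00:10:00

    cd "$PBS_O_WORKDIR"

    # Requests only 2 cores from Torque, but asks Open MPI for 12 ranks.
    mpirun -np 12 WhatEverCommand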

On the other hand, specifying nodes=1:ppn=8 and running mpirun without the -np option gives me only 1 CPU running the job.

Am I that bad, or am I missing something fundamental here?


Solution

By default, Open MPI doesn't integrate with Torque at all. You have to compile Open MPI with the --with-tm configure option, which doesn't appear to be enabled in most distro packages. The Open MPI project covers Torque integration in its FAQs on building and running Open MPI.
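As a rough sketch (the install prefix and Torque location are assumptions for illustration), you can check an existing Open MPI build for TM support and rebuild it from source if it's missing:

    # Check whether the installed Open MPI has Torque (TM) support;
    # an integrated build lists "tm" components for the plm and ras frameworks.
    ompi_info | grep tm

    # Rebuild Open MPI with TM support; point --with-tm at your Torque install.
    ./configure --prefix=/opt/openmpi --with-tm=/usr
    make -j4 && sudo make install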

Similarly, Torque doesn't actually restrict access to CPUs unless cpuset support is enabled, which again appears to be absent from most distro packages. That is why your Open MPI application, compiled without Torque integration, can grab all the cores unrestricted.
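The corresponding sketch for Torque, assuming your Torque version supports the --enable-cpuset configure flag (the prefix is again just an example):

    # Rebuild Torque with cpuset support so pbs_mom confines each job
    # to the cores it was actually allocated.
    ./configure --prefix=/opt/torque --enable-cpuset
    make -j4 && sudo make install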

Building both packages from source is not too difficult, so it's worth researching the configure options and building the support that makes sense for you.
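Once both are in place, a job like the one above behaves as you'd expect: with a TM-integrated Open MPI you can drop -np entirely, and mpirun starts one rank per allocated slot (same placeholder command as before):

    #!/bin/bash
    #PBS -l nodes=1:ppn=8

    cd "$PBS_O_WORKDIR"

    # mpirun reads the allocation from Torque and launches 8 ranks;
    # with cpusets enabled they are also confined to the 8 assigned cores.
    mpirun WhatEverCommand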

Licensed under: CC-BY-SA with attribution