Question

I have a job that runs in the SMP parallel environment under Sun Grid Engine. The code is well tested and normally works fine. It is more memory-intensive than processor-intensive, so I usually reserve an entire node on our cluster with -pe smp 12 (we have 12 cores per node), even if the job itself (as specified in the submitted script) uses only a fraction of those cores.

Because I requested the SMP parallel environment, all 12 slots should come from the same node, and there should be one slot per core, right? If so, this should reserve an entire node. It worked for that purpose until recently, when another user's Grid Engine job somehow obtained slots on the same node. I'm not sure how this happened. Will Grid Engine start my SMP job with fewer than the requested slots? If not, is there a better way to ensure that my job reserves an entire node?


Solution

I figured it out. The second job was mistakenly submitted to the default "all" queue, which includes the cores on every node, so it could be scheduled onto the node my job already occupied.
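For anyone hitting the same thing, here is a minimal sketch of a job script that pins the job to one queue and logs what the scheduler actually granted. It is an illustration under stated assumptions, not a definitive setup: the job name whole_node_job, the queue name wholenode.q, and the check_slots helper are all hypothetical. NSLOTS, QUEUE and JOB_ID are environment variables that Grid Engine sets inside a running job; outside a job they are unset, so the script falls back to defaults.

```shell
#!/bin/bash
# Grid Engine reads the "#$" lines below as qsub options; plain bash ignores them.
# whole_node_job and wholenode.q are hypothetical names; substitute your own.
#$ -N whole_node_job
#$ -cwd
#$ -pe smp 12
#$ -q wholenode.q

# Slots this script expects; keep in sync with the -pe request above.
requested=12

# Succeeds only if the granted slot count ($2) equals the requested count ($1).
check_slots() {
    [ "$2" -eq "$1" ]
}

# NSLOTS is set by Grid Engine inside a job; default to 0 when run elsewhere.
granted="${NSLOTS:-0}"

# Log where the job landed, so a wrong queue shows up in the job output.
echo "job=${JOB_ID:-none} queue=${QUEUE:-none} host=${HOSTNAME:-unknown} slots=${granted}"

if ! check_slots "$requested" "$granted"; then
    echo "warning: expected ${requested} slots, got ${granted}" >&2
fi
```

With a fixed count such as -pe smp 12, Grid Engine should not start the job with fewer slots; only a range request (for example -pe smp 4-12) can be granted fewer. To see this kind of overlap directly, qhost -q -j lists, for each host, the queue instances defined on it and the jobs running in them.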

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow