Question

We want to extend our batch system to support GPU computations.

The problem is that, from the batch system's viewpoint, a GPU is a resource. We can easily count used resources, but we also need to limit access to them.

For GPUs that means that each job claims a GPU exclusively (when a GPU is requested).

From what I have been told, sharing GPUs between jobs is a very bad idea (because the GPU part of jobs might be killed randomly).

So, what I need is some way to limit access to GPUs for CUDA and OpenCL. The batch system has root privileges. I can limit access to the devices in /dev/ using cgroups, but I figure that this won't be enough in this case.
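For concreteness, the cgroup-based /dev restriction mentioned above can be sketched roughly like this (a minimal sketch assuming the cgroup v1 devices controller; the cgroup path is illustrative, and NVIDIA character devices conventionally use major number 195):

```python
import os

# NVIDIA character devices (/dev/nvidia0, /dev/nvidia1, ...) use major number 195.
NVIDIA_MAJOR = 195

def restrict_gpus(cgroup_dir, allowed_minors):
    """Deny a job's cgroup access to all NVIDIA devices except the listed minors.

    cgroup_dir is the job's cgroup under the v1 'devices' hierarchy,
    e.g. /sys/fs/cgroup/devices/batch/job42 (illustrative path).
    """
    os.makedirs(cgroup_dir, exist_ok=True)
    # Deny every NVIDIA device first ...
    with open(os.path.join(cgroup_dir, "devices.deny"), "w") as f:
        f.write("c %d:* rwm" % NVIDIA_MAJOR)
    # ... then allow back only the GPUs this job was granted.
    # The kernel expects one rule per write, hence one open per rule.
    for minor in allowed_minors:
        with open(os.path.join(cgroup_dir, "devices.allow"), "a") as f:
            f.write("c %d:%d rwm\n" % (NVIDIA_MAJOR, minor))
```

As the question notes, this only blocks the device nodes; it does not by itself make the CUDA or OpenCL runtime enumerate fewer devices cleanly.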

Ideally, a job would only see as many GPUs as it requested, and no other job could access them.


Solution

There are two relevant mechanisms at the moment:

  • Use nvidia-smi to set the devices into exclusive mode; that way, once a process holds a GPU, no other process can attach to the same GPU.
  • Use the CUDA_VISIBLE_DEVICES variable to limit which GPUs a process sees when it looks for one.

The latter is of course subject to abuse but it's a start for now.
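A batch prolog could combine the two mechanisms: put the cards into exclusive mode once (as root), then export CUDA_VISIBLE_DEVICES per job. A minimal sketch, with an illustrative job command; `-c EXCLUSIVE_PROCESS` is nvidia-smi's compute-mode switch:

```python
import os
import subprocess

def exclusive_mode_cmd(gpu_index):
    # Command a root prolog would run once per GPU. EXCLUSIVE_PROCESS means
    # only one process at a time may hold a CUDA context on the card.
    return ["nvidia-smi", "-i", str(gpu_index), "-c", "EXCLUSIVE_PROCESS"]

def launch_job(argv, granted_gpus):
    """Start a job that only sees the GPUs it was granted.

    CUDA renumbers the visible devices from 0, so inside the job the
    granted GPUs always appear as devices 0..n-1.
    """
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in granted_gpus)
    return subprocess.Popen(argv, env=env)
```

Since CUDA_VISIBLE_DEVICES is just an environment variable, a job that clears it can still see every card, which is the abuse mentioned above; exclusive mode at least stops two jobs from sharing one GPU by accident.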

From what I have been told, sharing GPUs between jobs is a very bad idea (because the GPU part of jobs might be killed randomly).

Not really. The main reason sharing a GPU is a bad idea is that the jobs have to compete for the available memory, and all of them may fail even though one of them alone could have proceeded. In addition, they compete for access to the DMA and compute engines, which can result in poor overall performance.

OTHER TIPS

I believe there are a few things that can help with NVIDIA CUDA GPUs:

  1. Put the GPUs in "Compute Exclusive" mode via the nvidia-smi tool
  2. Instruct users not to call cudaSetDevice() explicitly; when the cards are in exclusive mode and no device is selected, the runtime will automatically pick an unused GPU
  3. Instruct users to use some other means of initializing the device other than cudaSetDevice, as described in section 8.3 of the "Best Practices Guide"

I'm not sure how to achieve this for OpenCL.

I developed a library that sorts the available OpenCL platforms and devices and picks the best device on a platform. It then tries to create a context on it; if that fails, it moves on to the next device in the list. The list is sorted by the number of compute units.

It supports NVIDIA (GPU), AMD (GPU & CPU), Intel (CPU) and Apple (GPU & CPU).

There is a locking mechanism for exclusive access, though it is not ideal, and I'm still looking for a better solution. Basically, it saves a file in /tmp recording the platform and device in use.

This is what we use in our lab. It's available under the GPLv3 and can be found on github: https://github.com/nbigaouette/oclutils/

Licensed under: CC-BY-SA with attribution