Question

I am currently porting an algorithm to two GPUs. The hardware has the following setup:

  • Two CPUs forming a NUMA system, so the main memory is split across the two NUMA nodes.
  • Each GPU is physically connected to one of the CPUs (each PCIe controller hosts one GPU).

I created two host threads to control the GPUs. Each thread is bound to one NUMA node, i.e. each thread runs on one CPU socket. How can I determine the device number of the GPU so that I can select the directly connected GPU using cudaSetDevice()?


Solution

As I mentioned in the comments, this is a type of CPU-GPU affinity. Here is a bash script that I hacked together. I believe it will give useful results on RHEL/CentOS 6.x. It probably won't work properly on many older or other Linux distros. You can run the script like this:

./gpuaffinity > out.txt

You can then read out.txt in your program to determine which logical CPU cores correspond to which GPUs. For example, on a NUMA Sandy Bridge system with two 6-core processors and 4 GPUs, sample output might look like this:

0     03f
1     03f
2     fc0
3     fc0

This system has 4 GPUs, numbered from 0 to 3. Each GPU number is followed by a "core mask". The core mask is a hexadecimal bitmask of the logical cores that are "close" to that particular GPU. So for GPUs 0 and 1, the first 6 logical cores in the system (mask 03f) are closest. For GPUs 2 and 3, the second 6 logical cores in the system (mask fc0) are closest.

You can either read the file in your program, or you can reimplement the logic illustrated in the script directly in your program.
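For illustration, here is a minimal sketch of the first approach. It assumes the out.txt produced by the script above (one line per GPU: index followed by a hex core mask), uses the GNU extension sched_getcpu() to find the core the calling thread is running on, and selects the first GPU whose mask contains that core. The helper name pick_close_gpu is made up for this example; build with nvcc or link against the CUDA runtime.

// Sketch: choose the first GPU whose core mask contains the logical core
// this thread is currently running on. Assumes fewer than 64 logical cores.
#define _GNU_SOURCE
#include <sched.h>          // sched_getcpu() (GNU extension)
#include <stdio.h>
#include <cuda_runtime.h>

int pick_close_gpu(const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f) return -1;

    int cpu = sched_getcpu();            // core the calling thread runs on
    int gpu, chosen = -1;
    unsigned long long mask;

    // each line of out.txt: "<gpu index> <hex core mask>"
    while (fscanf(f, "%d %llx", &gpu, &mask) == 2) {
        if (mask & (1ULL << cpu)) {      // our core is in this GPU's mask
            chosen = gpu;
            break;
        }
    }
    fclose(f);
    return chosen;
}

int main(void)
{
    int dev = pick_close_gpu("out.txt");
    if (dev < 0) dev = 0;                // fall back to device 0 if no match
    cudaSetDevice(dev);                  // select the directly connected GPU
    printf("core %d -> GPU %d\n", sched_getcpu(), dev);
    return 0;
}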

You can also invoke the script like this:

./gpuaffinity -v

which will give slightly more verbose output.

Here is the bash script:

#!/bin/bash
# This script outputs a listing of each GPU and its CPU core affinity mask.
file="/proc/driver/nvidia/gpus/0/information"
if [ ! -e "$file" ]; then
  echo "Unable to locate any GPUs!"
else
  gpu_num=0
  file="/proc/driver/nvidia/gpus/$gpu_num/information"
  if [ "-v" == "$1" ]; then echo "GPU:  CPU CORE AFFINITY MASK: PCI:"; fi
  while [ -e "$file" ]
  do
    # grab the "Bus Location" line; the read/echo pair collapses whitespace
    # so that the fixed offsets below line up
    line=$(grep "Bus Location" "$file" | { read line; echo $line; })
    pcibdf=${line:14}        # full bus:device.function, e.g. 0000:03:00.0
    pcibd=${line:14:7}       # domain:bus only, e.g. 0000:03
    file2="/sys/class/pci_bus/$pcibd/cpuaffinity"
    read line2 < "$file2"    # CPU affinity mask for that PCI bus
    if [ "-v" == "$1" ]; then
      echo " $gpu_num     $line2                  $pcibdf"
    else
      echo " $gpu_num     $line2 "
    fi
    gpu_num=$((gpu_num + 1))
    file="/proc/driver/nvidia/gpus/$gpu_num/information"
  done
fi
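If you would rather skip the intermediate file, the same lookup can be done directly from host code: cudaDeviceGetPCIBusId() reports each device's PCI bus ID, and on Linux the corresponding sysfs entry exposes the NUMA node the device is attached to. The following is a rough sketch under those assumptions; it uses libnuma's numa_node_of_cpu() (link with -lnuma), and the names gpu_numa_node and select_local_gpu are illustrative, not part of any API.

// Sketch: match each CUDA device's NUMA node (from sysfs) against the NUMA
// node of the core this thread is bound to. Assumes a Linux sysfs layout
// like the one the script relies on; numa_node may read -1 on non-NUMA boxes.
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <ctype.h>
#include <numa.h>               // numa_node_of_cpu(), link with -lnuma
#include <cuda_runtime.h>

static int gpu_numa_node(int dev)
{
    char busid[32];
    if (cudaDeviceGetPCIBusId(busid, (int)sizeof(busid), dev) != cudaSuccess)
        return -1;
    for (char *p = busid; *p; ++p)       // sysfs uses lower-case bus IDs
        *p = tolower((unsigned char)*p);

    char path[128];
    snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/numa_node", busid);
    FILE *f = fopen(path, "r");
    if (!f) return -1;
    int node = -1;
    fscanf(f, "%d", &node);
    fclose(f);
    return node;
}

int select_local_gpu(void)
{
    int my_node = numa_node_of_cpu(sched_getcpu());
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int dev = 0; dev < count; ++dev) {
        if (gpu_numa_node(dev) == my_node) {
            cudaSetDevice(dev);          // directly connected GPU
            return dev;
        }
    }
    cudaSetDevice(0);                    // fallback if nothing matched
    return 0;
}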

OTHER TIPS

The nvidia-smi tool can report the topology on a NUMA machine:

% nvidia-smi topo -m
        GPU0    GPU1    GPU2    GPU3    CPU Affinity
GPU0     X      PHB     SOC     SOC     0-5
GPU1    PHB      X      SOC     SOC     0-5
GPU2    SOC     SOC      X      PHB     6-11
GPU3    SOC     SOC     PHB      X      6-11

Legend:

  X   = Self
  SOC  = Connection traversing PCIe as well as the SMP link between CPU sockets (e.g. QPI)
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe switches (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing a single PCIe switch
  NV#  = Connection traversing a bonded set of # NVLinks
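Since nvidia-smi is a front end to NVML, the same affinity information can also be queried from code. The sketch below assumes the NVML header and library are installed (link with -lnvidia-ml); nvmlDeviceGetCpuAffinity() fills in a bitmask of the ideal CPUs for each GPU, corresponding to the CPU Affinity column above.

// Sketch: print each GPU's ideal CPU affinity mask via NVML, roughly the
// programmatic equivalent of the CPU Affinity column of "nvidia-smi topo -m".
#include <stdio.h>
#include <nvml.h>

int main(void)
{
    if (nvmlInit() != NVML_SUCCESS) {
        fprintf(stderr, "failed to initialize NVML\n");
        return 1;
    }

    unsigned int count = 0;
    nvmlDeviceGetCount(&count);

    for (unsigned int i = 0; i < count; ++i) {
        nvmlDevice_t dev;
        unsigned long cpuset[4] = {0};   // room for 256 logical CPUs on 64-bit
        if (nvmlDeviceGetHandleByIndex(i, &dev) == NVML_SUCCESS &&
            nvmlDeviceGetCpuAffinity(dev, 4, cpuset) == NVML_SUCCESS)
            printf("GPU %u  cpu mask[0] = %lx\n", i, cpuset[0]);
    }

    nvmlShutdown();
    return 0;
}

NVML also offers nvmlDeviceSetCpuAffinity(), which pins the calling process to the GPU's ideal CPUs, useful if you prefer to bind the thread to the GPU rather than pick the GPU for the thread.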
Licensed under: CC-BY-SA with attribution