CUDA Peer-to-Peer across I/O hubs

Question

No. The QPI protocol does not cover every feature of the PCIe protocol, and in particular it lacks some features that P2P relies on.

A specific difference is documented in an Intel datasheet:

“The IOH does not support non-contiguous byte enables from PCI Express for remote peer-to-peer MMIO transactions. This is an additional restriction over the PCI Express standard requirements to prevent incompatibility with Intel QuickPath Interconnect.” (page 135)

So P2P requires a continuous PCIe fabric between the two devices: both devices need to be on the same PCIe root complex. NVIDIA publicized this requirement in the CUDA 4.0 timeframe, when GPUDirect v2.0 (Peer-to-Peer) was first introduced.

Note that in general, P2P support may vary by GPU or GPU family. The ability to run P2P on one GPU type or GPU family does not necessarily indicate that it will work on another GPU type or family, even in the same system/setup. The final determinant of GPU P2P support is the result reported by the provided tools that query the runtime via cudaDeviceCanAccessPeer. P2P support can vary by system and other factors as well. No statements made here are a guarantee of P2P support for any particular GPU in any particular setup.
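Since the runtime query is the final word, it may help to see what such a check could look like. The following is only a minimal sketch, not a definitive tool: build_peer_matrix is a hypothetical helper name introduced here for illustration, and the only CUDA runtime calls used are cudaGetDeviceCount and cudaDeviceCanAccessPeer. The query is passed in as a callable so the loop can be exercised without a GPU, and the main function is only built when the file is compiled with nvcc.

```cpp
#include <cstdio>
#include <functional>
#include <vector>

// Hypothetical helper (not part of CUDA): builds an n x n matrix where
// m[i][j] is true when device i reports it can access device j's memory.
// The diagonal stays false, since a device is not its own peer.
std::vector<std::vector<bool>> build_peer_matrix(
    int n, const std::function<bool(int, int)>& can_access) {
    std::vector<std::vector<bool>> m(n, std::vector<bool>(n, false));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            if (i != j) m[i][j] = can_access(i, j);
    return m;
}

#ifdef __CUDACC__  // This part is only compiled by nvcc.
#include <cuda_runtime.h>

int main() {
    int n = 0;
    if (cudaGetDeviceCount(&n) != cudaSuccess) {
        std::fprintf(stderr, "could not enumerate CUDA devices\n");
        return 1;
    }
    // Ask the runtime about every ordered pair; treat errors as "no P2P".
    auto m = build_peer_matrix(n, [](int dev, int peer) {
        int ok = 0;
        if (cudaDeviceCanAccessPeer(&ok, dev, peer) != cudaSuccess) return false;
        return ok != 0;
    });
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            if (i != j)
                std::printf("GPU %d -> GPU %d: %s\n", i, j,
                            m[i][j] ? "P2P" : "no P2P");
    return 0;
}
#endif
```

On a dual-IOH system of the kind discussed above, one would expect pairs that sit on different hubs to show up as "no P2P", but the actual output depends entirely on what the runtime reports for the installed GPUs and topology.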