Peer-to-Peer CUDA transfers

https://stackoverflow.com/questions/17707423

03-06-2022
Question

I have heard about peer-to-peer memory transfers and read a bit about them, but I could not work out how much faster they are than standard PCI-E bus transfers.

I have a CUDA application that uses more than one GPU, and I might be interested in P2P transfers. My question is: how fast is P2P compared to PCI-E? Can I use it frequently to let two devices communicate with each other?

Solution

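A CUDA "peer" is another GPU that can directly access memory belonging to the current GPU. GPUs with compute capability 2.0 and greater support this feature, but whether a particular pair of GPUs can act as peers also depends on the system (for example, how they are attached to the PCI-E bus), so check each pair with cudaDeviceCanAccessPeer.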
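
As a rough illustration of that check, the sketch below lists which GPU pairs report peer capability. It only uses cudaGetDeviceCount and cudaDeviceCanAccessPeer from the runtime API; the output format is my own choice, not something from the CUDA samples.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("No CUDA devices found\n");
        return 1;
    }

    // Query every ordered pair: peer capability is reported per direction.
    for (int dev = 0; dev < count; ++dev) {
        for (int peer = 0; peer < count; ++peer) {
            if (dev == peer) continue;
            int canAccess = 0;
            // canAccess becomes 1 if `dev` can directly access memory on `peer`.
            if (cudaDeviceCanAccessPeer(&canAccess, dev, peer) != cudaSuccess) {
                std::printf("Query failed for GPU %d -> GPU %d\n", dev, peer);
                continue;
            }
            std::printf("GPU %d -> GPU %d: peer access %s\n", dev, peer,
                        canAccess ? "supported" : "not supported");
        }
    }
    return 0;
}
```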

A peer-to-peer memory copy uses cudaMemcpy to move data from one GPU to another over PCI-E, as shown below.

cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);

Note that dst and src can be on different devices.
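
To get a feel for the actual speed on your own machine, a minimal sketch along these lines may help. It assumes at least two GPUs (devices 0 and 1), uses the explicit cudaMemcpyPeer variant of the copy, and times it with CUDA events. The buffer size, repetition count, and CHECK macro are arbitrary choices of mine.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message if a runtime call fails.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "%s failed: %s\n", #call,             \
                         cudaGetErrorString(err_));                    \
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

int main() {
    int count = 0;
    CHECK(cudaGetDeviceCount(&count));
    if (count < 2) {
        std::printf("This sketch needs at least two GPUs\n");
        return 0;
    }

    const size_t bytes = 64u << 20;  // 64 MiB per copy (arbitrary)
    const int reps = 20;             // number of timed copies (arbitrary)

    float *src = nullptr, *dst = nullptr;
    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc(&dst, bytes));  // destination lives on device 1
    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc(&src, bytes));  // source lives on device 0

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // Peer access is not enabled here, so the runtime may stage the
    // copy through host memory instead of going GPU to GPU directly.
    CHECK(cudaEventRecord(start));
    for (int i = 0; i < reps; ++i) {
        CHECK(cudaMemcpyPeer(dst, 1, src, 0, bytes));
    }
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    double gbPerSec = (double)bytes * reps / (ms * 1e-3) / 1e9;
    std::printf("Device 0 -> device 1: %.2f GB/s\n", gbPerSec);

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(src));
    CHECK(cudaSetDevice(1));
    CHECK(cudaFree(dst));
    return 0;
}
```

Comparing the printed figure with a plain host-to-device copy of the same size should show how the two paths differ on a given system.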

cudaDeviceEnablePeerAccess lets a kernel running on one device read and write memory that resides on another device. These memory accesses still travel over PCI-E and are subject to the same bandwidth bottleneck.
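
A minimal sketch of that pattern, assuming devices 0 and 1 report peer capability: a kernel launched on device 0 reads a buffer allocated on device 1. The kernel name, sizes, and CHECK macro are made up for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message if a runtime call fails.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "%s failed: %s\n", #call,             \
                         cudaGetErrorString(err_));                    \
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

// Illustrative kernel: `in` may point to memory on a peer device.
__global__ void scaleFromPeer(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i];
}

int main() {
    int count = 0, canAccess = 0;
    CHECK(cudaGetDeviceCount(&count));
    if (count >= 2) CHECK(cudaDeviceCanAccessPeer(&canAccess, 0, 1));
    if (!canAccess) {
        std::printf("GPU 0 cannot access GPU 1 as a peer\n");
        return 0;
    }

    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *inOnDev1 = nullptr, *outOnDev0 = nullptr;

    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc(&inOnDev1, bytes));
    CHECK(cudaMemset(inOnDev1, 0, bytes));
    CHECK(cudaDeviceSynchronize());  // make sure the memset has finished

    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc(&outOnDev0, bytes));
    // Give the current device (0) access to device 1's memory; flags must be 0.
    CHECK(cudaDeviceEnablePeerAccess(1, 0));

    // The kernel runs on device 0, but its loads from `inOnDev1` go to device 1.
    scaleFromPeer<<<(n + 255) / 256, 256>>>(inOnDev1, outOnDev0, n);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());
    std::printf("Kernel on GPU 0 read %zu bytes from GPU 1\n", bytes);

    CHECK(cudaDeviceDisablePeerAccess(1));
    CHECK(cudaFree(outOnDev0));
    CHECK(cudaSetDevice(1));
    CHECK(cudaFree(inOnDev1));
    return 0;
}
```

Access is enabled in one direction only; for device 1 to read device 0's memory, the same call would have to be made with device 1 current.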

A good example of this is the simpleP2P sample from the CUDA samples.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow