A CUDA "peer" refers to another GPU that is capable of accessing data from the current GPU. All GPUs with compute 2.0 and greater have this feature enabled.
Peer to peer memory copies involve using cudaMemcpy
to copy memory over PCI-E as shown below.
cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);
Note that dst
and src
can be on different devices.
cudaDeviceEnablePeerAccess
enables the user to launch a kernel that uses data from multiple devices. The memory accesses are still done over PCI-E and will have the same bottlenecks.
A good example of this would be simplep2p from the cuda samples.