CUDA - how much slower is transferring over PCI-E?

https://stackoverflow.com/questions/17729351

03-06-2022

Question

If I transfer a single byte from a CUDA kernel over PCI-E to the host (zero-copy memory), how slow is that compared to transferring something like 200 megabytes?

What I would like to know, since I know that transferring over PCI-E is slow for a CUDA kernel, is this: does it make any difference whether I transfer a single byte or a huge amount of data? Or, since memory transfers are performed in bulk, is transferring a single byte extremely expensive and wasteful compared to transferring 200 MB?

Solution

I hope this figure explains everything. The data was generated by bandwidthTest from the CUDA samples. The hardware environment is PCI-E v2.0, a Tesla M2090, and 2x Xeon E5-2609. Note that both axes are in log scale.

The figure shows that launching a transfer request carries a constant overhead. Regression analysis on the data gives an estimated overhead of 4.9 µs for H2D (host to device), 3.3 µs for D2H (device to host), and 3.0 µs for D2D (device to device).

[Figure: bandwidthTest results for H2D, D2H, and D2D transfers, both axes in log scale]
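
A constant per-transfer overhead suggests a simple linear cost model, time = overhead + bytes / bandwidth. The sketch below applies that model to the two cases in the question. The 3.3 µs overhead is the D2H estimate from this answer; the 6 GB/s peak bandwidth is an assumed round number for PCI-E 2.0 x16, not a value read from the measurements, and the function names are made up for illustration. It models explicit copies as measured by bandwidthTest, so zero-copy accesses issued from inside a kernel may behave differently.

```python
# Linear cost model for one PCIe transfer: time = overhead + size / bandwidth.
# OVERHEAD_S is the D2H estimate quoted in the answer.
# PEAK_BW is an ASSUMED round figure for PCI-E 2.0 x16, not a measured value.

OVERHEAD_S = 3.3e-6   # fixed cost per transfer, in seconds
PEAK_BW = 6.0e9       # assumed sustained bandwidth, in bytes per second


def transfer_time(nbytes, overhead_s=OVERHEAD_S, peak_bw=PEAK_BW):
    """Estimated time in seconds to move nbytes in a single transfer."""
    return overhead_s + nbytes / peak_bw


def effective_bandwidth(nbytes, overhead_s=OVERHEAD_S, peak_bw=PEAK_BW):
    """Bytes per second actually achieved once the overhead is included."""
    return nbytes / transfer_time(nbytes, overhead_s, peak_bw)


if __name__ == "__main__":
    for label, n in [("1 byte", 1), ("200 MB", 200 * 1024 * 1024)]:
        t = transfer_time(n)
        bw = effective_bandwidth(n)
        print(f"{label:>7}: {t * 1e6:12.1f} us, {bw / 1e6:10.3f} MB/s effective")
```

Under these assumptions a single byte costs about 3.3 µs, almost all of it overhead, while 200 MB takes roughly 35 ms and runs close to the assumed peak. In this model the small transfer is cheap in absolute terms and only inefficient per byte.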

OTHER TIPS

A latency plot would be clearer in this case. Small transactions aren't more expensive than big ones; the only problem is that they can't saturate the bus, so a bigger message can be transferred in almost the same time. That is why transferring one 512 KB message is 120 times faster than transferring 512 separate 1 KB messages. The saturation point of PCIe depends on the lane count. You can find more details about PCIe features from the CUDA point of view here.
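
The batching argument can be sketched with a small calculation: if every transfer pays a fixed overhead, then splitting a payload into n pieces pays that overhead n times while the data itself takes the same time on the bus. Both default parameters below are assumptions picked for illustration (an overhead in the range reported by the other answer and a round 6 GB/s bandwidth), and the function names are hypothetical.

```python
# Compare moving the same payload as one transfer versus many small ones,
# using time = n_transfers * overhead + total_bytes / bandwidth.
# Both default parameters are ASSUMPTIONS chosen for illustration.

def total_time(total_bytes, n_transfers, overhead_s=3.3e-6, peak_bw=6.0e9):
    """Seconds to move total_bytes split evenly across n_transfers."""
    return n_transfers * overhead_s + total_bytes / peak_bw


def batching_speedup(total_bytes, n_transfers, **model):
    """How many times faster one big transfer is than n_transfers small ones."""
    split = total_time(total_bytes, n_transfers, **model)
    single = total_time(total_bytes, 1, **model)
    return split / single


if __name__ == "__main__":
    payload = 512 * 1024  # 512 KB
    print(f"one 512 KB transfer  : {total_time(payload, 1) * 1e6:8.1f} us")
    print(f"512 x 1 KB transfers : {total_time(payload, 512) * 1e6:8.1f} us")
    print(f"speedup from batching: {batching_speedup(payload, 512):.1f}x")
```

The exact factor depends on the ratio of per-transfer overhead to bus bandwidth, so it varies between systems; the values used here are assumptions and will not reproduce any particular measurement.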

Licensed under: CC-BY-SA with attribution
