Question

In the Visual Profiler (CUDA Toolkit 4.1), the Details tab shows a metric called Throughput. It has a value only for memcpy operations (HtoD, DtoH, etc.). Does anyone know exactly what it is? A help search turns up many different throughputs (for example, global memory throughput), and I cannot tell which one this metric refers to.


Solution

Throughput is the rate of data transfer, usually in GB/s: the number of bytes copied divided by the time the copy took. For HtoD and DtoH, this is the data transfer between your host and device across the PCIe bus connecting the two. You could also have DtoD, in which case your throughput should be significantly higher because the copy stays in device memory and is not limited by the PCIe bus's bandwidth. This is an important metric because transferring data between host and device is one of the biggest bottlenecks to good performance on CUDA. You can improve the host-device throughput by using pinned (page-locked) memory on the host whenever possible.

You can test this by using nvvp to profile the bandwidthTest example included with the SDK.
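
If you want to see the effect outside the profiler, here is a minimal sketch, not the bandwidthTest source. It times one HtoD copy from pageable memory and one from pinned memory using CUDA events, then prints bytes divided by elapsed time. The helper names (`gbPerSec`, `check`, `timeHtoD`) and the 64 MiB buffer size are my own choices for illustration, and I am assuming the profiler's throughput column is computed the same way (bytes / duration), so the numbers should be close to, but not necessarily identical with, what nvvp reports.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Convert a byte count and an elapsed time in milliseconds to GB/s (1 GB = 1e9 bytes).
static double gbPerSec(size_t bytes, float ms) {
    return (static_cast<double>(bytes) / 1e9) / (static_cast<double>(ms) / 1e3);
}

// Abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Time one host-to-device copy with CUDA events; returns elapsed milliseconds.
static float timeHtoD(void *dst, const void *src, size_t bytes) {
    cudaEvent_t start, stop;
    check(cudaEventCreate(&start), "cudaEventCreate");
    check(cudaEventCreate(&stop), "cudaEventCreate");
    check(cudaEventRecord(start, 0), "cudaEventRecord");
    check(cudaMemcpy(dst, src, bytes, cudaMemcpyHostToDevice), "cudaMemcpy");
    check(cudaEventRecord(stop, 0), "cudaEventRecord");
    check(cudaEventSynchronize(stop), "cudaEventSynchronize");
    float ms = 0.0f;
    check(cudaEventElapsedTime(&ms, start, stop), "cudaEventElapsedTime");
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const size_t bytes = 64u << 20;  // 64 MiB; arbitrary size chosen for this sketch

    void *dev = NULL;
    void *pinned = NULL;
    void *pageable = std::malloc(bytes);  // ordinary pageable host memory
    if (pageable == NULL) {
        std::fprintf(stderr, "malloc failed\n");
        return EXIT_FAILURE;
    }
    check(cudaMalloc(&dev, bytes), "cudaMalloc");
    check(cudaMallocHost(&pinned, bytes), "cudaMallocHost");  // page-locked host memory

    // Touch both host buffers so their pages are actually backed before timing.
    std::memset(pageable, 1, bytes);
    std::memset(pinned, 1, bytes);

    // Warm-up copy so one-time initialization cost is not counted.
    check(cudaMemcpy(dev, pinned, bytes, cudaMemcpyHostToDevice), "cudaMemcpy");

    float msPageable = timeHtoD(dev, pageable, bytes);
    float msPinned = timeHtoD(dev, pinned, bytes);

    std::printf("HtoD pageable: %.2f GB/s\n", gbPerSec(bytes, msPageable));
    std::printf("HtoD pinned:   %.2f GB/s\n", gbPerSec(bytes, msPinned));

    check(cudaFreeHost(pinned), "cudaFreeHost");
    check(cudaFree(dev), "cudaFree");
    std::free(pageable);
    return 0;
}
```

Build it with something like `nvcc -o htod_throughput htod_throughput.cu` (the file name is hypothetical). A single timed copy is noisy, so average several runs before drawing conclusions.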

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow