CUDA: 'memory bound' vs 'latency bound' vs 'bandwidth bound' vs 'compute bound'

https://stackoverflow.com/questions/23278304

09-07-2023

Question

Across the many resources online, the terms 'memory bound', 'bandwidth bound', and 'latency bound' are applied to kernels in different ways. It seems to me that authors sometimes use their own definitions of these terms, and I think it would be very beneficial for someone to draw a clear distinction.

To my understanding: a bandwidth bound kernel approaches the physical limit of the device's global memory bandwidth, e.g. an application that achieves 170 GB/s out of a peak of 177 GB/s on an M2090 device.
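As a rough illustration of that definition, the sketch below computes the fraction of peak bandwidth a kernel achieves from the bytes it moves and its run time. The function name and the one-second timing are hypothetical; only the 170 GB/s and 177 GB/s figures come from the example above.

```python
def bandwidth_utilization(bytes_moved: float, seconds: float, peak_gb_per_s: float) -> float:
    """Return the fraction of peak global-memory bandwidth a kernel achieved."""
    achieved_gb_per_s = bytes_moved / seconds / 1e9
    return achieved_gb_per_s / peak_gb_per_s


# Hypothetical measurement: 170 GB read and written in 1 s on a 177 GB/s device.
utilization = bandwidth_utilization(bytes_moved=170e9, seconds=1.0, peak_gb_per_s=177.0)
print(f"{utilization:.1%} of peak")  # prints "96.0% of peak"
```

A value close to 1.0 is what "bandwidth bound" would mean under this reading; how close counts as "close" is a judgment call.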

A latency bound kernel is one whose predominant stall reason is memory fetches. The global memory bus is not saturated, but the kernel still has to wait for its data to arrive.
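One way to see why a kernel can wait on memory without saturating the bus is Little's law: sustained throughput equals the data in flight divided by the latency. The sketch below applies it; the 600 ns latency and the 16 KiB of outstanding requests are assumed values chosen for illustration, not measurements of any device.

```python
def bytes_in_flight_needed(peak_gb_per_s: float, latency_ns: float) -> float:
    """Bytes that must be outstanding to saturate the bus (GB/s * ns = bytes)."""
    return peak_gb_per_s * latency_ns


def achievable_gb_per_s(bytes_in_flight: float, latency_ns: float, peak_gb_per_s: float) -> float:
    """Bandwidth sustained with a given amount of data in flight, capped at peak."""
    return min(peak_gb_per_s, bytes_in_flight / latency_ns)


ASSUMED_LATENCY_NS = 600.0  # assumption for illustration only
PEAK_GB_PER_S = 177.0       # peak figure from the question

needed = bytes_in_flight_needed(PEAK_GB_PER_S, ASSUMED_LATENCY_NS)
# A kernel with only 16 KiB of requests outstanding (assumed) cannot reach peak.
sustained = achievable_gb_per_s(16 * 1024, ASSUMED_LATENCY_NS, PEAK_GB_PER_S)
print(f"need {needed:.0f} bytes in flight; 16 KiB sustains {sustained:.1f} GB/s")
```

Under these assumptions the kernel sits far below peak bandwidth, yet every warp is still stalled on memory, which matches the latency bound description above.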

A compute bound kernel is one in which computation dominates the kernel time, under the assumption that there is no problem feeding the kernel with data from memory and that arithmetic overlaps well with memory latency.
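A common back-of-envelope check for this case is the roofline model: compare a kernel's arithmetic intensity (FLOP per byte of global memory traffic) with the ratio of peak compute to peak bandwidth. The sketch below is a minimal version; the 1331 GFLOP/s single-precision peak is an assumed figure for the M2090, and the SAXPY byte count assumes 4-byte floats with no caching.

```python
def attainable_gflops(flop_per_byte: float, peak_gflops: float, peak_gb_per_s: float) -> float:
    """Roofline bound: limited by either compute peak or memory traffic."""
    return min(peak_gflops, flop_per_byte * peak_gb_per_s)


def machine_balance(peak_gflops: float, peak_gb_per_s: float) -> float:
    """Arithmetic intensity (FLOP/byte) above which compute becomes the limit."""
    return peak_gflops / peak_gb_per_s


PEAK_GFLOPS = 1331.0   # assumed single-precision peak, for illustration
PEAK_GB_PER_S = 177.0  # peak figure from the question

# SAXPY (y = a*x + y): 2 FLOP per element, 12 bytes moved (read x, read y, write y).
saxpy_intensity = 2.0 / 12.0
balance = machine_balance(PEAK_GFLOPS, PEAK_GB_PER_S)
bound = attainable_gflops(saxpy_intensity, PEAK_GFLOPS, PEAK_GB_PER_S)
print(f"balance {balance:.2f} FLOP/byte, SAXPY capped at {bound:.1f} GFLOP/s")
```

A kernel whose intensity sits well above the balance point can be compute bound in the sense described above; one well below it, such as SAXPY here, cannot.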

If I have these correct, what would a 'memory bound' kernel be? Is there ambiguity, and if so, should we limit the conversation to the three terms above?

Thanks!

Solution

what would a 'memory bound' kernel be?

Memory bound refers to the general case where code is limited by memory access, i.e. it includes both code that is latency bound and code that is bandwidth bound. You've defined pretty much all the other terms correctly.
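The relationship between the four terms can be sketched as a small labelling function, in which 'memory bound' is attached whenever either of the two narrower memory labels applies. The function name, its inputs, and the 0.8 threshold are all hypothetical; real profilers report these quantities in their own ways.

```python
def classify_kernel(bandwidth_fraction: float,
                    compute_fraction: float,
                    memory_stalls_dominate: bool,
                    threshold: float = 0.8) -> list:
    """Label a kernel using the definitions in this thread (threshold is arbitrary)."""
    if compute_fraction >= threshold:
        return ["compute bound"]
    if bandwidth_fraction >= threshold:
        # Near the physical bandwidth limit: the bandwidth flavour of memory bound.
        return ["memory bound", "bandwidth bound"]
    if memory_stalls_dominate:
        # Bus not saturated, but waiting on fetches: the latency flavour.
        return ["memory bound", "latency bound"]
    return ["unclassified"]


print(classify_kernel(170 / 177, 0.10, True))   # ['memory bound', 'bandwidth bound']
print(classify_kernel(0.15, 0.10, True))        # ['memory bound', 'latency bound']
print(classify_kernel(0.15, 0.95, False))       # ['compute bound']
```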

Is there ambiguity, and if yes, should we limit the conversation to the three above terms?

I don't think there's much ambiguity (you've clearly demarcated 3 of the 4 terms, anyway), and you're not going to impose order on the world in a SO question/answer.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow