Question

I am learning about CUDA optimizations and found this presentation: Optimizing CUDA by Paulius Micikevicius.

The presentation has a section titled MAXIMIZE GLOBAL MEMORY BANDWIDTH, which says that global memory coalescing improves bandwidth.

My question: how do you calculate global memory bandwidth? Can anyone explain it with a simple program example?

Solution

Theoretical bandwidth can be calculated from the hardware specification.

For example, the NVIDIA GeForce GTX 280 uses DDR RAM with a memory clock rate of 1,107 MHz and a 512-bit wide memory interface. Using these data items, the peak theoretical memory bandwidth of the NVIDIA GeForce GTX 280 is 141.6 GB/sec:

( 1107 x 10^6 x ( 512 / 8 ) x 2 ) / 10^9 = 141.6 GB/sec

In this calculation, the memory clock rate is converted into Hz, multiplied by the interface width (divided by 8, to convert bits to bytes), and multiplied by 2 because of the double data rate. Finally, this product is divided by 10^9 to convert the result to GB/sec (GBps).
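As a rough sketch of the calculation above, the program below computes the same product for the GTX 280 figures and then repeats it with values queried from device 0. The helper name `theoreticalBandwidthGBps` is made up for this example. Applying the factor of 2 to the queried clock is an assumption that holds for double data rate memory, and other memory types may need a different factor.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Peak theoretical bandwidth in GB/s (1 GB = 10^9 bytes).
// clockHz:           memory clock rate in Hz
// busWidthBits:      memory interface width in bits
// transfersPerClock: 2 for double data rate memory (assumption for other devices)
double theoreticalBandwidthGBps(double clockHz, int busWidthBits, int transfersPerClock)
{
    return clockHz * (busWidthBits / 8.0) * transfersPerClock / 1e9;
}

int main()
{
    // GTX 280 figures quoted above: 1,107 MHz, 512-bit interface, DDR.
    // This prints the unrounded product of those three numbers.
    printf("GTX 280 (from spec): %.3f GB/s\n",
           theoreticalBandwidthGBps(1107e6, 512, 2));

    // Same formula using what the runtime reports for device 0.
    // The clock attribute is reported in kHz, the bus width in bits.
    int clockKHz = 0;
    int busWidthBits = 0;
    if (cudaDeviceGetAttribute(&clockKHz, cudaDevAttrMemoryClockRate, 0) == cudaSuccess &&
        cudaDeviceGetAttribute(&busWidthBits, cudaDevAttrGlobalMemoryBusWidth, 0) == cudaSuccess) {
        printf("Device 0 (queried):  %.3f GB/s\n",
               theoreticalBandwidthGBps(clockKHz * 1e3, busWidthBits, 2));
    } else {
        printf("Could not query device 0.\n");
    }
    return 0;
}
```

Compile with `nvcc`. The queried figure is only as good as the clock the driver reports, so treat it as an estimate and check it against the vendor's published number.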

Effective bandwidth is calculated by timing specific program activities and by knowing how data is accessed by the program. To do so, use this equation:

Effective bandwidth = (( Br + Bw ) / 10^9 ) / time

Here, the effective bandwidth is in units of GBps, Br is the number of bytes read per kernel, Bw is the number of bytes written per kernel, and time is given in seconds.
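Since the question asks for a simple program, here is a minimal sketch that applies the equation above to a plain copy kernel timed with CUDA events. The kernel, the buffer size, and the helper name `effectiveBandwidthGBps` are choices made for this example, not something from the guide. The kernel reads each element once and writes it once, so Br and Bw both equal the buffer size in bytes.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread copies one float. Consecutive threads touch consecutive
// addresses, so the reads and writes are coalesced.
__global__ void copyKernel(const float* in, float* out, size_t n)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Effective bandwidth = ((Br + Bw) / 10^9) / time, in GB/s.
double effectiveBandwidthGBps(double bytesRead, double bytesWritten, double seconds)
{
    return ((bytesRead + bytesWritten) / 1e9) / seconds;
}

int main()
{
    const size_t n = 1 << 22;                 // 4M floats, 16 MiB per buffer
    const size_t bytes = n * sizeof(float);

    float* in = nullptr;
    float* out = nullptr;
    if (cudaMalloc(&in, bytes) != cudaSuccess ||
        cudaMalloc(&out, bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(in, 0, bytes);

    const int threads = 256;
    const int blocks = (int)((n + threads - 1) / threads);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time startup cost is not included in the timing.
    copyKernel<<<blocks, threads>>>(in, out, n);

    cudaEventRecord(start);
    copyKernel<<<blocks, threads>>>(in, out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);               // wait until the kernel has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);   // elapsed time in milliseconds

    // Br = bytes read, Bw = bytes written, time converted to seconds.
    printf("Elapsed: %.3f ms, effective bandwidth: %.2f GB/s\n",
           ms, effectiveBandwidthGBps((double)bytes, (double)bytes, ms / 1e3));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

A single timed launch is noisy, so averaging over several launches gives a steadier number. Changing the index so that neighbouring threads read addresses far apart should lower the measured figure, which is the effect of coalescing that the presentation describes.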

More information is available in the CUDA Best Practices Guide.

Licensed under: CC-BY-SA with attribution