Bank conflicts in 2.x devices

https://stackoverflow.com/questions/11275697

18-06-2021
|

Question

What is a bank conflict in devices with 2.x devices? As I understand the CUDA C programming guide, in 2.x devices, if two threads access the same 32 bit word in the same shared memory bank, it does not cause a bank conflict. Instead, the word is broadcasted. When the two threads write the same 32 bit word in the same shared memory bank, then only one thread succeeds.

Since on-chip memory is 64 KB (48 KB for shared memory and 16 KB for L1, or vice versa), and it is organized in 32 banks, I am assuming that each bank consists of 2 KB. So I think that bank conflicts will arise if two threads access two different 32 bit words in the same shared memory bank. Is this correct?

Solution

Your description is correct. There are many access patterns that can generate bank conflicts, but here's a simple and common example: strided access.

__shared__ int smem[512];

int tid = threadIdx.x;

x = smem[tid * 2]; // 2-way bank conflicts
y = smem[tid * 4]; // 4-way bank conflicts
z = smem[tid * 8]; // 8-way bank conflicts
// etc.

Bank ID = index % 32, so if you look at the pattern of addresses in the x, y, and z accesses, you can see that in each warp of 32 threads, for x, 2 threads will access each bank, for y, 4 threads will access each bank, and for z, 8 threads will access each bank.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow