CUDA shared memory addressing

https://stackoverflow.com/questions/11278712

18-06-2021
Question

I understand that when I declare a shared memory array in a kernel, an array of the same size is declared by all the threads. Code like

__shared__ int s[5];

will create a 20-byte array in each thread. As I understand it, shared memory addressing is universal across all the threads. So, if I address subscript 10 as follows

s[10] = 1900;

it is the exact same memory location across all the threads. It won't be the case that different threads access different shared memory addresses for subscript 10. Is this correct? The compiler, of course, warns that the subscript is out of range.

Solution

Actually it will create a 20-byte array per block, not per thread.

Every thread within the block can access these 20 bytes. So if you need N bytes per thread and a block has M threads, you need an N*M-byte buffer per block.

In your case, with 128 threads per block and 5 ints per thread, you would declare

__shared__ int array[5*128];

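Then array[10] is a valid index that refers to the same location for every thread within the block. With the original s[5], index 10 is out of bounds, so that write is undefined behavior.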
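
To make the per-block layout concrete, here is a minimal sketch, assuming 128 threads per block and 5 ints per thread as in the answer. The kernel name per_thread_slices, the host helper count_threads_reading_two, and the constants are hypothetical names chosen for illustration; they are not from the original post. Each thread writes its own index into its 5-int slice, and then every thread reads array[10], which falls in thread 2's slice.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Illustrative sizes (assumptions taken from the answer's 5*128 example).
constexpr int THREADS_PER_BLOCK = 128;
constexpr int INTS_PER_THREAD   = 5;

// Each block gets ONE copy of `array`; each thread uses its own 5-int slice.
__global__ void per_thread_slices(int *out)
{
    __shared__ int array[INTS_PER_THREAD * THREADS_PER_BLOCK];

    // Slice owned by this thread: [base, base + INTS_PER_THREAD).
    int base = threadIdx.x * INTS_PER_THREAD;
    for (int i = 0; i < INTS_PER_THREAD; ++i)
        array[base + i] = threadIdx.x;

    // Make every thread's writes visible to the rest of the block.
    __syncthreads();

    // array[10] is the same location for all threads in this block.
    // It lies in thread 2's slice (10 / 5 == 2), so every thread reads 2.
    out[blockIdx.x * blockDim.x + threadIdx.x] = array[10];
}

// Hypothetical host helper: launches the kernel and returns how many
// threads read the value 2 from array[10], or -1 on a CUDA error.
int count_threads_reading_two(int num_blocks)
{
    int total = num_blocks * THREADS_PER_BLOCK;
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, total * sizeof(int)) != cudaSuccess)
        return -1;

    per_thread_slices<<<num_blocks, THREADS_PER_BLOCK>>>(d_out);

    // cudaMemcpy waits for the kernel to finish before copying.
    std::vector<int> h_out(total);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, total * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess)
        return -1;

    int count = 0;
    for (int v : h_out)
        if (v == 2) ++count;
    return count;
}
```

Calling count_threads_reading_two(2) from main launches two blocks. Each block has its own copy of array, and within a block all 128 threads see the same array[10], so on a working GPU every thread in both blocks should report the value written by thread 2 of its own block.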

Licensed under: CC-BY-SA with attribution