tl;dr: shared memory is not initialized to 0
I think your conjecture that shared memory is initialized to 0 is questionable. Try the following code, which is a slight modification of yours. Here, I'm calling the kernel twice, and the kernel overwrites the data array after printing it. The first time the kernel is launched, the "uninitialized" values of data were all 0's in my test. The second time the kernel is launched, the "uninitialized" values of data typically show what the first launch left behind (data[i] = i), so they are no longer all 0's.
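I think this is because shared memory is SRAM, which exhibits data remanence: its cells keep whatever was last written to them until they are overwritten, and nothing clears them between kernel launches.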
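```cuda
#include <stdio.h>

#define BLOCK_SIZE 32

__global__ void scan(float *input, float *output, int len)
{
    // Shared memory is NOT initialized: these values are whatever the
    // hardware happened to hold before the launch.
    __shared__ int data[BLOCK_SIZE];

    // Only thread 0 of block 0 prints the array and then overwrites it.
    if (threadIdx.x == 0 && blockIdx.x == 0)
    {
        for (int i = 0; i < BLOCK_SIZE; ++i)
        {
            printf("DATA[%d] = %d\n", i, data[i]);
            data[i] = i;   // leave 0, 1, 2, ... behind for the next launch
        }
    }
}

int main(int argc, char **argv)
{
    dim3 block(BLOCK_SIZE, 1, 1);
    dim3 grid(10, 1, 1);

    // Launch twice: the second launch may see what the first one wrote.
    // len is an int, so pass 0 rather than NULL.
    scan<<<grid, block>>>(NULL, NULL, 0);
    scan<<<grid, block>>>(NULL, NULL, 0);

    // Wait for both kernels and flush the device-side printf output.
    cudaDeviceSynchronize();
    getchar();
    return 0;
}
```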
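Since shared memory starts out holding whatever was left in it, the usual remedy is to initialize it explicitly inside the kernel and put a `__syncthreads()` barrier before any thread reads it. Below is a minimal sketch of that pattern, assuming one `BLOCK_SIZE`-element shared array per block; the kernels `dirty_shared` and `zero_init_shared` and the host helper `count_nonzero_after_zero_init` are hypothetical names chosen for illustration. The first kernel deliberately leaves non-zero values in shared memory, the second clears its array before copying it out, and the helper returns how many non-zero values came back (or -1 on a CUDA error).

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK_SIZE 32

// Hypothetical kernel: fills shared memory with non-zero values so that a
// later launch might find them lying around.
__global__ void dirty_shared(int *sink)
{
    __shared__ int data[BLOCK_SIZE];
    for (int i = threadIdx.x; i < BLOCK_SIZE; i += blockDim.x)
        data[i] = 12345 + i;
    __syncthreads();
    // Read one value back so the stores have an observable effect.
    if (threadIdx.x == 0)
        sink[blockIdx.x] = data[BLOCK_SIZE - 1];
}

// Hypothetical kernel: explicitly zero-initializes shared memory before use.
__global__ void zero_init_shared(int *out)
{
    __shared__ int data[BLOCK_SIZE];

    // Each thread clears a strided subset of the array.
    for (int i = threadIdx.x; i < BLOCK_SIZE; i += blockDim.x)
        data[i] = 0;

    // Barrier: no thread may read data[] until every element is cleared.
    __syncthreads();

    // Copy this block's shared array to global memory for inspection.
    for (int i = threadIdx.x; i < BLOCK_SIZE; i += blockDim.x)
        out[blockIdx.x * BLOCK_SIZE + i] = data[i];
}

// Returns the number of non-zero values read back, or -1 on a CUDA error.
int count_nonzero_after_zero_init(int num_blocks)
{
    assert(num_blocks > 0);
    const int n = num_blocks * BLOCK_SIZE;

    int *d_out = nullptr;
    int *d_sink = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess)
        return -1;
    if (cudaMalloc(&d_sink, num_blocks * sizeof(int)) != cudaSuccess) {
        cudaFree(d_out);
        return -1;
    }

    dirty_shared<<<num_blocks, BLOCK_SIZE>>>(d_sink);
    zero_init_shared<<<num_blocks, BLOCK_SIZE>>>(d_out);

    std::vector<int> h_out(n);
    cudaError_t err = cudaGetLastError();   // launch-configuration errors
    if (err == cudaSuccess)                 // cudaMemcpy waits for the kernels
        err = cudaMemcpy(h_out.data(), d_out, n * sizeof(int),
                         cudaMemcpyDeviceToHost);

    cudaFree(d_out);
    cudaFree(d_sink);
    if (err != cudaSuccess)
        return -1;

    int nonzero = 0;
    for (int v : h_out)
        if (v != 0)
            ++nonzero;
    return nonzero;
}
```

Calling `count_nonzero_after_zero_init(10)` from `main` should return 0 regardless of what the previous launch left in shared memory, because `zero_init_shared` never reads `data` before clearing it. The strided loops mean the same kernel still clears the whole array if it is launched with fewer than `BLOCK_SIZE` threads per block.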