CUDA is just following the rules of standard C here. Quoting Kernighan and Ritchie's "The C Programming Language":
An external variable must be defined, exactly once, outside of any function; this sets aside storage for it. The variable must also be declared in each function that wants to access it; this states the type of the variable. [...] Definition refers to the place where the variable is created or assigned storage; declaration refers to places where the nature of the variable is stated but no storage is allocated.
Somewhere in your program you should have something like
extern __shared__ unsigned int sdata[];
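At that location you are declaring an array of unsigned int named sdata with no size given, so no storage is set aside yet; the name simply refers to the start of the block's dynamically sized shared memory, much like a pointer. Inside the __global__ functions you are declaring the type of sdata, so that the __global__ function can be aware of it. In the
kernel<<<blocks,threads,numbytes_for_shared>>>(...);
launch, you are allocating the space for the array that sdata refers to: the third launch parameter gives its size in bytes, per block.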
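To make the split between declaration and allocation concrete, here is a minimal sketch. The kernel reverse_block and the helper reverse_on_device are hypothetical names chosen for illustration, and the sketch assumes a single block of at most 1024 threads; only the extern __shared__ line and the three-argument launch come from the discussion above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Declaration only: no size is given, so no storage is reserved here.
extern __shared__ unsigned int sdata[];

// Hypothetical kernel: reverses n elements of one block through sdata.
__global__ void reverse_block(unsigned int *data, int n)
{
    int i = threadIdx.x;
    if (i < n) sdata[i] = data[i];
    __syncthreads();                      // all threads reach this barrier
    if (i < n) data[i] = sdata[n - 1 - i];
}

// Hypothetical helper: fills 0..n-1, reverses it on the device,
// and returns true if the result really is reversed.
bool reverse_on_device(int n)
{
    unsigned int host[1024];
    if (n < 1 || n > 1024) return false;  // assumption: one block only
    for (int i = 0; i < n; ++i) host[i] = i;

    unsigned int *dev = nullptr;
    size_t bytes = n * sizeof(unsigned int);
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);

    // The third launch parameter is the number of bytes of shared
    // memory that sdata will refer to in each block.
    reverse_block<<<1, n, bytes>>>(dev, n);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i)
        if (host[i] != (unsigned int)(n - 1 - i)) return false;
    return true;
}

int main()
{
    bool ok = reverse_on_device(256);
    printf("%s\n", ok ? "reversed correctly" : "mismatch or CUDA error");
    return ok ? 0 : 1;
}
```

If the third launch parameter were omitted, it would default to 0 and sdata would refer to no storage at all, so any access to it in the kernel would be out of bounds.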