The volatile
qualifier specifies to the compiler that all references to a variable (read or write) should result in a memory reference and those references must be in the order specified in the program. The use of the volatile
qualifier is illustrated in Chapter 12 of the Shane Cook book, "CUDA Programming".
The use of volatile
will avoid some optimizations the compiler can do and so change the number of used registers used. The best way to understand what volatile
is actually doing is to disassemble the relevant __global__
function with and without the qualifier.
Consider indeed the following kernel functions
__global__ void volatile_test() {
volatile float a[3];
for (int i=0; i<3; i++) a[i] = (float)i;
}
__global__ void no_volatile_test() {
float a[3];
for (int i=0; i<3; i++) a[i] = (float)i;
}
Disassembling the above kernel functions one obtains
code for sm_20
Function : _Z16no_volatile_testv
.headerflags @"EF_CUDA_SM20 EF_CUDA_PTX_SM(EF_CUDA_SM20)"
/*0000*/ MOV R1, c[0x1][0x100]; /* 0x2800440400005de4 */
/*0008*/ EXIT ; /* 0x8000000000001de7 */
Function : _Z13volatile_testv
.headerflags @"EF_CUDA_SM20 EF_CUDA_PTX_SM(EF_CUDA_SM20)"
/*0000*/ MOV R1, c[0x1][0x100]; /* 0x2800440400005de4 */
/*0008*/ ISUB R1, R1, 0x10; /* 0x4800c00040105d03 */ R1 = address of a[0]
/*0010*/ MOV32I R2, 0x3f800000; /* 0x18fe000000009de2 */ R2 = 1
/*0018*/ MOV32I R0, 0x40000000; /* 0x1900000000001de2 */ R0 = 2
/*0020*/ STL [R1], RZ; /* 0xc8000000001fdc85 */
/*0028*/ STL [R1+0x4], R2; /* 0xc800000010109c85 */ a[0] = 0;
/*0030*/ STL [R1+0x8], R0; /* 0xc800000020101c85 */ a[1] = R2 = 1;
/*0038*/ EXIT ; /* 0x8000000000001de7 */ a[2] = R0 = 2;
As you can see, when NOT using the volatile
keyword, the compiler realizes that a
is set but never used (indeed, the compiler returns the following warning: variable "a" was set but never used) and there is practically no disassembled code.
Opposite to that, when using the volatile
keyword, all references to a
are translated to memory references (write in this case).