When to use volatile with register/local variables

Question

The volatile qualifier specifies to the compiler that all references to a variable (read or write) should result in a memory reference and those references must be in the order specified in the program. The use of the volatile qualifier is illustrated in Chapter 12 of the Shane Cook book, "CUDA Programming".

The use of volatile will avoid some optimizations the compiler can do and so change the number of used registers used. The best way to understand what volatile is actually doing is to disassemble the relevant __global__ function with and without the qualifier.

Consider indeed the following kernel functions

__global__ void volatile_test() {

   volatile float a[3];

   for (int i=0; i<3; i++) a[i] = (float)i;
}

__global__ void no_volatile_test() {

   float a[3];

   for (int i=0; i<3; i++) a[i] = (float)i;
}

Disassembling the above kernel functions one obtains

code for sm_20
      Function : _Z16no_volatile_testv
.headerflags    @"EF_CUDA_SM20 EF_CUDA_PTX_SM(EF_CUDA_SM20)" 
/*0000*/        MOV R1, c[0x1][0x100]; /* 0x2800440400005de4 */
/*0008*/        EXIT ;                 /* 0x8000000000001de7 */


      Function : _Z13volatile_testv
.headerflags    @"EF_CUDA_SM20 EF_CUDA_PTX_SM(EF_CUDA_SM20)"
/*0000*/        MOV R1, c[0x1][0x100]; /* 0x2800440400005de4 */   
/*0008*/        ISUB R1, R1, 0x10;     /* 0x4800c00040105d03 */   R1 = address of a[0]
/*0010*/        MOV32I R2, 0x3f800000; /* 0x18fe000000009de2 */   R2 = 1
/*0018*/        MOV32I R0, 0x40000000; /* 0x1900000000001de2 */   R0 = 2
/*0020*/        STL [R1], RZ;          /* 0xc8000000001fdc85 */
/*0028*/        STL [R1+0x4], R2;      /* 0xc800000010109c85 */   a[0] = 0;
/*0030*/        STL [R1+0x8], R0;      /* 0xc800000020101c85 */   a[1] = R2 = 1;
/*0038*/        EXIT ;                 /* 0x8000000000001de7 */   a[2] = R0 = 2;

As you can see, when NOT using the volatile keyword, the compiler realizes that a is set but never used (indeed, the compiler returns the following warning: variable "a" was set but never used) and there is practically no disassembled code.

Opposite to that, when using the volatile keyword, all references to a are translated to memory references (write in this case).