Question

I am writing a function which needs to iterate until completion. I realise that I can use atomic operations, but speed is critical in this kernel and I suspect they may not be required.

I have included a small piece of pseudo-code to demonstrate what I am intending to do:

__global__ void TestKernel()
{
  __shared__ bool lbRepeat;  // single flag shared by the block (a one-element array could not be assigned to directly)
  do
  {
    lbRepeat=false;
    __syncthreads();
    if(Condition == true) lbRepeat=true;
    __syncthreads();
  }
  while(lbRepeat);
}

If no thread has found the Condition to be true lbRepeat will be false.

If one thread has found the Condition to be true lbRepeat will be true.

What will the result be if multiple threads write true into lbRepeat at the same time?

I would like to extend this to copying integer values (unsigned 16-bit specifically). As well as checking the condition, I would like to copy an unsigned 16-bit integer.

__global__ void TestKernel()
{
  __shared__ unsigned short liValues[32*8];
  __shared__ bool lbRepeat;  // single flag shared by the block

  unsigned long tid = threadIdx.x+threadIdx.y*blockDim.x;
  do
  {
    lbRepeat=false;

    __syncthreads();
    if(Condition == true)
    {
       liValues[tid] = liValues[Some_Value_In_Range];  // array is declared as liValues
       lbRepeat=true;
    }
    __syncthreads();
  }
  while(lbRepeat);

}

If another thread is writing to the memory location as it is read, could the read return neither the previous value nor the new value? I do not mind whether the previous or the new value is returned (both are valid), but a mixture of the bits of each would cause problems.

I thought this wouldn't be acceptable, but my testing seems to indicate that it works as desired. Is this because unsigned short copies are atomic in CUDA?

In Summary:

What is the result if two threads write the same value into one boolean memory location?

Can reading from an unsigned short memory location while another thread is writing a new value to the same location return a value which is neither the previous value nor the new value in that memory location?

Solution

What is the result if two threads write the same value into one boolean memory location?

The end result is that one of the written values ends up in that memory location; which one is undefined. If all the written values are the same, you can be sure that value ends up in that location.
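A minimal sketch of the same-value-write case, assuming a single block of 64 threads. The names `countdownKernel` and `runCountdown`, and the per-thread counters standing in for the question's `Condition`, are hypothetical. The sketch also adds something the loop in the question does not have: each thread copies the flag into a private variable, followed by a third `__syncthreads()`. Without that, a fast thread could reset the flag for the next pass before a slower thread has tested it in the `while` condition.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for "Condition": a thread still has work while its
// counter is above zero. Every thread that has work stores the same value
// (true) into one shared flag, with no atomics.
__global__ void countdownKernel(int *counters, int *passesOut)
{
    __shared__ bool repeat;          // one flag for the whole block
    const int tid = threadIdx.x;
    int passes = 0;
    bool again;

    do {
        if (tid == 0) repeat = false;
        __syncthreads();             // reset is visible before anyone sets the flag

        if (counters[tid] > 0) {
            counters[tid]--;
            repeat = true;           // concurrent stores of the same value
        }
        __syncthreads();             // all stores are complete before the flag is read

        again = repeat;              // private copy of the block-wide decision
        __syncthreads();             // everyone has read the flag before the next reset
        passes++;
    } while (again);

    if (tid == 0) *passesOut = passes;
}

// Host helper (hypothetical): launches one block and returns the pass count,
// or -1 if the result could not be copied back.
int runCountdown()
{
    const int n = 64;
    int host[n];
    for (int i = 0; i < n; ++i) host[i] = i % 5;   // largest counter is 4

    int *dCounters = nullptr;
    int *dPasses = nullptr;
    cudaMalloc(&dCounters, sizeof(host));
    cudaMalloc(&dPasses, sizeof(int));
    cudaMemcpy(dCounters, host, sizeof(host), cudaMemcpyHostToDevice);

    countdownKernel<<<1, n>>>(dCounters, dPasses);

    int passes = -1;
    cudaMemcpy(&passes, dPasses, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dCounters);
    cudaFree(dPasses);
    return passes;
}
```

With the counters above, the block should make five passes: four that decrement something and a final one in which no thread sets the flag. If the flag is only needed as a block-wide "any thread true" vote, the built-in `__syncthreads_or()` may be worth considering, since it combines the barrier and the reduction.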

Can reading from an unsigned short memory location while another thread is writing a new value to the same location return a value which is neither the previous value nor the new value in that memory location?

Assuming these are the only two operations going on (one write, and one read), no. The read value will be either the value before the write has begun or the value after the write is complete. If you have multiple writes going on, then of course see the answer to the first question. The actual written value is undefined, except that it will be as if one of the writes succeeded and all others did not.

I'm making the above statements in the context of properly aligned 8-, 16-, or 32-bit data types, which your examples are.
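The second kernel in the question can be sketched along the same lines, under the assumption stated above that a naturally aligned `unsigned short` load sees either the old or the new value. The kernel `propagateMinKernel`, the helper `runPropagate`, the choice of the left neighbour as the source index, and the starting values are all hypothetical. The unsynchronised read is still formally a race, so this is only an illustration of an algorithm that tolerates either outcome, not a guarantee from the CUDA memory model.

```cuda
#include <cuda_runtime.h>

#define N_VALUES 256   // 32 * 8 threads, as in the question

// Each thread repeatedly copies its left neighbour's value while that value is
// smaller than its own. A read may observe the neighbour's old or new value;
// either is acceptable, because values only ever decrease towards the minimum.
__global__ void propagateMinKernel(unsigned short *out)
{
    __shared__ unsigned short values[N_VALUES];
    __shared__ bool repeat;

    const unsigned int tid = threadIdx.x + threadIdx.y * blockDim.x;
    const unsigned int src = (tid == 0) ? 0 : tid - 1;  // stand-in for Some_Value_In_Range
    values[tid] = (unsigned short)(1000 + tid);         // thread 0 holds the minimum
    bool again;

    do {
        if (tid == 0) repeat = false;
        __syncthreads();                          // initial values and reset are visible

        unsigned short candidate = values[src];   // may race with the neighbour's store
        if (candidate < values[tid]) {
            values[tid] = candidate;              // aligned 16-bit store
            repeat = true;
        }
        __syncthreads();                          // all stores complete

        again = repeat;                           // private copy of the flag
        __syncthreads();                          // all reads done before the next reset
    } while (again);

    out[tid] = values[tid];
}

// Host helper (hypothetical): returns true if every element converged to 1000.
bool runPropagate()
{
    unsigned short host[N_VALUES] = {0};
    unsigned short *dOut = nullptr;
    cudaMalloc(&dOut, sizeof(host));

    propagateMinKernel<<<1, dim3(32, 8)>>>(dOut);

    cudaMemcpy(host, dOut, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dOut);

    for (int i = 0; i < N_VALUES; ++i)
        if (host[i] != 1000) return false;
    return true;
}
```

The number of passes depends on thread timing, but the loop only stops after a pass in which no thread wrote anything, at which point every element equals the minimum. A torn read would show up as a value outside the initial range.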

Licensed under: CC-BY-SA with attribution