Question

What follows is the part of my kernel that does not behave properly, then an explanation of what I've found while debugging.

__global__ void Mangler(float *matrix, int *map)
{
    __shared__ signed int localMap[N];

    if(0 == threadIdx.x) 
    {
        for(int i=0; i<N; i++) 
            localMap[i] = -1;
    }

    __syncthreads();

    int fn = ...; // a lot of code goes into this number, skipped for clarity
    int rnumber = threadIdx.x;

    int X = atomicCAS(&localMap[fn], -1, rnumber); // Spot of bother 1

    if(X == -1) // Spot of bother 2
    {
        // some code
    }
    else 
    {
        // other code
    }
}

I've found in the documentation that atomicCAS(*address, compare, value) basically returns (and saves to the given address) the result of (old == compare ? value : old), where old is the value at the address before executing the function.

Going with this, I believe that executing int X = atomicCAS(&localMap[fn], -1, rnumber); should have two possible outcomes (according to the NVIDIA CUDA C Programming Guide):

  • if localMap[fn] == -1 then X should have a value of rnumber and localMap[fn] should have a value of rnumber. This does not happen.
  • if localMap[fn] != -1 then X should be set to the value of localMap[fn] and said value should be left intact.

What happens instead, as debugging with Nsight has shown me, is that X is assigned -1, while localMap[fn] is assigned the value of rnumber. I do not understand that, but as you can see in my code, I've changed the if to catch this situation.

Which brings me to spot of bother number 2: though Nsight shows the value of X as -1, the if {} is completely skipped (no breakpoint inside it is ever hit) and execution jumps straight to else.

My questions:

  • Do I misunderstand atomicCAS completely? (Edit: yes, I did.)
  • What could cause an if which should evaluate as true to jump straight into else in device code?

I'm using NVIDIA CUDA 5.5, Visual Studio 2012 x64 on Windows 8, and NVIDIA Nsight Monitor Visual Studio Edition 3.1. The GPU in the machine is an NVIDIA GeForce GTX 550 Ti.

I've tried changing the syntax to if(X!=-1); the true branch of the if is still not being executed.


Solution

According to the documentation, atomicCAS returns the old value, so both outcomes in your list are wrong. X is always set to the old value of localMap[fn], whatever that value was. The comparison with -1 only decides the new value of localMap[fn]: if the old value was -1, it is set to rnumber; otherwise it is left intact.

So the behaviour you see for X, rnumber and localMap[fn] is as expected.
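To make the return-value rule concrete, here is a minimal sketch of what atomicCAS does, written as an ordinary non-atomic function. The name casReference is made up for illustration and is not part of CUDA; the real intrinsic performs the same steps as one indivisible operation.

```cuda
// Hypothetical, NON-atomic reference model of atomicCAS(address, compare, val).
// The real intrinsic does the same read-compare-write as a single indivisible step.
__host__ __device__ inline int casReference(int *address, int compare, int val)
{
    int old = *address;                       // read the current value
    *address = (old == compare) ? val : old;  // store val only if old matches compare
    return old;                               // the return value is always the OLD value
}
```

With a slot holding -1, casReference(&slot, -1, 7) returns -1 and leaves 7 in the slot; a second call casReference(&slot, -1, 9) returns 7 and leaves the slot unchanged. So X == -1 in your kernel means "this thread claimed the slot", not "the swap failed".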

I cannot help with your second problem, as I don't use Nsight and don't know how it works. According to your code, the true branch should be executed. But be careful: so should the false branch. Because the kernel is multi-threaded, the condition can evaluate to true for some threads and to false for others. My guess is that you have to tell the debugger which thread/warp/block you want to debug, and that you were looking at a thread that took the false branch.
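The point about different threads taking different branches can be checked without a debugger. The following is a rough sketch, not your actual kernel: every thread of a single block competes for one shared slot, and two counters record which branch each thread took. The names ClaimDemo, runClaimDemo and counts are invented for this example, and error checking of the CUDA calls is left out for brevity.

```cuda
#include <cuda_runtime.h>

// Hypothetical demo kernel: all threads of a block compete for one shared slot.
// counts[0] = threads that took the "if" branch (saw -1, i.e. claimed the slot),
// counts[1] = threads that took the "else" branch (slot already claimed).
__global__ void ClaimDemo(int *counts)
{
    __shared__ int slot;

    if (0 == threadIdx.x)
        slot = -1;                 // -1 marks the slot as free
    __syncthreads();

    int X = atomicCAS(&slot, -1, (int)threadIdx.x);  // X is the OLD slot value

    if (X == -1)
        atomicAdd(&counts[0], 1);  // this thread won the slot
    else
        atomicAdd(&counts[1], 1);  // another thread got there first
}

// Host helper (hypothetical): launches one block and copies both counters back.
void runClaimDemo(int threads, int *winners, int *losers)
{
    int *dCounts = nullptr;
    int hCounts[2] = {0, 0};

    cudaMalloc(&dCounts, sizeof(hCounts));
    cudaMemset(dCounts, 0, sizeof(hCounts));
    ClaimDemo<<<1, threads>>>(dCounts);
    // cudaMemcpy on the default stream also waits for the kernel to finish.
    cudaMemcpy(hCounts, dCounts, sizeof(hCounts), cudaMemcpyDeviceToHost);
    cudaFree(dCounts);

    *winners = hCounts[0];
    *losers  = hCounts[1];
}
```

Calling runClaimDemo(64, &w, &l) should give w == 1 and l == 63: exactly one thread sees -1, and every other thread lands in else. If the debugger happens to be focused on one of those 63 threads, the if body will never appear to run.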

Licensed under: CC-BY-SA with attribution