Is the nvidia kepler shuffle "destructive"?

Question

warp shuffle is not destructive. The operation, if repeated under the exact same conditions, will return the same result each time. The var value (myreg in your example) does not get modified by the warp shuffle function itself.

The problem you are experiencing is due to the fact that the number of participating threads on the second invocation of __shfl_down() in your first method is different than the other invocations, in either method.

First, let's remind ourselves of a key point in the documentation:

Threads may only read data from another thread which is actively participating in the __shfl() command. If the target thread is inactive, the retrieved value is undefined.

Now let's take a look at your first "broken" method:

if (my_reg > __shfl_down(my_reg,8,16)){my_reg = __shfl_down(my_reg,8,16);};

The first time you call __shfl_down() above (within the if-clause), all threads are participating. Therefore all values returned by __shfl_down() will be what you expect. However, once the if clause is complete, only threads that satisfied the if-clause will participate in the body of the if-statement. Therefore, on the second invocation of __shfl_down() within the if-statement body, only threads for which their my_reg value was greater than the my_reg value of the thread 8 lanes above them will participate. This means that some of these assignment statements probably will not return the value you expect, because the other thread may not be participating. (The participation of the thread 8 lanes above would be dependent on the result of the if comparison done by that thread, which may or may not be true.)

The second method you propose has no such issue, and works correctly according to your statements. All threads participate in each invocation of __shfl_down().