Question

In my kernel, if a condition is met, I update an element of the output buffer:

if (condition(input[i]))
    output[i] = 1;

Otherwise the output element stays unchanged, keeping its initial value of 0.

The density of updates is quite unpredictable and depends on the input. Furthermore, which output locations will be updated is not known in advance (though in some cases I could force them).

My question is: is it better to write every element, to achieve coalescing, or to do a selective write as above? Writing every element would look like this:

output[i] = condition(input[i]);
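For reference, here is a minimal self-contained sketch of both options, assuming `i` is the global thread index and that `output` starts zeroed. The predicate `condition` (here `x % 3 == 0`) and the names `selectiveWrite`, `writeAll`, and `optionsAgree` are hypothetical stand-ins for the question's code, not part of it.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical predicate standing in for the question's condition();
// the real test depends on the application.
__device__ bool condition(int x) { return x % 3 == 0; }

// Option 1: selective write. Only threads whose input passes the test store,
// so `output` must be zeroed before launch.
__global__ void selectiveWrite(const int* input, int* output, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && condition(input[i]))
        output[i] = 1;
}

// Option 2: every in-range thread stores 0 or 1, so no pre-zeroing is needed.
__global__ void writeAll(const int* input, int* output, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        output[i] = condition(input[i]) ? 1 : 0;
}

// Hypothetical helper: runs both options on input[k] = k (n > 0) and reports
// whether the results agree. Error checking is omitted for brevity.
bool optionsAgree(int n) {
    size_t bytes = n * sizeof(int);
    std::vector<int> h_in(n), h_a(n), h_b(n);
    for (int k = 0; k < n; ++k) h_in[k] = k;

    int *d_in, *d_a, *d_b;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemset(d_a, 0, bytes);     // option 1 relies on this
    cudaMemset(d_b, 0xFF, bytes);  // arbitrary contents; option 2 overwrites them

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    selectiveWrite<<<blocks, threads>>>(d_in, d_a, n);
    writeAll<<<blocks, threads>>>(d_in, d_b, n);

    // cudaMemcpy waits for the preceding kernels on the default stream.
    cudaMemcpy(h_a.data(), d_a, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(h_b.data(), d_b, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_a);
    cudaFree(d_b);
    return h_a == h_b;
}
```

The host code makes one practical difference visible: option 1 is only correct if `output` is zeroed first (here with `cudaMemset`), whereas option 2 writes every element and needs no initialization.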

Could you explain the reasoning behind your answer?


Solution

Coalescing is achieved even if some threads in the warp do not participate in the load or store, as long as all participating threads satisfy the requirements of coalescing. So a conditional write by itself should not hurt memory throughput.
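To make this concrete, the host-side sketch below counts how many 32-byte sectors a warp's stores touch when lane k writes a 4-byte word at base + 4k. It assumes a 32-lane warp, a sector-aligned base address, and 32-byte sector granularity for global memory accesses; it is a simplified model, not a hardware simulator, and `sectorsTouched` and `everyFourthLaneMask` are hypothetical helpers.

```cuda
#include <cstdint>
#include <set>

// Simplified model: number of distinct 32-byte sectors touched when each
// active lane k of a 32-lane warp stores 4 bytes at baseAddr + 4 * k.
// Bit k of activeMask says whether lane k participates in the store.
int sectorsTouched(uint32_t activeMask, uint64_t baseAddr = 0) {
    const uint64_t kSectorBytes = 32;  // assumed sector size
    std::set<uint64_t> sectors;
    for (int lane = 0; lane < 32; ++lane) {
        if (activeMask & (1u << lane)) {
            uint64_t addr = baseAddr + 4ull * lane;
            sectors.insert(addr / kSectorBytes);
        }
    }
    return static_cast<int>(sectors.size());
}

// Mask of the lanes for which (lane & 3) == 0, i.e. every fourth lane.
uint32_t everyFourthLaneMask() {
    uint32_t mask = 0;
    for (int lane = 0; lane < 32; ++lane)
        if ((lane & 3) == 0) mask |= 1u << lane;
    return mask;
}
```

Under these assumptions a full warp touches 4 sectors, and so does a warp in which only every fourth lane stores (the pattern of the `conditionalSet` kernel in the other answer); a warp in which only lane 0 stores touches just 1. Masking out lanes never increases the number of sectors touched, which is the point made above.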

However, a conditional write may require additional instructions because it involves a branch (this would probably explain, for example, the difference in performance Eugene measured in his answer).

Other answers

On my setup, the kernel that does the conditional set (option 1) runs in 1.727 µs and option 2 in 1.399 µs. This is my code (setConditional is the faster one):

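// Option 1: only every fourth thread stores; the others leave array[] untouched,
// so this relies on array[] having been zeroed beforehand.
__global__ void conditionalSet(unsigned int* array) {
    if ((threadIdx.x & 3) == 0) {
        array[threadIdx.x] = 1;
    }
}

// Option 2: every thread stores 0 or 1 unconditionally.
__global__ void setConditional(unsigned int* array) {
    array[threadIdx.x] = (threadIdx.x & 3) == 0 ? 1 : 0;
}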
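One way to make such a comparison less sensitive to noise is to time many back-to-back launches with CUDA events and average them. The sketch below keeps the single-block setup of the code above; `averageMicroseconds` is a hypothetical helper, and the numbers it produces depend on the GPU. For a kernel this small, launch overhead likely outweighs the few stores each launch performs, so results from this setup may not carry over to large arrays.

```cuda
#include <cuda_runtime.h>

__global__ void conditionalSet(unsigned int* array) {
    if ((threadIdx.x & 3) == 0) array[threadIdx.x] = 1;
}

__global__ void setConditional(unsigned int* array) {
    array[threadIdx.x] = (threadIdx.x & 3) == 0 ? 1 : 0;
}

// Launches the selected kernel once with one block of `threads` threads.
static void launchOnce(bool useConditionalSet, unsigned int* d_array, int threads) {
    if (useConditionalSet) conditionalSet<<<1, threads>>>(d_array);
    else setConditional<<<1, threads>>>(d_array);
}

// Hypothetical helper: average time per launch, in microseconds, over `reps`
// back-to-back launches (threads <= 1024). Error checking is omitted.
float averageMicroseconds(bool useConditionalSet, int threads, int reps) {
    unsigned int* d_array;
    size_t bytes = threads * sizeof(unsigned int);
    cudaMalloc(&d_array, bytes);
    cudaMemset(d_array, 0, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    launchOnce(useConditionalSet, d_array, threads);  // warm-up, not timed

    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r)
        launchOnce(useConditionalSet, d_array, threads);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until all timed launches finish

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_array);
    return ms * 1000.0f / reps;
}
```

Calling `averageMicroseconds(true, 256, 1000)` and `averageMicroseconds(false, 256, 1000)` gives per-launch averages for the two kernels; repeating the comparison with a larger grid would show whether the gap persists once memory traffic, rather than launch cost, dominates.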
Licensed under: CC-BY-SA with attribution