CUDA: reduction or atomic operations?

https://stackoverflow.com/questions/5923978

algorithm
matrix
cuda
reduction
gpu-atomics

30-10-2019

Question

I'm writing a CUDA kernel that needs to compute the maximum value of a given matrix, and I'm evaluating the options. The best approach I have found so far is:

Have every thread store a value in shared memory, then run a reduction over those values to find the maximum (pro: minimal divergence; con: shared memory is limited to 48 KB on compute capability 2.0 devices).
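For illustration, the shared-memory approach might look roughly like the sketch below. It assumes a `float` matrix stored as one flat array of `n` elements; the names `blockMaxKernel`, `matrixMaxReduce`, `kThreads` and the grid size of 64 blocks are hypothetical choices, not anything from the question. Each thread first folds many elements into a register, so the shared buffer only needs one slot per thread in the block (1 KB here), which keeps it well under the 48 KB limit regardless of matrix size. Error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <cfloat>

// Hypothetical block size; must be a power of two for the tree reduction below.
constexpr int kThreads = 256;

// Each block writes the maximum of the elements it visited to blockMax[blockIdx.x].
// Must be launched with exactly kThreads threads per block.
__global__ void blockMaxKernel(const float* in, int n, float* blockMax) {
    __shared__ float sdata[kThreads];  // one slot per thread, not per matrix element

    // Grid-stride loop: fold this thread's share of the input into a register.
    float m = -FLT_MAX;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x) {
        m = fmaxf(m, in[i]);
    }
    sdata[threadIdx.x] = m;
    __syncthreads();

    // Tree reduction in shared memory; the active threads stay contiguous,
    // so divergence only appears once fewer than a warp remain active.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) {
            sdata[threadIdx.x] = fmaxf(sdata[threadIdx.x], sdata[threadIdx.x + s]);
        }
        __syncthreads();
    }
    if (threadIdx.x == 0) blockMax[blockIdx.x] = sdata[0];
}

// Host helper (hypothetical): maximum of n floats, computed in two passes.
// Assumes n > 0; CUDA error checking is left out of this sketch.
float matrixMaxReduce(const float* hostData, int n) {
    const int blocks = 64;  // assumed grid size
    float *dIn, *dPartial, *dOut;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dPartial, blocks * sizeof(float));
    cudaMalloc(&dOut, sizeof(float));
    cudaMemcpy(dIn, hostData, n * sizeof(float), cudaMemcpyHostToDevice);

    // Pass 1: one partial maximum per block.
    blockMaxKernel<<<blocks, kThreads>>>(dIn, n, dPartial);
    // Pass 2: a single block reduces the per-block partial results.
    blockMaxKernel<<<1, kThreads>>>(dPartial, blocks, dOut);

    float result;
    cudaMemcpy(&result, dOut, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dPartial);
    cudaFree(dOut);
    return result;
}
```

Calling `matrixMaxReduce(data, rows * cols)` on a row-major matrix would return its largest element. The second kernel launch is one way to combine the per-block results; copying the 64 partial values back and finishing on the host would work as well.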

I couldn't use atomic operations because the update involves both a read and a write, so the threads could not be synchronized with __syncthreads().
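For comparison, here is a hedged sketch of what an atomic variant could look like. A CUDA atomic performs its read, compare and write as one indivisible operation on the target address, so no `__syncthreads()` is needed around the update itself. CUDA's built-in `atomicMax` covers integer types only, so for `float` data this sketch builds the update from `atomicCAS`; the names `atomicMaxFloat`, `atomicMaxKernel` and `matrixMaxAtomic` are hypothetical, the launch configuration is an assumption, and NaN inputs and error checking are ignored.

```cuda
#include <cuda_runtime.h>
#include <cfloat>

// Hypothetical helper: atomically sets *addr = max(*addr, value).
// Built on atomicCAS because atomicMax has no float overload.
__device__ float atomicMaxFloat(float* addr, float value) {
    int* addrAsInt = reinterpret_cast<int*>(addr);
    int old = *addrAsInt;
    // Retry while our value is still larger than what is stored.
    while (value > __int_as_float(old)) {
        int assumed = old;
        old = atomicCAS(addrAsInt, assumed, __float_as_int(value));
        if (old == assumed) break;  // the swap succeeded
    }
    return __int_as_float(old);
}

// Each thread folds its share of the input into a register, then issues
// a single atomic update on the global result.
__global__ void atomicMaxKernel(const float* in, int n, float* result) {
    float m = -FLT_MAX;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x) {
        m = fmaxf(m, in[i]);
    }
    atomicMaxFloat(result, m);
}

// Host helper (hypothetical): maximum of n floats via atomics.
// Assumes n > 0; CUDA error checking is left out of this sketch.
float matrixMaxAtomic(const float* hostData, int n) {
    float *dIn, *dResult;
    float init = -FLT_MAX;  // identity element for max
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dResult, sizeof(float));
    cudaMemcpy(dIn, hostData, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dResult, &init, sizeof(float), cudaMemcpyHostToDevice);

    atomicMaxKernel<<<64, 256>>>(dIn, n, dResult);  // assumed launch shape

    float result;
    cudaMemcpy(&result, dResult, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dResult);
    return result;
}
```

Because every thread contends for the same address, this version may serialize under load; issuing one atomic per block after a local reduction would likely reduce that contention.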

Do any other ideas come to mind?

No correct solution

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow