OpenCL Alternative Modulo Uses, Advice

https://stackoverflow.com/questions/4842507

27-10-2019

Question

There is a simple function which I have used in C++ in the past to simulate simple forms of tessellation. It takes a number n and a divisor d. The divisor must be one less than a power of two, and n should be between 0 and the divisor. It returns the result of n % (d+1) using a bitwise & instead of the modulo operator.

Fairly sure the function goes like:

unsigned int BitwiseMod(unsigned int n, unsigned int d){ return n & d; }
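As a quick sanity check of the identity described above, here is a minimal sketch in plain C (it should carry over to OpenCL C largely unchanged). The helper name `is_low_bit_mask` and the debug assertion are illustrative assumptions and were not part of the original function. For unsigned values, `n & d` matches `n % (d + 1)` for any n, provided d has the form 2^k - 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true when d has the form 2^k - 1
   (all low bits set), e.g. 0, 1, 3, 7, 15. */
static bool is_low_bit_mask(unsigned int d) {
    return (d & (d + 1u)) == 0u;
}

/* Same function as in the question, with the precondition
   on d checked in debug builds. */
static unsigned int BitwiseMod(unsigned int n, unsigned int d) {
    assert(is_low_bit_mask(d));
    return n & d;   /* equals n % (d + 1) when d == 2^k - 1 */
}
```

For example, `BitwiseMod(13, 7)` and `13 % 8` both give 5. A divisor such as 6 is rejected by the check, because 7 is not a power of two.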

I want to use this effectively in OpenCL and am wondering whether it will work as I imagine it to. In my mind, modulus is a very expensive operation on the GPU, but I am familiar with using it to form magnitude spaces and with other techniques for traveling through data.

More often, assuming function calls carry some overhead, I would simply write the expression inline:

x[i] = 8*(i&d)+offset[i];  //OR in other contexts,...

num = (i&d)+offset[i];  // parentheses needed: + binds tighter than &
x[num] = data;
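A note on operator precedence in the second form: in C and OpenCL C, binary + binds more tightly than &, so an expression written as `i & d + offset[i]` is evaluated as `i & (d + offset[i])`. Wrapping first and then adding the offset needs explicit parentheses. The sketch below contrasts the two readings; the function names are hypothetical and only serve the illustration.

```c
/* Intended form: wrap i into [0, d] first, then add the offset. */
static unsigned int wrapped_index(unsigned int i, unsigned int d,
                                  unsigned int off) {
    return (i & d) + off;
}

/* What an unparenthesized "i & d + off" computes:
   the addition happens before the AND. */
static unsigned int unparenthesized_index(unsigned int i, unsigned int d,
                                          unsigned int off) {
    return i & (d + off);
}
```

With i = 13, d = 7 and an offset of 2, the first function returns 7 while the second returns 9, so the two forms are not interchangeable.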

The question is: will this be useful, or will it get in the way? If it is useful, can you give me some examples of where I might apply it?

Solution

On NVIDIA architectures from GT200 onward, modulo isn't particularly slow; it is no slower than a normal integer divide. See this paper for details.

However, using a bitwise AND is still quite a lot faster. As function calls are expensive on GPUs, OpenCL compilers aggressively use inlining to improve performance by default. You should be fine with a function call, as it will be inlined.
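As one possible illustration of where the mask can be applied, here is a sketch of wrap-around indexing into a repeating power-of-two tile, written in plain C. The tile size, the macro names and both function names are assumptions made for this example. In an OpenCL kernel, x and y would typically come from `get_global_id(0)` and `get_global_id(1)`.

```c
/* Hypothetical tile size; it must be a power of two
   for the mask to be valid. */
#define TILE_SIZE 8u
#define TILE_MASK (TILE_SIZE - 1u)

/* Small helper; compilers will typically inline a call like this. */
static inline unsigned int wrap_tile(unsigned int n) {
    return n & TILE_MASK;   /* same result as n % TILE_SIZE */
}

/* Map a global (x, y) coordinate to a linear index inside a
   repeating TILE_SIZE x TILE_SIZE tile. */
static inline unsigned int tile_index(unsigned int x, unsigned int y) {
    return wrap_tile(y) * TILE_SIZE + wrap_tile(x);
}
```

With these definitions, the coordinate (10, 3) wraps to (2, 3) and maps to index 26. Keeping the mask in a small helper costs nothing once it is inlined, and it keeps the power-of-two assumption in one place.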

Licensed under: CC-BY-SA with attribution
