I need to use the modulo operation inside a kernel, and it is slowing things down. I cannot remove it. Basically, I have a%b, where b is not a power of 2. Is there any way to avoid using it?


Solution

Can you precompute the answers and use a lookup table? Instead of

c = a%b;

you could then try

c = table[a][b];

You will have to think about the types (and signedness) of a and b and about the size of the table. Depending on the overall use case, you could move this table to a higher level and eliminate more than just this single computation.
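A minimal sketch of the lookup-table idea, assuming (hypothetically) that 0 <= a < 256 and 1 <= b < 64, so the whole table fits in 256 * 64 bytes = 16 KiB. The names A_MAX, B_MAX, build_mod_table and mod_lookup are illustrative, not from the answer. Whether one memory load actually beats the modulo depends on where the table lives on your hardware and on the access pattern, so this is something to measure rather than assume.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bounds: assumes 0 <= a < A_MAX and 1 <= b < B_MAX.
   The table costs A_MAX * B_MAX bytes, so it only pays off when
   both ranges are small. */
#define A_MAX 256
#define B_MAX 64

static uint8_t mod_table[A_MAX][B_MAX];

/* Fill the table once, ahead of time (e.g. on the host before the
   kernel runs). Column b == 0 is left unused. */
static void build_mod_table(void) {
    for (int a = 0; a < A_MAX; ++a)
        for (int b = 1; b < B_MAX; ++b)
            mod_table[a][b] = (uint8_t)(a % b);
}

/* Replaces c = a % b with a single table load. */
static inline uint8_t mod_lookup(int a, int b) {
    assert(a >= 0 && a < A_MAX && b >= 1 && b < B_MAX);
    return mod_table[a][b];
}
```

Because b < 64, every remainder is at most 62 and fits in a uint8_t, which keeps the table small.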

A custom implementation of modulo would rely on its definition:

(a/b)*b + a%b == a; //true
a%b == a - (a/b)*b // true
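A small sketch of the identity above, written for unsigned operands to keep the semantics simple (the helper name divmod_u32 is hypothetical). Note that it still performs one division, so on its own it is rarely faster than %; it mainly helps when the quotient a/b is needed anyway, because the remainder then costs only a multiply and a subtract.

```c
#include <assert.h>

/* Remainder via the identity a % b == a - (a / b) * b.
   One division produces both the quotient and the remainder. */
static inline void divmod_u32(unsigned a, unsigned b,
                              unsigned *quot, unsigned *rem) {
    assert(b != 0);
    unsigned q = a / b;
    *quot = q;
    *rem = a - q * b;
}
```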

Depending on the likely values for a and b you could try to optimize this.
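Two illustrative cases, each resting on an assumption about the values that you would need to confirm for your kernel. If a is known to lie in [0, 2*b) (for example, an index that was just incremented), a single compare-and-subtract gives the remainder. If b stays the same across many operations (for example, a kernel argument), a reciprocal-like constant can be precomputed once so that each remainder needs only multiplications; this is the "fastmod" technique of Lemire, Kaser and Kurz, swapped in here as an example. The sketch uses the GCC/Clang `__uint128_t` extension; in CUDA device code, the `__umul64hi` intrinsic returns the high 64 bits of a 64-bit product and could plausibly serve the same purpose, but that port is an untested assumption. Compilers already apply similar multiply-and-shift tricks when b is a compile-time constant.

```c
#include <assert.h>
#include <stdint.h>

/* Case 1 (assumption): a lies in [0, 2*b).
   One compare-and-subtract replaces '%'. */
static inline uint32_t mod_small_range(uint32_t a, uint32_t b) {
    return a >= b ? a - b : a;
}

/* Case 2 (assumption): b is fixed for many calls.
   Precompute c = ceil(2^64 / b), computed modulo 2^64 (so b == 1 gives 0). */
static inline uint64_t fastmod_prepare(uint32_t b) {
    assert(b != 0);
    return UINT64_C(0xFFFFFFFFFFFFFFFF) / b + 1;
}

/* a % b using only multiplications; valid for all 32-bit a and b > 0. */
static inline uint32_t fastmod_u32(uint32_t a, uint64_t c, uint32_t b) {
    uint64_t lowbits = c * a;                             /* wraps mod 2^64 */
    return (uint32_t)(((__uint128_t)lowbits * b) >> 64);  /* high 64 bits */
}
```

Usage would be to call fastmod_prepare(b) once (for example, on the host) and pass both c and b into the hot loop.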

Depending on your target hardware, you could check whether a particular device offers a fast hardware solution for this operation (see this).

There may be more solutions out there.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow