Question

I've been working with OpenCL lately. I created a kernel that takes one global variable shared by all the work-items. The kernel couldn't be simpler: each work-item increments the value of result, which is the global variable. The code is shown below.

__kernel void accumulate(__global int* result) {
    *result = 0;
    atomic_add(result, 1);
}

Everything works fine when the total number of work-items is small. On my Mac Pro Retina, the result is correct when the number of work-items is around 400.

However, when I increase the global size to, say, 10000, the value read back from result is around 900 instead of 10000, which suggests that more than one work-item might be accessing the global variable at the same time.

What could be a possible solution for this type of problem? Thanks for the help!


Solution

*result = 0; looks like the problem. For small global sizes, every work-item does this and then atomically increments, leaving you with the correct count. However, when the global size is larger than the number of work-items that can run at the same time, they run in batches, and each subsequent batch resets result back to 0. That is why you're not getting the full count.

Solution: initialize the buffer from the host side instead and you should be good. Alternatively, to initialize on the device, you can write the zero only from the work-item with get_global_id(0) == 0, do a barrier, and then do your atomic increment. Note that barrier() only synchronizes work-items within a single work-group, so this variant is safe only when the whole launch fits in one work-group; otherwise, zero the buffer in a separate kernel (or from the host) before this one runs.
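The batching explanation can be made concrete with a small host-side simulation in plain C. This is only a rough sketch under simplifying assumptions: the function names and the batch size are hypothetical, and the model assumes that every work-item in a batch stores 0 before any of them increments, which real hardware does not guarantee.

```c
/* Simplified model of the kernel in the question. It is NOT OpenCL code;
   it only mimics work-items that execute in batches of `batch` items.
   `batch` must be greater than 0. */

static int min_int(int a, int b) { return a < b ? a : b; }

/* Models the original kernel: each work-item runs "*result = 0;" and then
   "atomic_add(result, 1);". Assumption: within one batch, all stores of 0
   complete before any increment happens. */
int simulate_reset_in_kernel(int total, int batch) {
    int result = 0;
    for (int start = 0; start < total; start += batch) {
        int count = min_int(batch, total - start);
        for (int i = 0; i < count; ++i) result = 0;   /* *result = 0;            */
        for (int i = 0; i < count; ++i) result += 1;  /* atomic_add(result, 1);  */
    }
    return result;  /* only the last batch's increments survive */
}

/* Models the suggested fix: the buffer is zeroed once (as the host would do)
   and the kernel only performs the atomic increment. */
int simulate_host_init(int total, int batch) {
    int result = 0;  /* one-time initialization, outside the "kernel" */
    for (int start = 0; start < total; start += batch) {
        int count = min_int(batch, total - start);
        for (int i = 0; i < count; ++i) result += 1;  /* atomic_add(result, 1); */
    }
    return result;
}
```

With an assumed batch size of 1024, simulate_reset_in_kernel(400, 1024) returns 400, while simulate_reset_in_kernel(10000, 1024) returns 784, the size of the final partial batch. simulate_host_init(10000, 1024) returns the full 10000. Real devices interleave the stores and increments less neatly, so the exact wrong value will differ from this model.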

Licensed under: CC-BY-SA with attribution