Question

    __kernel void cl_test(__global int* Number)
    {
       int id = get_global_id(0);
       if (id%5==0)
       {
           Number[0]++;
       }
       if (id%10==0)
       {
           Number[1]++;
       }
    }

This is a very simple OpenCL test kernel. I want to count how many numbers in a range are divisible by 5 and how many are divisible by 10.

Here is the problem: the work items are not independent, because they all update the same Number[0] and Number[1]. As a result, the values I read back from Number[0] and Number[1] are wrong.

Is there something like a "global variable" in C++ that all work items can update safely?

Thanks!

Solution

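You need to use atomic operations. Number[0]++ is a non-atomic read-modify-write, so when many work items execute it at the same time, some increments are lost. atomic_inc performs the whole update as one indivisible operation: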

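    __kernel void cl_test(__global int* Number)
    {
        int id = get_global_id(0);
        if (id % 5 == 0)
        {
            // Count ids divisible by 5; the increment cannot be lost.
            atomic_inc(&Number[0]);
        }
        if (id % 10 == 0)
        {
            // Count ids divisible by 10.
            atomic_inc(&Number[1]);
        }
    }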

Use atomic operations sparingly. They tend to be rather slow, because concurrent updates to the same address have to be serialized to guarantee a correct result across work items.
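
One common way to reduce that cost is to count in local memory first and touch the global counters only once per work-group. The following is a minimal sketch, not a tuned implementation. The kernel name cl_test_local and the array local_count are made up for illustration, and it assumes OpenCL 1.1 or later, where atomic_inc and atomic_add also accept __local pointers (OpenCL 1.0 needs the cl_khr_local_int32_base_atomics extension instead).

    // Sketch only: cl_test_local and local_count are hypothetical names.
    __kernel void cl_test_local(__global int* Number)
    {
        // One pair of counters per work-group, shared by its work items.
        __local int local_count[2];

        int id  = get_global_id(0);
        int lid = get_local_id(0);

        // The first work item of each group clears the local counters.
        if (lid == 0)
        {
            local_count[0] = 0;
            local_count[1] = 0;
        }
        barrier(CLK_LOCAL_MEM_FENCE);

        // Atomics on local memory only contend within one work-group.
        if (id % 5 == 0)
        {
            atomic_inc(&local_count[0]);
        }
        if (id % 10 == 0)
        {
            atomic_inc(&local_count[1]);
        }
        barrier(CLK_LOCAL_MEM_FENCE);

        // One global atomic per counter per work-group.
        if (lid == 0)
        {
            atomic_add(&Number[0], local_count[0]);
            atomic_add(&Number[1], local_count[1]);
        }
    }

Like the kernel above, this assumes the host sets Number[0] and Number[1] to zero before launching. Whether it is actually faster depends on the device and the work-group size, so it is worth measuring both versions.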

OTHER TIPS

atomic_add will also solve the counting problem; atomic_add(p, 1) has the same effect as atomic_inc(p):

    __kernel void cl_test(__global int* Number)
    {
        int id = get_global_id(0);
        if (id % 5 == 0)
        {
            atomic_add(Number, 1);
        }
        if (id % 10 == 0)
        {
            atomic_add(Number + 1, 1);
        }
    }

Licensed under: CC-BY-SA with attribution