Question

    __kernel void cl_test(__global int* Number)
    {
       int id = get_global_id(0);
       if (id%5==0)
       {
           Number[0]++;
       }
       if (id%10==0)
       {
           Number[1]++;
       }
    }

As you can see, this is a very simple OpenCL test kernel. What I want is to count how many numbers in a range are divisible by 5 and how many are divisible by 10.

Here is the problem: the work items are not independent, because all of them update the same Number[0] and Number[1]. When I read Number[0] or Number[1] afterwards, I don't get the correct result.

Is there any solution like the "global variable" in C++?

Thanks!


Solution

You need to use atomic operations.

    __kernel void cl_test(__global int* Number)
    {
       int id = get_global_id(0);
       if (id%5==0)
       {
           // Atomic read-modify-write: concurrent increments are not lost
           atomic_inc(Number);          // same as atomic_inc(&Number[0])
       }
       if (id%10==0)
       {
           atomic_inc(&Number[1]);
       }
    }
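
Why the atomic version is needed: Number[0]++ is not one indivisible step but a load, an add and a store, so two work items can both load the old value before either of them stores, and one increment is lost. The C sketch below is a single-threaded illustration, not OpenCL code: it replays that interleaving by hand and uses C11 atomic_fetch_add as a stand-in for OpenCL's atomic_inc. The function names are hypothetical.

```c
#include <stdatomic.h>

/* Replays, by hand, the interleaving that loses an update when two
 * work items both execute Number[0]++ on the same counter. */
int plain_increment_interleaved(void)
{
    int counter = 0;
    int a = counter;   /* work item A loads 0 */
    int b = counter;   /* work item B loads 0 before A has stored */
    counter = a + 1;   /* A stores 1 */
    counter = b + 1;   /* B stores 1, overwriting A's update */
    return counter;    /* two increments, but the counter is 1 */
}

/* The same two increments as atomic read-modify-writes (C11 analogue of
 * OpenCL's atomic_inc). An atomic RMW cannot be split into separate
 * load/store steps, so the interleaving above is impossible; here the
 * two calls simply run one after the other. */
int atomic_increments(void)
{
    atomic_int counter = 0;
    atomic_fetch_add(&counter, 1);  /* work item A */
    atomic_fetch_add(&counter, 1);  /* work item B */
    return atomic_load(&counter);
}
```

Called as written, plain_increment_interleaved() returns 1 and atomic_increments() returns 2.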

You should use atomic operations as sparingly as possible: they tend to be rather slow precisely because they guarantee a correct result across threads, so concurrent updates to the same location are effectively performed one at a time.
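
One common way to use fewer atomics, offered here as a sketch and not as part of the answer above, is to let each work-group count its own matches first and then issue only one atomic add per counter per group. The C code below simulates that pattern sequentially on the CPU; GROUP_SIZE and count_per_group are hypothetical names, and atomic_fetch_add stands in for OpenCL's atomic_add. In a real kernel the per-group counts would be accumulated in __local memory (with local atomics or a tree reduction), followed by barrier(CLK_LOCAL_MEM_FENCE) before one work item adds the group's totals to the global counters.

```c
#include <stdatomic.h>

#define GROUP_SIZE 64  /* hypothetical work-group size */

/* Sequential simulation of "count per group, then one atomic add per group".
 * Adds the counts for ids [0, n) to counters[0] (multiples of 5) and
 * counters[1] (multiples of 10). Returns the number of atomic adds made. */
int count_per_group(atomic_int counters[2], int n)
{
    int atomic_ops = 0;
    for (int group_start = 0; group_start < n; group_start += GROUP_SIZE) {
        int local5 = 0, local10 = 0;   /* stand-ins for __local counters */
        int group_end = group_start + GROUP_SIZE;
        if (group_end > n)
            group_end = n;
        for (int id = group_start; id < group_end; ++id) {
            if (id % 5 == 0)
                ++local5;
            if (id % 10 == 0)
                ++local10;
        }
        /* One global atomic per counter per group, not one per match. */
        if (local5 != 0) {
            atomic_fetch_add(&counters[0], local5);
            ++atomic_ops;
        }
        if (local10 != 0) {
            atomic_fetch_add(&counters[1], local10);
            ++atomic_ops;
        }
    }
    return atomic_ops;
}
```

With n = 1000 and zero-initialized counters, this leaves 200 and 100 in the two counters after 32 atomic adds (16 groups, two adds each), whereas the kernels in this thread perform one atomic per match, 300 in total.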

Other tips

Atomic add will also solve the counting problem:

    __kernel void cl_test(__global int* Number)
    {
       int id = get_global_id(0);
       if (id%5==0)
       {
           atomic_add( Number, 1 );
       }
       if (id%10==0)
       {
           atomic_add( Number +1, 1 );
       }
    }
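
Whichever kernel is used, the host has to zero both counters before the launch, since the kernel only adds to whatever the buffer already holds. The expected values are also easy to compute on the host for checking: assuming a global size of n and no global work offset, the ids are 0 .. n-1, and the number of multiples of k among them (id 0 included) is (n + k - 1) / k in integer arithmetic. A small C sketch with hypothetical helper names:

```c
/* How many ids in [0, n) are divisible by k (k > 0).
 * Id 0 counts, so this is ceil(n / k). */
int expected_multiples(int n, int k)
{
    return (n + k - 1) / k;
}

/* Brute-force version of the same count, mirroring the kernel's test. */
int brute_force_multiples(int n, int k)
{
    int count = 0;
    for (int id = 0; id < n; ++id)
        if (id % k == 0)
            ++count;
    return count;
}
```

For a global size of 1000, expected_multiples gives 200 for k = 5 and 100 for k = 10, which is what Number[0] and Number[1] should hold after the kernel finishes.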

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow