Question

    __kernel void cl_test(__global int* Number)
    {
       int id = get_global_id(0);
       if (id%5==0)
       {
           Number[0]++;
       }
       if (id%10==0)
       {
           Number[1]++;
       }
    }

As you can see, this is a very simple OpenCL test kernel. I want to count how many numbers in a range are divisible by 5 and how many are divisible by 10.

Here is the problem: the work items are not fully independent, because they all update the same Number[0] and Number[1]. When I read Number[0] or Number[1] back, the result is wrong.

Is there something like a C++ global variable that all work items can update safely?

Thanks!

Solution

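You need to use atomic operations. Number[0]++ is a read-modify-write, so when several work items execute it at the same time, some of the increments are lost.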

    __kernel void cl_test(__global int* Number)
    {
       int id = get_global_id(0);
       if (id%5==0)
       {
           atomic_inc(&Number[0]);
       }
       if (id%10==0)
       {
           atomic_inc(&Number[1]);
       }
    }

Use atomic operations sparingly. They tend to be slow, because concurrent updates to the same address have to be serialized to keep the result correct across work items.
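
One common way to reduce the cost of global atomics is to count in __local memory first and then issue a single global atomic per work-group. The following is only a minimal sketch, not part of the original answer. It assumes OpenCL 1.1 or later (where 32-bit local and global atomics are core features) and that the host zeroes Number before the launch. The kernel name cl_test_local is made up for illustration.

```c
// Sketch: per-work-group counting in local memory, then one global
// atomic per counter per work-group. Assumes Number[0] and Number[1]
// were set to 0 by the host before the kernel is enqueued.
__kernel void cl_test_local(__global int* Number)
{
    __local int partial[2];          // shared by one work-group
    int id  = get_global_id(0);
    int lid = get_local_id(0);

    // One work item per group clears the group's counters.
    if (lid == 0)
    {
        partial[0] = 0;
        partial[1] = 0;
    }
    barrier(CLK_LOCAL_MEM_FENCE);    // counters are zero for everyone

    // Local atomics only contend within the work-group.
    if (id % 5 == 0)
    {
        atomic_inc(&partial[0]);
    }
    if (id % 10 == 0)
    {
        atomic_inc(&partial[1]);
    }
    barrier(CLK_LOCAL_MEM_FENCE);    // all local increments are done

    // One work item per group publishes the group's totals.
    if (lid == 0)
    {
        atomic_add(&Number[0], partial[0]);
        atomic_add(&Number[1], partial[1]);
    }
}
```

Both barriers sit outside the conditionals, so every work item in the group reaches them. Whether this is actually faster depends on the device and the work-group size, so it is worth measuring against the plain atomic_inc version.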

Other tips

atomic_add will also solve the counting problem:

    __kernel void cl_test(__global int* Number)
    {
       int id = get_global_id(0);
       if (id%5==0)
       {
           atomic_add(&Number[0], 1);
       }
       if (id%10==0)
       {
           atomic_add(&Number[1], 1);
       }
    }
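
Either atomic version can be checked against a serial count on the host. The sketch below is plain C rather than OpenCL. The helper name expected_counts is hypothetical, and it assumes the global IDs run from 0 to global_size - 1 and that Number was zeroed before the kernel launch.

```c
/* Hypothetical host-side helper: serially computes the values the
   kernel should leave in Number[0] and Number[1] when the global
   IDs are 0 .. global_size-1 and Number started at zero. */
static void expected_counts(int global_size, int out[2])
{
    out[0] = 0;   /* IDs divisible by 5  */
    out[1] = 0;   /* IDs divisible by 10 */
    for (int id = 0; id < global_size; ++id)
    {
        if (id % 5 == 0)
        {
            out[0]++;
        }
        if (id % 10 == 0)
        {
            out[1]++;
        }
    }
}
```

For example, with a global size of 100 the helper fills out with 20 and 10, which is what Number[0] and Number[1] should hold after the kernel finishes.
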

License: CC-BY-SA with attribution
Not affiliated with StackOverflow