Question

I have a simple OpenCL kernel like this:

__kernel void cache(
    __global float* data,
    __local float* sharedData) 
{
    int globalId=get_global_id(0);

    int localId=get_local_id(0);

    // Cache data to local memory
    sharedData[localId]=data[globalId];
}

There are two input buffers: one in global memory and one in local memory.

For example, the global size I use is {10000} and the local size is {10}.

So "int globalId=get_global_id(0);" will return a number in the range 0~9999, right? And "int localId=get_local_id(0);" will return a number in the range 0~9?

Then how does "sharedData[localId]=data[globalId];" copy the data from global to local memory? Will there be 10000 * 10 cases, like:

sharedData[0]=data[0];
sharedData[0]=data[1];
sharedData[0]=data[2];
...
sharedData[1]=data[0];
sharedData[1]=data[1];
...

So what actually happens in "sharedData[localId]=data[globalId];"?

Thanks!

Solution

It works more like this (conceptually, where the first index is the value returned by get_group_id(0)):

sharedData[0][0]=data[0];
sharedData[0][1]=data[1];
sharedData[0][2]=data[2];
...
sharedData[1][0]=data[10];
sharedData[1][1]=data[11];
sharedData[1][2]=data[12];
...

You can imagine the local memory as a 2D array that is implicitly indexed by the work-group ID. In reality one work-group cannot access the local memory of another work-group, but this picture helps to understand the concept.

So there are just 10000 reads in total, split into 1000 groups of 10.

Other tips

Because the kernel code is executed once by every work-item, the number of operations equals the NDRange size of your kernel. So the total number of reads will be 10000.

Since get_local_id(0) returns only 10 different values (the work-group size is 10), the number of local arrays will be 10000 / 10 = 1000.

Each work-item reads a different element from global memory, because get_global_id() always returns a unique number for every work-item, so no data is duplicated.

In the end, your code dices a 10000-element global array into 1000 local arrays of 10 elements each.

There is a decent example here:

How do I use local memory in OpenCL?

https://github.com/lettergram/Cache-Comparison/blob/master/cl/opencl.h

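Local data is only usable inside the kernel and cannot be returned to the host. Only global data can be read back after execution, so anything you want to keep has to be written to a global buffer before the kernel finishes.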

License: CC-BY-SA with attribution
Not affiliated with StackOverflow