Question

I have a simple OpenCL kernel like this:

__kernel void cache(
    __global float* data,
    __local float* sharedData) 
{
    int globalId=get_global_id(0);

    int localId=get_local_id(0);

    // Cache data to local memory
    sharedData[localId]=data[globalId];
}

There are two memory arguments: one is global and one is local.

For example, the global size I use is {10000} and the local size is {10}.

So "int globalId=get_global_id(0);" will return a number in 0~9999, right? And "int localId=get_local_id(0);" will return a number in 0~9?

Then how does "sharedData[localId]=data[globalId];" copy the data from global to local memory? Will there be 10000 * 10 combinations, like:

sharedData[0]=data[0];
sharedData[0]=data[1];
sharedData[0]=data[2];
...
sharedData[1]=data[0];
sharedData[1]=data[1];
...

So what is actually going on in "sharedData[localId]=data[globalId];"?

Thanks!


Solution

It goes more like this (conceptually):

sharedData[get_group_id(0) returns 0][0]=data[0];
sharedData[get_group_id(0) returns 0][1]=data[1];
sharedData[get_group_id(0) returns 0][2]=data[2];
...
sharedData[get_group_id(0) returns 1][0]=data[10];
sharedData[get_group_id(0) returns 1][1]=data[11];
sharedData[get_group_id(0) returns 1][2]=data[12];
...

You can imagine the local memory as a 2D array implicitly indexed by the work-group id. In reality each work-group has its own separate sharedData buffer, and one group cannot access the local memory of another group, but the picture helps to understand the concept.

So there will be just 10000 reads, one per work-item, split into 1000 groups of 10.
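To make that picture concrete, here is a minimal sketch in plain C (not OpenCL, and not host API code) that emulates the index arithmetic on the CPU. The names WorkItemIds, split_global_id and emulate_cache_kernel are made up for illustration, and the sketch assumes a 1-D range with no global offset and a global size that is a multiple of the local size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: the two ids a work-item would see. */
typedef struct {
    size_t group_id;  /* stands in for get_group_id(0) */
    size_t local_id;  /* stands in for get_local_id(0) */
} WorkItemIds;

/* Split a 1-D global id into (group id, local id).
   Assumes no global offset was passed when enqueueing the kernel. */
WorkItemIds split_global_id(size_t global_id, size_t local_size)
{
    WorkItemIds ids;
    assert(local_size > 0);
    ids.group_id = global_id / local_size;
    ids.local_id = global_id % local_size;
    return ids;
}

/* Emulate sharedData[localId] = data[globalId] for every work-item.
   Row g of shared_per_group stands in for the private local buffer
   of work-group g; real groups cannot see each other's rows. */
void emulate_cache_kernel(const float *data, float *shared_per_group,
                          size_t global_size, size_t local_size)
{
    assert(global_size % local_size == 0);
    for (size_t gid = 0; gid < global_size; ++gid) {
        WorkItemIds ids = split_global_id(gid, local_size);
        /* Exactly one write per work-item, so global_size writes in total. */
        shared_per_group[ids.group_id * local_size + ids.local_id] = data[gid];
    }
}
```

With a local size of 10, split_global_id(12, 10) gives group 1 and local id 2, which matches the sharedData[group 1][2]=data[12] line above. Every element of data is copied exactly once, so there is no 10000 * 10 blow-up.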

OTHER TIPS

Because the kernel is executed once by every work-item, the number of executions equals the global (NDRange) size of your kernel. So the total number of reads will be 10000.

Then, since get_local_id(0) returns only 10 different values (the work-group size is 10), the number of work-groups, and therefore of local arrays, will be 10000 / 10 = 1000.

You are reading different data from global memory using the unique work-item id (get_global_id() always returns a unique number for every work-item), so no data duplication will take place.

Finally, your code will dice up a global array of 10000 elements into 1000 local arrays, each of size 10.

There is a decent example here:

How do I use local memory in OpenCL?

https://github.com/lettergram/Cache-Comparison/blob/master/cl/opencl.h

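Local memory is only usable inside the kernel while a work-group is running, and the host cannot read it back. Only global buffers can be read back by the host after the kernel has executed, so any result you want to keep must be written to global memory.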
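As a rough illustration of that last point, here is a plain-C emulation (again not OpenCL itself) in which each group stages its slice into a scratch buffer and then writes one value to an output array. The per-group sum is just an example computation chosen for this sketch, and run_group, emulate_group_sums and MAX_LOCAL_SIZE are hypothetical names, not part of any API.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LOCAL_SIZE 256  /* hypothetical upper bound for this sketch */

/* Emulates one work-group. `scratch` stands in for __local memory:
   it exists only while the group runs and is gone afterwards. */
static void run_group(const float *data, float *out,
                      size_t group_id, size_t local_size)
{
    float scratch[MAX_LOCAL_SIZE];

    /* Each iteration plays one work-item: sharedData[localId] = data[globalId]. */
    for (size_t lid = 0; lid < local_size; ++lid)
        scratch[lid] = data[group_id * local_size + lid];

    /* In a real kernel, barrier(CLK_LOCAL_MEM_FENCE) would be needed here
       before any work-item reads values written by other work-items. */

    float sum = 0.0f;
    for (size_t lid = 0; lid < local_size; ++lid)
        sum += scratch[lid];

    /* Only what is written to the global output survives the kernel. */
    out[group_id] = sum;
}

/* Runs every group in turn; `out` needs global_size / local_size elements. */
void emulate_group_sums(const float *data, float *out,
                        size_t global_size, size_t local_size)
{
    assert(local_size > 0 && local_size <= MAX_LOCAL_SIZE);
    assert(global_size % local_size == 0);
    for (size_t g = 0; g < global_size / local_size; ++g)
        run_group(data, out, g, local_size);
}
```

The kernel in the question only fills sharedData and never writes anything back, so on a real device it would have no visible effect for the host. Adding a global output argument, as out does here, is what makes the cached data useful.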

Licensed under: CC-BY-SA with attribution