Question

I've been playing with OpenCL recently, and I'm able to write simple kernels that use only global memory. Now I'd like to start using local memory, but I can't seem to figure out how to use get_local_size() and get_local_id() to compute one "chunk" of output at a time.

For example, let's say I wanted to convert Apple's OpenCL Hello World example kernel into something that uses local memory. How would you do it? Here's the original kernel source:

__kernel void square(
    __global float *input,
    __global float *output,
    const unsigned int count)
{
    int i = get_global_id(0);
    if (i < count)
        output[i] = input[i] * input[i];
}

If this example can't easily be converted into something that shows how to make use of local memory, any other simple example will do.


Solution

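I thought you had already solved it, but there is still room for improvement. You have the font in an image, and for every character you want to draw, the part of the image containing that character is loaded into a texture, and everything has to be cleaned up again afterwards.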

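Instead, load the entire image into one big texture, keep that texture for the whole lifetime of the program, and reuse it when rendering each frame. You select which character to render by specifying the right texture coordinates.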
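Picking a character out of one big font texture comes down to computing its texture rectangle. A minimal sketch in C, assuming a hypothetical atlas laid out as a 16 x 16 grid of equally sized cells in character-code order, with normalized coordinates measured from the top-left of the image as stored (flip `v` if your loader stores rows bottom-up). `GlyphUV` and `glyph_uv` are illustrative names, not part of any graphics API:

```c
#include <assert.h>

/* Assumption: the font image is a 16 x 16 grid of equally sized glyph cells,
 * stored in character-code order, left to right and top to bottom. */
#define ATLAS_COLS 16
#define ATLAS_ROWS 16

/* Normalized texture rectangle of one glyph inside the single font texture. */
typedef struct { float u0, v0, u1, v1; } GlyphUV;

static GlyphUV glyph_uv(unsigned char c)
{
    GlyphUV r;
    int col = c % ATLAS_COLS;
    int row = c / ATLAS_COLS;
    assert(row < ATLAS_ROWS); /* the grid must hold every character code */

    r.u0 = (float)col / ATLAS_COLS;
    r.v0 = (float)row / ATLAS_ROWS;
    r.u1 = (float)(col + 1) / ATLAS_COLS;
    r.v1 = (float)(row + 1) / ATLAS_ROWS;
    return r;
}
```

The texture itself is uploaded once; drawing a string is then just a matter of emitting one quad per character with the rectangle returned by `glyph_uv`.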

Your MacBook, old as it is, should then be able to hit the 60 FPS cap with practically no CPU usage.

Other tips

If the size of the local memory is constant, there is another way to do this. Instead of passing a pointer in the kernel's parameter list, declare the local buffer inside the kernel (at kernel function scope) with the __local qualifier:

__local float localBuffer[1024];

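This saves a little host code, because there is one clSetKernelArg call fewer to make. The trade-off is that the buffer size is fixed when the kernel is compiled.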
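To make the question's square kernel concrete, here is a plain-C model of a local-memory version that uses a fixed-size buffer like the one above. It is a sketch, not OpenCL code: the outer loop stands in for the work-groups, each inner loop over `lid` stands in for the work items of one group running up to a `barrier()`, and the name `square_local` and the 1024-element capacity are assumptions. In a real kernel the indices would come from `get_group_id(0)`, `get_local_size(0)` and `get_local_id(0)`; for uniform work-groups and no global offset, `get_global_id(0) == get_group_id(0) * get_local_size(0) + get_local_id(0)`.

```c
#include <assert.h>
#include <stddef.h>

/* Stands in for `__local float localBuffer[1024];` declared inside the kernel. */
#define LOCAL_BUF_SIZE 1024

/* Plain-C model of a hypothetical local-memory version of `square`.
 * The outer loop plays the work-groups; each inner loop over lid plays all
 * work items of one group executing up to the next barrier(). */
static void square_local(const float *input, float *output, unsigned count,
                         size_t global_size, size_t local_size)
{
    assert(local_size > 0 && local_size <= LOCAL_BUF_SIZE);
    /* OpenCL 1.x requires the global size to be a multiple of the local size. */
    assert(global_size % local_size == 0);

    for (size_t group = 0; group < global_size / local_size; ++group) {
        float localBuffer[LOCAL_BUF_SIZE]; /* one instance per work-group */

        /* Phase 1: each work item copies one element of its group's chunk. */
        for (size_t lid = 0; lid < local_size; ++lid) {
            size_t gid = group * local_size + lid; /* get_global_id(0) */
            localBuffer[lid] = (gid < count) ? input[gid] : 0.0f;
        }

        /* barrier(CLK_LOCAL_MEM_FENCE) would sit here in the kernel. */

        /* Phase 2: each work item squares the value from local memory. */
        for (size_t lid = 0; lid < local_size; ++lid) {
            size_t gid = group * local_size + lid;
            if (gid < count)
                output[gid] = localBuffer[lid] * localBuffer[lid];
        }
    }
}
```

A call such as `square_local(in, out, 10, 12, 4)` pads 10 elements to a global size of 12 in groups of 4, and the trailing work items are masked by `gid < count`, just like `if (i < count)` in the original kernel. Since each work item only reads back its own slot, local memory buys nothing for squaring; it starts to pay off when work items read each other's slots.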

In OpenCL, local memory is meant for sharing data among all work items in a work-group. Using it usually requires a barrier call before the data can be used, for example when one work item needs to read a value in local memory that was written by other work items. Barriers are costly in hardware. Keep in mind that local memory pays off when the same data is read or written repeatedly, and that bank conflicts should be avoided as much as possible.
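The barrier requirement is easiest to see in a work-group reduction. The sketch below is again a plain-C model rather than OpenCL: the name `sum_squares_groups`, the 256-item cap and the power-of-two group size are assumptions, and the comments mark where a real kernel would call `barrier(CLK_LOCAL_MEM_FENCE)`. Because each reduction step reads slots written by other work items, leaving those barriers out would let a work item read a slot before its neighbour has written it.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LOCAL 256 /* assumed upper bound on the work-group size */

/* Plain-C model of a hypothetical "sum of squares" kernel: every work-group
 * writes the sum of the squares of its chunk into partial[group]. */
static void sum_squares_groups(const float *input, unsigned count,
                               size_t global_size, size_t local_size,
                               float *partial)
{
    assert(local_size > 0 && local_size <= MAX_LOCAL);
    assert((local_size & (local_size - 1)) == 0); /* power of two */
    assert(global_size % local_size == 0);

    for (size_t group = 0; group < global_size / local_size; ++group) {
        float scratch[MAX_LOCAL]; /* __local float scratch[...] in the kernel */

        /* Each work item writes its own square into local memory. */
        for (size_t lid = 0; lid < local_size; ++lid) {
            size_t gid = group * local_size + lid;
            float x = (gid < count) ? input[gid] : 0.0f;
            scratch[lid] = x * x;
        }
        /* barrier(CLK_LOCAL_MEM_FENCE): all slots must be written before
         * any work item reads a neighbour's slot below. */

        /* Tree reduction: halve the number of active work items each step. */
        for (size_t stride = local_size / 2; stride > 0; stride /= 2) {
            for (size_t lid = 0; lid < stride; ++lid)
                scratch[lid] += scratch[lid + stride];
            /* barrier(CLK_LOCAL_MEM_FENCE) after every step */
        }

        /* Work item 0 publishes the group's result to global memory. */
        partial[group] = scratch[0];
    }
}
```

In an actual kernel the halving loop would run in every work item with an `if (lid < stride)` guard around the addition, because all work items of a group have to reach each barrier.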

If you are not careful with local memory, you can sometimes end up with worse performance than if you had used global memory alone.
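How an access pattern turns into bank conflicts can be modelled in a few lines. This sketch assumes local memory is split into 32 banks of 4-byte words and that 32 work items access it together; that matches many GPUs but is not guaranteed by OpenCL, and `bank_conflict_degree` is a hypothetical helper. Each work item reads the float at index `lid * stride`, and the function reports how many accesses land on the busiest bank.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BANKS 32 /* assumption: 32 banks of 4-byte words */
#define WAVE      32 /* assumption: 32 work items access local memory together */

/* Returns the number of accesses that hit the busiest bank when work item
 * lid reads local_buf[lid * stride] (a float array). In this simplified
 * model, 1 means conflict-free and N means N serialized accesses. */
static int bank_conflict_degree(size_t stride)
{
    int hits[NUM_BANKS] = {0};
    int worst = 0;

    assert(stride > 0); /* stride 0 is a broadcast of one word, not a conflict */
    for (size_t lid = 0; lid < WAVE; ++lid) {
        size_t bank = (lid * stride) % NUM_BANKS; /* float index -> bank */
        if (++hits[bank] > worst)
            worst = hits[bank];
    }
    return worst;
}
```

Under these assumptions a stride of 1 gives each work item its own bank, a stride of 32 sends all of them to the same bank, and padding a 32-float row to 33 floats spreads them out again.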

License: CC-BY-SA with attribution
Not affiliated with StackOverflow