I am following this tutorial, which is great all the way through, except that the very last example, on how to create a semaphore, isn't working for me. The logic is fairly simple, but I cannot figure out why this kernel results in an infinite loop.

myKernel.cl

#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable
void GetSemaphor(__global int * semaphor, __global int * data) {
   int occupied = atom_xchg(semaphor, 1);
   int realityCheck = 0;
   while(occupied == 1 && realityCheck++ < 100000)
        occupied = atom_xchg(semaphor, 1);
}

void ReleaseSemaphor(__global int * semaphor)
{
   int prevVal = atom_xchg(semaphor, 0);
}

__kernel void myKernel(__global int* data, __global int* semaphor)
{
    // semaphor[0] is set to 0 on the host.
    GetSemaphor(&semaphor[0], data);
    data[0]++;
    ReleaseSemaphor(&semaphor[0]);
}

This is with OpenCL 1.2, FULL_PROFILE, on a Quadro NVS 290 that has cl_khr_global_int32_base_atomics and cl_khr_global_int32_extended_atomics.


Solution

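The tutorial you refer to is wrong, and its semaphore will never work on GPU devices because of the hardware architecture.

Any kind of synchronization mechanism that blocks a single work-item inside a work-group will simply not work. On this kind of hardware, work-items of a work-group are executed together in lockstep, so a work-item that is spinning on the semaphore stalls the others as well, including the one that holds the semaphore and needs to release it. The blocking state therefore affects the whole work-group and produces an infinite loop.

You can only do this kind of thing with a work-group size of 1, or across work-groups.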
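As a rough illustration of the "across work-groups" case, here is a minimal sketch in OpenCL C, not a definitive implementation. It assumes a 1-D NDRange, the core 32-bit atomics of OpenCL 1.1 and later (atomic_xchg, atomic_inc), and that a global mem_fence before the release is enough for the next lock holder to see the update on the target device, which the OpenCL 1.2 memory model does not strictly guarantee. The kernel name myKernelAtomic is made up for this example. Only one work-item per work-group ever spins on the semaphore, so no work-item waits on a sibling from its own work-group. For a plain counter like data[0]++, the second kernel avoids the lock entirely.

```c
// Sketch only: at most one work-item per work-group touches the lock.
void GetSemaphor(volatile __global int *semaphor)
{
    // atomic_xchg returns the previous value; 0 means we took the lock.
    while (atomic_xchg(semaphor, 1) == 1)
        ;
}

void ReleaseSemaphor(volatile __global int *semaphor)
{
    atomic_xchg(semaphor, 0);
}

__kernel void myKernel(volatile __global int *data,
                       volatile __global int *semaphor)
{
    // semaphor[0] is set to 0 on the host.
    // Assumption: 1-D NDRange, so exactly one work-item per group has local id 0.
    if (get_local_id(0) == 0) {
        GetSemaphor(&semaphor[0]);
        // Do the whole group's work inside the critical section:
        // equivalent to every work-item of the group doing data[0]++.
        data[0] += (int)get_local_size(0);
        // Make the write visible before the lock is handed over.
        mem_fence(CLK_GLOBAL_MEM_FENCE);
        ReleaseSemaphor(&semaphor[0]);
    }
}

// Hypothetical alternative kernel: no lock at all, one atomic per work-item.
__kernel void myKernelAtomic(volatile __global int *data)
{
    atomic_inc(&data[0]);
}
```

If the critical section is really just an increment, the atomic version is the simpler choice; the lock-per-work-group pattern only makes sense when the protected section cannot be expressed as a single atomic operation.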

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow