OpenCL overlap communication and computation

https://stackoverflow.com/questions/18302277

24-06-2022

Question

The NVIDIA OpenCL SDK contains an example, oclCopyComputeOverlap, that uses two queues to alternately transfer buffers and execute kernels. The example uses mapped (pinned) memory.

// pinned memory
cmPinnedSrcA = clCreateBuffer(cxGPUContext, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, szBuffBytes, NULL, &ciErrNum);
// host pointer for pinned memory
fSourceA = (cl_float*)clEnqueueMapBuffer(cqCommandQueue[0], cmPinnedSrcA, CL_TRUE, CL_MAP_WRITE, 0, szBuffBytes, 0, NULL, NULL, &ciErrNum);
...
// normal device buffer
cmDevSrcA = clCreateBuffer(cxGPUContext, CL_MEM_READ_ONLY, szBuffBytes, NULL, &ciErrNum);
// write half the data from host pointer to device buffer
ciErrNum = clEnqueueWriteBuffer(cqCommandQueue[0], cmDevSrcA, CL_FALSE, 0, szHalfBuffer, (void*)&fSourceA[0], 0, NULL, NULL);

I have two questions: 1) Is pinned memory required for the overlap to occur? Couldn't fSourceA simply be an ordinary host pointer, like this:

fSourceA = (cl_float *)malloc(szBuffBytes);
...
//write random data in fSourceA

2) cmPinnedSrcA is not used in the kernel; cmDevSrcA is used instead. Doesn't the space occupied by the buffers on the device still grow (the space required for cmPinnedSrcA added to the space required for cmDevSrcA)?

Thank you

Solution

If I understood your question properly:

1) Yes, you can use any kind of memory (pinned, plain host pointer, etc.) and the overlap will still occur, as long as you use two queues and the hardware and drivers support it.

But keep in mind that the two queues are never synchronized with each other. In this case, events are needed to keep the copy queue from copying inconsistent data while the kernel is still running.

2) I think you use twice the memory if you use pinned memory: once for the pinned buffer and once for a temporary copy. But I am not 100% sure; the pinned buffer may be nothing more than a pointer.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow