Why does the OpenCL vector addition Nvidia SDK example use async writes?

https://stackoverflow.com/questions/3978943

09-10-2019

Question

The vector addition example has this code:

// Asynchronous write of data to GPU device
ciErr1 = clEnqueueWriteBuffer(cqCommandQueue, cmDevSrcA, CL_FALSE, 0, sizeof(cl_float) * szGlobalWorkSize, srcA, 0, NULL, NULL);
ciErr1 |= clEnqueueWriteBuffer(cqCommandQueue, cmDevSrcB, CL_FALSE, 0, sizeof(cl_float) * szGlobalWorkSize, srcB, 0, NULL, NULL);
shrLog("clEnqueueWriteBuffer (SrcA and SrcB)...\n"); 
if (ciErr1 != CL_SUCCESS)
{
    shrLog("Error in clEnqueueWriteBuffer, Line %u in file %s !!!\n\n", __LINE__, __FILE__);
    Cleanup(EXIT_FAILURE);
}

// Launch kernel
ciErr1 = clEnqueueNDRangeKernel(cqCommandQueue, ckKernel, 1, NULL, &szGlobalWorkSize, &szLocalWorkSize, 0, NULL, NULL);
shrLog("clEnqueueNDRangeKernel (VectorAdd)...\n"); 
if (ciErr1 != CL_SUCCESS)

It launches the kernel right afterwards. How does this not cause problems? We aren't guaranteeing that the graphics memory buffers have been fully written when the kernel launches, right?

Solution

While the writes are asynchronous from the host's point of view, they aren't necessarily asynchronous from the device's point of view. I'd assume that the command queue is created without CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, so it's an in-order command queue.

The OpenCL specification says the following about in-order execution:

In-order Execution: Commands are launched in the order they appear in the command-queue and completed in order. In other words, a prior command on the queue completes before the following command begins. This serializes the execution order of commands in a queue.

Therefore, on an in-order queue, both writes should complete before the kernel starts executing on the device, even though the enqueue calls return to the host immediately.
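
To make the distinction concrete, here is a minimal sketch in plain C of how an in-order queue behaves. It is a toy model only: InOrderQueue, enqueue, drain and vector_add_demo are hypothetical names invented for illustration, not part of the OpenCL API, and the "device" is just a loop on the host. The point it tries to show is that enqueueing returns without doing any work, while execution happens strictly in FIFO order, so the kernel only runs after both writes have finished.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: every name here is hypothetical, NOT the OpenCL API. */
enum { N = 4, MAX_CMDS = 8 };

typedef enum { CMD_WRITE, CMD_ADD_KERNEL } CmdType;

typedef struct {
    CmdType type;
    float *dst;       /* "device" buffer this command writes to */
    const float *a;   /* host source (write) or first operand (kernel) */
    const float *b;   /* second kernel operand, unused for writes */
} Command;

typedef struct {
    Command cmds[MAX_CMDS];
    size_t count;     /* number of commands enqueued so far */
    size_t next;      /* index of the next command to execute */
} InOrderQueue;

/* Non-blocking enqueue: records the command and returns without running it. */
static void enqueue(InOrderQueue *q, Command c) {
    assert(q->count < MAX_CMDS);
    q->cmds[q->count++] = c;
}

/* The "device": runs commands in FIFO order, each one to completion. */
static void drain(InOrderQueue *q) {
    for (; q->next < q->count; ++q->next) {
        const Command *c = &q->cmds[q->next];
        if (c->type == CMD_WRITE) {
            memcpy(c->dst, c->a, N * sizeof(float));
        } else {
            for (size_t i = 0; i < N; ++i) {
                c->dst[i] = c->a[i] + c->b[i];
            }
        }
    }
}

/* Same sequence as the SDK sample: two non-blocking writes, then the kernel. */
static void vector_add_demo(const float *srcA, const float *srcB, float *out) {
    float devA[N] = {0};
    float devB[N] = {0};
    InOrderQueue q = {0};

    enqueue(&q, (Command){CMD_WRITE, devA, srcA, NULL});
    enqueue(&q, (Command){CMD_WRITE, devB, srcB, NULL});
    enqueue(&q, (Command){CMD_ADD_KERNEL, out, devA, devB});

    /* Nothing has executed yet; the queue now runs the commands in order. */
    drain(&q);
}
```

If the queue had been created with CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, this ordering would no longer be implicit. In that case the usual approach is to request an event from each clEnqueueWriteBuffer call (its last argument) and pass those events in the event wait list of clEnqueueNDRangeKernel. Either way, the host arrays srcA and srcB should not be modified until the non-blocking writes have completed.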

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow