What is the difference between creating a buffer object with clCreateBuffer + CL_MEM_COPY_HOST_PTR vs. clCreateBuffer + clEnqueueWriteBuffer?

https://stackoverflow.com/questions/3832963

26-09-2019

Question

I have seen both versions in tutorials, but I could not find out what their advantages and disadvantages are. Which one is the proper one?

cl_mem input = clCreateBuffer(context,CL_MEM_READ_ONLY,sizeof(float) * DATA_SIZE, NULL, NULL);
clEnqueueWriteBuffer(command_queue, input, CL_TRUE, 0, sizeof(float) * DATA_SIZE, inputdata, 0, NULL, NULL);

vs.

cl_mem input = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(float) * DATA_SIZE, inputdata, NULL);

Thanks.

[Update]

I added CL_MEM_COPY_HOST_PTR to the second example to make it correct.

Solution

I assume that inputdata is not NULL.

In that case the second approach (as originally posted, without CL_MEM_COPY_HOST_PTR) should not work at all, since the specification says that clCreateBuffer returns NULL and the following error:

CL_INVALID_HOST_PTR if host_ptr is NULL and CL_MEM_USE_HOST_PTR or CL_MEM_COPY_HOST_PTR are set in flags or if host_ptr is not NULL but CL_MEM_COPY_HOST_PTR or CL_MEM_USE_HOST_PTR are not set in flags.

So you presumably mean either

clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(float) * DATA_SIZE, inputdata, NULL);

or

clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, sizeof(float) * DATA_SIZE, inputdata, NULL);

The CL_MEM_COPY_HOST_PTR variant should be more or less the same as the first approach you showed, while the CL_MEM_USE_HOST_PTR variant won't actually copy the data, but will instead use the supplied memory location for buffer storage (caching portions or all of it in device memory). Which of the two is better obviously depends on the usage scenario.

Personally, I prefer the two-step approach of first allocating the buffer and afterwards filling it with clEnqueueWriteBuffer, since I find it easier to see what happens. Of course, the one-step version might be faster (or it might not; that's just a guess).

OTHER TIPS

During my working with OpenCL I found a very important difference between

cl_mem CT = clCreateImage3D(Context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, Volume_format, X, Y, Z, rowPitch, slicePitch, sourceData, &error);

and

cl_mem CT = clCreateImage3D(Context, CL_MEM_READ_ONLY , Volume_format, X, Y, Z, 0, 0, 0, &error);
error = clEnqueueWriteImage(CommandQue, CT, CL_TRUE, origin, region, rowPitch, slicePitch, sourceData, 0, 0, 0);

With the first approach, OpenCL does not copy the data behind the host pointer directly to the GPU. It first allocates a second, temporary buffer on the host, which can cause problems if you load something big like a CT volume to the GPU: for a short time, the memory needed is twice the CT size. Also, the data is not copied to the device during this call. It is copied when the 3D image object is set as an argument of the kernel function that uses it.

The second approach copies the data directly to the GPU, and OpenCL makes no additional allocations. I think this is probably the same for normal buffer objects.
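To put a number on "twice the CT size", here is a small arithmetic sketch. The function names are made up, the data is assumed to be tightly packed (no row or slice padding), and the factor of two simply encodes the temporary host copy described above, which is one implementation's observed behavior rather than something the specification guarantees.

```c
#include <stdint.h>

/* Bytes in one tightly packed row of x voxels (a valid rowPitch). */
uint64_t row_pitch_bytes(uint64_t x, uint64_t bytes_per_voxel)
{
    return x * bytes_per_voxel;
}

/* Bytes in one tightly packed slice of x * y voxels (a valid slicePitch). */
uint64_t slice_pitch_bytes(uint64_t x, uint64_t y, uint64_t bytes_per_voxel)
{
    return row_pitch_bytes(x, bytes_per_voxel) * y;
}

/* Bytes in the whole x * y * z volume. */
uint64_t volume_bytes(uint64_t x, uint64_t y, uint64_t z,
                      uint64_t bytes_per_voxel)
{
    return slice_pitch_bytes(x, y, bytes_per_voxel) * z;
}

/* Assumption: the runtime keeps one temporary full-size host copy,
 * so the host briefly holds the source data plus that copy. */
uint64_t peak_host_bytes_with_staging_copy(uint64_t x, uint64_t y, uint64_t z,
                                           uint64_t bytes_per_voxel)
{
    return 2 * volume_bytes(x, y, z, bytes_per_voxel);
}
```

For a 512 x 512 x 512 volume with 2 bytes per voxel, volume_bytes gives 268435456 bytes (256 MiB), so the host would briefly need about 512 MiB under that assumption.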

The nice aspect of the first approach in the question (clCreateBuffer followed by clEnqueueWriteBuffer) is that clEnqueueWriteBuffer allows you to attach an event to the copy of a buffer. So, let's say you want to measure the time it takes to copy data to the GPU using the GPU_Profiling options: you will be able to do so with the first approach, but not with the second one.

The second approach (clCreateBuffer with CL_MEM_COPY_HOST_PTR) is more compact, easier to read, and requires fewer lines of code.

One major difference that I've run into:

cl_mem input = clCreateBuffer(context, CL_MEM_READ_ONLY, sizeof(float) * DATA_SIZE, NULL, NULL);
clEnqueueWriteBuffer(command_queue, input, CL_TRUE, 0, sizeof(float) * DATA_SIZE, inputdata, 0, NULL, NULL);

This first set of commands will create an empty buffer and enqueue a command in your command queue to fill the buffer.

cl_mem input = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(float) * DATA_SIZE, inputdata, NULL);

This second command will create the buffer and fill it immediately. Note that there is no command queue in its argument list, so it uses the contents of inputdata as they are at the moment of the call.

If you've already been running CL code and your source pointer is dependent upon a previous command in the command queue completing (e.g. an enqueued read of a prior output buffer), you definitely want to use the 1st method. If you try to create and fill the buffer in a single command, you'll end up with a race condition in which the buffer contents will not properly wait on the completion of your prior buffer read.

Well, the main difference between these two is that the first one allocates memory on the device and then copies data to that memory. The second one only allocates.

Or did you mean clCreateBuffer(context,CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,sizeof(float) * DATA_SIZE, inputdata, NULL);?

Licensed under: CC-BY-SA with attribution
