So I have an application that I'd like to implement using OpenCL, distributed across multiple machines using MPI.

Now, at every iteration of the algorithm I need to synchronize the buffers between the MPI processes, but here is the catch: only the borders of the 2D buffers need to be synchronized/copied, not the entire region.

So my question is whether OpenCL's memory mapping mechanism (clEnqueueMapBuffer and clEnqueueUnmapMemObject) can read/write only the borders of a 2D buffer without triggering a copy of the entire buffer.

Basically, this can only work if OpenCL uses DMA instead of a host-side buffer copy. So my real question is whether OpenCL supports DMA access to device buffer data on a discrete PCIe GPU, and if so, on which hardware and which operating system?


Solution

To be able to use DMA, the buffers should be in page-locked memory. AMD and NVIDIA both state in their programming guides that, to get a buffer in page-locked memory, it should be created with the CL_MEM_ALLOC_HOST_PTR flag. Here is what NVIDIA says in section 3.3.1 of its guide:

OpenCL applications do not have direct control over whether memory objects are allocated in page-locked memory or not, but they can create objects using the CL_MEM_ALLOC_HOST_PTR flag and such objects are likely to be allocated in page-locked memory by the driver for best performance.

Note the word "likely": the driver is not required to allocate such objects in page-locked memory.

Which OS? NVIDIA doesn't mention the OS, so presumably any OS NVIDIA provides drivers for (the same goes for AMD).
Which hardware? Any hardware with a DMA controller, I guess.

Now, to write only part of a buffer, you could have a look at the function:

clEnqueueWriteBufferRect()

This function (available since OpenCL 1.1) lets you write to a 2D or 3D rectangular region of a buffer. Another possibility would be to use sub-buffers, created with the function:

clCreateSubBuffer()

However, a sub-buffer has no notion of a 2D layout: it is a single contiguous byte range of its parent buffer.

Licensed under: CC-BY-SA with attribution