To be able to use DMA, the buffers should be in page-locked memory. AMD and NVIDIA both state in their programming guides that, to get a buffer in page-locked memory, you should create it with the CL_MEM_ALLOC_HOST_PTR flag. Here is what NVIDIA says in section 3.3.1 of its guide:
OpenCL applications do not have direct control over whether memory objects are allocated in page-locked memory or not, but they can create objects using the CL_MEM_ALLOC_HOST_PTR flag and such objects are likely to be allocated in page-locked memory by the driver for best performance.
Note the word "likely": page-locked allocation is something the driver will probably do, not something it guarantees.
Which OS? NVIDIA doesn't mention the OS, so presumably any OS that NVIDIA provides drivers for (the same goes for AMD).
Which hardware? Any that has a DMA controller, I guess.
Now, to write only part of a buffer, you could have a look at the function:
clEnqueueWriteBufferRect()
This function allows you to write to a 2D or 3D region of a buffer. Another possibility would be to use sub-buffers, creating them with the function:
clCreateSubBuffer()
However, a sub-buffer has no notion of a 2D region: it only describes a contiguous range of the parent buffer.
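To make the difference between the two approaches concrete, here is a small CPU-only sketch in C. It makes no OpenCL calls. It only emulates the addressing that the OpenCL specification describes for clEnqueueWriteBufferRect() (origin, region and pitches), and shows what a sub-buffer's (origin, size) pair can cover. The names rect_offset, write_rect_emulated, byte_range and rows_as_range are made up for this illustration and are not part of any API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte offset of a rectangle's first element, following the formula the
 * OpenCL specification gives for clEnqueueWriteBufferRect:
 *   origin[2] * slice_pitch + origin[1] * row_pitch + origin[0]          */
size_t rect_offset(const size_t origin[3], size_t row_pitch,
                   size_t slice_pitch)
{
    return origin[2] * slice_pitch + origin[1] * row_pitch + origin[0];
}

/* Hypothetical CPU emulation of a rectangular write (no OpenCL involved).
 * region = { width in bytes, height in rows, depth in slices }.
 * A pitch of 0 falls back to the tightly packed default, as in the spec. */
void write_rect_emulated(unsigned char *dst, const unsigned char *src,
                         const size_t dst_origin[3],
                         const size_t src_origin[3],
                         const size_t region[3],
                         size_t dst_row_pitch, size_t dst_slice_pitch,
                         size_t src_row_pitch, size_t src_slice_pitch)
{
    if (dst_row_pitch == 0)   dst_row_pitch = region[0];
    if (src_row_pitch == 0)   src_row_pitch = region[0];
    if (dst_slice_pitch == 0) dst_slice_pitch = region[1] * dst_row_pitch;
    if (src_slice_pitch == 0) src_slice_pitch = region[1] * src_row_pitch;

    /* A row pitch smaller than the row width would make rows overlap. */
    assert(dst_row_pitch >= region[0] && src_row_pitch >= region[0]);

    size_t d0 = rect_offset(dst_origin, dst_row_pitch, dst_slice_pitch);
    size_t s0 = rect_offset(src_origin, src_row_pitch, src_slice_pitch);

    /* Copy one row of region[0] bytes at a time. */
    for (size_t z = 0; z < region[2]; ++z)
        for (size_t y = 0; y < region[1]; ++y)
            memcpy(dst + d0 + z * dst_slice_pitch + y * dst_row_pitch,
                   src + s0 + z * src_slice_pitch + y * src_row_pitch,
                   region[0]);
}

/* A sub-buffer is a single contiguous byte range, like the origin/size
 * pair in the cl_buffer_region passed to clCreateSubBuffer. */
typedef struct {
    size_t origin; /* byte offset into the parent buffer */
    size_t size;   /* length in bytes */
} byte_range;

/* The only 2D area a contiguous range can describe exactly is a band of
 * full-width rows. */
byte_range rows_as_range(size_t first_row, size_t row_count, size_t row_pitch)
{
    byte_range r = { first_row * row_pitch, row_count * row_pitch };
    return r;
}
```

For example, with a buffer whose rows are 4 bytes wide, writing a 2x2-byte tile at column 1, row 1 means origin {1, 1, 0}, region {2, 2, 1} and a buffer row pitch of 4, so only bytes 5, 6, 9 and 10 are touched. The closest sub-buffer, rows_as_range(1, 2, 4), covers bytes 4 through 11, which includes the columns to the left and right of the tile. Also keep in mind that a real sub-buffer origin has to respect the device's CL_DEVICE_MEM_BASE_ADDR_ALIGN alignment, which this sketch does not check.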