Question

I want to render something into a GL buffer texture in a render process, then read it via CUDA in another process. For now I don't want to merge the two processes into one. Here's what my code looks like:

//Note: this code runs under Linux
int svMain();
int main() {
    //Tons of variable definitions vomited
    float* dptr;    //pointer to mapped device mem
    size_t map_size;    //size of mapped mem
    cudaGraphicsResource_t cuda_res;
    cudaIpcMemHandle_t memhdl;
    if( !fork() )
        return svMain();
    else {
        initGL();
        genGLBufferTextureAndUploadSomeData();
        cudaGLSetGLDevice(0);
        cudaGraphicsGLRegisterBuffer( &cuda_res, buf_id, cudaGraphicsRegisterFlagsNone );
        cudaGraphicsMapResources( 1, &cuda_res, 0 );
        cudaGraphicsGetMappedPointer( (void**)&dptr, &map_size, cuda_res );
        cudaIpcGetMemHandle( &memhdl, dptr );
        sendToServerProcViaSocket( memhdl );
    }
    return 0;
}
int svMain() {
    cudaIpcMemHandle_t memhdl;
    float* dptr;
    cudaGLSetGLDevice(0);
    recvFromClientProc( memhdl );
    if( cudaSuccess != cudaIpcOpenMemHandle( (void**)&dptr, memhdl, cudaIpcMemLazyEnablePeerAccess ) ) {
        fprintf( stderr, "SV: cannot open CUDA mem handle!\n" );
        return -1;
    } else
        launchSomeKernel( dptr );
    return 0;
}

The problem is that cudaIpcOpenMemHandle always returns an error. However, if I allocate device memory via cudaMalloc (no GL involved) and then send the corresponding memory handle, the code above works. It also works if I do everything in one process (GL involved, no IPC involved).

My OS is Ubuntu 13.04.

The "simpleIPC" example in the CUDA toolkit runs fine on my system. Here's a portion of my deviceQuery output:

Device 0: "GeForce GT 650M"
  CUDA Driver Version / Runtime Version          6.0 / 5.5
  CUDA Capability Major/Minor version number:    3.0
  Total amount of global memory:                 2048 MBytes (2147287040 bytes)
  ( 2) Multiprocessors, (192) CUDA Cores/MP:     384 CUDA Cores
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

So, what's the correct way to access GL buffer texture from another process?

Solution

OpenGL contexts are tied to the process that created them. You can't share CUDA-mapped OpenGL objects between processes, because the OpenGL context effectively "owns" the data, and when you map it into CUDA you are only borrowing it. OpenGL does a lot of bookkeeping and higher-level management behind the scenes.

There are internal structures in OpenGL, and constraints that must be adhered to when using OpenGL-CUDA interop. For example, you must not actively use an OpenGL object as a data source or target while it is mapped to CUDA. A call to glTexImage2D or glBufferData may keep the user-visible ID but back it with a different buffer, associated with a different internal ID; glTexSubImage2D and glBufferSubData may need to create in-situ copies to satisfy synchronization-point requirements, and so on. If a process to which the OpenGL state tracker has no access touches that memory, things will break.

The usual OpenGL CUDA interop sequence is

  1. Do something in OpenGL
  2. Unbind objects from active use in OpenGL
  3. Map objects to CUDA
  4. Do something in CUDA
  5. Unmap objects from CUDA
  6. back to 1
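The cycle above can be sketched roughly as follows (this is illustrative, not the asker's exact code; `someKernel` and the launch configuration are placeholders):

```cuda
#include <GL/gl.h>
#include <cuda_gl_interop.h>

__global__ void someKernel(float *data) { /* placeholder */ }

void interopCycle(GLuint buf_id) {
    cudaGraphicsResource_t res;
    float *dptr;
    size_t size;

    // (2) OpenGL must have stopped using the buffer before mapping
    cudaGraphicsGLRegisterBuffer(&res, buf_id, cudaGraphicsRegisterFlagsNone);

    // (3) map: CUDA temporarily borrows the GL-owned storage
    cudaGraphicsMapResources(1, &res, 0);
    cudaGraphicsGetMappedPointer((void **)&dptr, &size, res);

    // (4) work on the data in CUDA
    someKernel<<<1, 256>>>(dptr);

    // (5) unmap: ownership returns to OpenGL
    cudaGraphicsUnmapResources(1, &res, 0);
    cudaGraphicsUnregisterResource(res);
}
```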

Now what you must do, instead of mapping OpenGL objects across processes, is create an area of proxy memory shared between the processes. After doing something with OpenGL, you map the OpenGL object into CUDA, use cudaMemcpy to copy the data into the proxy memory, and unmap the OpenGL object.
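A hedged sketch of that proxy-memory approach, assuming a `proxy` buffer allocated once with plain cudaMalloc (its IPC handle, not the GL mapping's pointer, is what gets sent to the server process):

```cuda
#include <cuda_gl_interop.h>

float *proxy;                 // plain device memory, allocated once
cudaIpcMemHandle_t memhdl;    // handle to proxy, sent over the socket

void setupProxy(size_t bytes) {
    cudaMalloc((void **)&proxy, bytes);
    cudaIpcGetMemHandle(&memhdl, proxy);   // this handle IS shareable
    // send memhdl to the server process, as in the question
}

void publishFrame(cudaGraphicsResource_t cuda_res, size_t bytes) {
    float *dptr;
    size_t map_size;

    cudaGraphicsMapResources(1, &cuda_res, 0);
    cudaGraphicsGetMappedPointer((void **)&dptr, &map_size, cuda_res);

    // device-to-device copy out of the GL-owned buffer into the proxy
    cudaMemcpy(proxy, dptr, bytes, cudaMemcpyDeviceToDevice);

    cudaGraphicsUnmapResources(1, &cuda_res, 0);
}
```

The server process opens `memhdl` with cudaIpcOpenMemHandle as before; since the proxy was allocated with cudaMalloc rather than mapped from OpenGL, the handle opens successfully.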

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow