Question

NVIDIA CUDA 4.0 (RC2 is assumed here) offers the nice feature of page-locking a memory range that was previously allocated via the "normal" malloc function. This can be done with the driver API function:

CUresult cuMemHostRegister (void * p, size_t bytesize, unsigned int Flags);
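
For illustration, a bare-bones driver API sketch of this registration (assuming CUDA 4.0, a single device, and error checking reduced to the register call) might look like this:

#include <cuda.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Driver API boilerplate: initialize and create a context. */
    CUdevice  dev;
    CUcontext ctx;
    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    /* Allocate with plain malloc(), then pin the range afterwards.
       Note: an unaligned malloc() block may be rejected; see the
       solution below regarding page alignment. */
    size_t bytes = 1 << 20;
    void  *buf   = malloc(bytes);

    CUresult res = cuMemHostRegister(buf, bytes, 0);
    if (res != CUDA_SUCCESS)
        fprintf(stderr, "cuMemHostRegister failed (%d)\n", (int)res);

    /* ... cuMemcpyHtoDAsync() from 'buf' would now take the pinned fast path ... */

    cuMemHostUnregister(buf);
    free(buf);
    cuCtxDestroy(ctx);
    return 0;
}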

Now, development of the project has so far been done using the runtime API. Unfortunately, the runtime API does not seem to offer a function like cuMemHostRegister, and I would really like to avoid mixing driver and runtime API calls.

Does anyone know how to page-lock memory that was previously allocated using standard malloc? Plain libc functions are not an option, since the page-locking is meant to stage the memory for fast transfers to the GPU, so I really want to stick to the "CUDA" way.

Frank


Solution

The 4.0 runtime API offers cudaHostRegister(), which does exactly what you are asking about. Be aware that the memory allocation you lock must be host page-aligned, so you should probably use either mmap() or posix_memalign() (or one of its relatives) to allocate the memory. Passing cudaHostRegister() an allocation of arbitrary size from standard malloc() will probably fail with an invalid-argument error.
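
As a minimal sketch of that approach (assuming Linux, CUDA 4.0 or later, and only basic error handling), the registration could look like this:

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    /* Allocate a page-aligned host buffer with posix_memalign()
       instead of plain malloc(), since cudaHostRegister() expects
       the range to start on a page boundary. */
    size_t page  = (size_t)sysconf(_SC_PAGESIZE);
    size_t bytes = 4 * page;   /* keep the size a multiple of the page size */
    void  *host  = NULL;
    if (posix_memalign(&host, page, bytes) != 0)
        return 1;

    /* Pin the existing allocation for fast host<->device transfers. */
    cudaError_t err = cudaHostRegister(host, bytes, cudaHostRegisterDefault);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaHostRegister: %s\n", cudaGetErrorString(err));
        free(host);
        return 1;
    }

    /* ... cudaMemcpy()/cudaMemcpyAsync() from 'host' now uses the
       pinned-memory fast path ... */

    cudaHostUnregister(host);   /* un-pin before freeing */
    free(host);
    return 0;
}

The same buffer can later be un-pinned with cudaHostUnregister() and released with free(), keeping the allocation entirely on the runtime API side.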

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow