Question

Now, if I use this code to try to access GPU-RAM from the CPU cores, using CUDA 5.5 on a GeForce GTX 460 SE (compute capability 2.1), I get an "Access Violation" exception:

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <iostream>

int main()
{
    unsigned char* gpu_ptr = NULL;
    cudaMalloc((void **)&gpu_ptr, 1024 * 1024);

    *gpu_ptr = 1;  // host dereferences a device pointer -> Access Violation

    int q; std::cin >> q;
    return 0;
}

But we know that there is UVA (Unified Virtual Addressing). And now there is something new:

Will it be possible to access GPU-RAM from the CPU cores through a plain pointer in the new CUDA 6?


Solution

Yes, the new Unified Memory feature in CUDA 6 makes it possible, on Kepler devices and beyond (so not on your Fermi GPU), to share pointers between host and device code.

To accomplish this, you will need a Kepler device (compute capability 3.0 or 3.5) and the new cudaMallocManaged API. This will be documented in full when CUDA 6.0 is officially available, but in the meantime you can read more about it at this blog, which includes examples.
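As a minimal sketch of what the question's code would look like under Unified Memory (assuming a Kepler-or-later GPU and the CUDA 6 toolkit; the kernel name and sizes here are illustrative, not from the original post):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel that writes through the same pointer the host uses.
__global__ void increment(unsigned char *data)
{
    data[0] += 1;
}

int main()
{
    unsigned char *ptr = NULL;

    // One managed allocation, visible to both host and device
    // (requires compute capability >= 3.0 and CUDA 6).
    cudaError_t err = cudaMallocManaged(&ptr, 1024 * 1024);
    if (err != cudaSuccess) {
        printf("cudaMallocManaged failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    *ptr = 1;  // host write through the shared pointer -- no Access Violation

    increment<<<1, 1>>>(ptr);
    cudaDeviceSynchronize();  // must complete before the host reads again

    printf("value = %d\n", *ptr);

    cudaFree(ptr);
    return 0;
}
```

Note the cudaDeviceSynchronize() call: the host must not touch a managed allocation while a kernel that uses it may still be running.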

This mechanism does not magically make the effects of the PCI Express bus disappear. In effect, two copies of the data are kept "behind the scenes", and the CUDA runtime schedules cudaMemcpy operations between them automatically, as needed. There are a variety of other implementation details to be aware of; for now I would suggest reading the blog.

Note that Unified Memory (UM) is distinct from Unified Virtual Addressing (UVA), which has been available since CUDA 4.0 and is already documented.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow