Problem

I'm trying to use Python to control CUDA through ctypes. Here, to illustrate my problem, I use Python to pass pointers along to C functions which allocate CUDA memory, copy a numpy array to CUDA memory, and copy the CUDA memory back to a new numpy array. But it doesn't seem to work, despite my basic ctypes setup working. I think the issue is with what's being returned from the cudaMalloc function to Python.

Here's the Python code:

  pycu_alloc = dll.alloc_gpu_mem
  pycu_alloc.argtypes = [c_size_t]
  pycu_alloc.restypes = [c_void_p]   

  host2gpu = dll.host2gpu
  host2gpu.argtypes = [c_void_p, c_void_p, c_size_t]

  gpu2host = dll.gpu2host
  gpu2host.argtypes = [c_void_p, c_void_p, c_size_t]

  a = np.random.randn(1024).astype('float32')
  c = np.zeros(1024).astype('float32')

  c_a = c_void_p(a.ctypes.data)
  c_c = c_void_p(c.ctypes.data)

  da = pycu_alloc(1024)
  c_da = c_void_p(da)

  host2gpu(c_a, c_da, 1024)
  gpu2host(c_c, c_da, 1024)

  print a
  print c

and the C code:

extern "C" {
float *  alloc_gpu_mem( size_t N)
{
  float *d;
  int size = N *sizeof(float);
  int err;

  err = cudaMalloc(&d, size);

  printf("cuda malloc: %d\n", err);
  return d;
 }}

 extern "C" {
 void host2gpu(float * a, void * da, size_t N)
 {
  int size = N * sizeof(float);
  int err;
  err = cudaMemcpy(da, a, size, cudaMemcpyHostToDevice);
  printf("load mem: %d\n", err);
  }}

  extern "C"{
 void gpu2host(float *c, void *d_c, size_t N)
 {
  int  err;
  int size = N*sizeof(float);
  err = cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost);
  printf("cpy mem back %d\n", err);
 }}

The code should copy a random vector a to CUDA memory, and then copy that CUDA memory back into an empty vector c. When I print c, though, it is just 0s.

I've wrestled with different combinations of float* and void*, particularly in the way alloc_gpu_mem works, but I don't know what else to try.

As for the err return values, cudaMalloc returns 0, but both cudaMemcpy calls return 11.

What's Python doing wrong with the pointer? Help?


Solution

The problem is here:

pycu_alloc.restypes = [c_void_p]   

This doesn't do anything. What you wanted was:

pycu_alloc.restype = c_void_p

See Return types in the ctypes docs.

And without that, ctypes assumes that your function returns a C int. On a 32-bit platform, you might get away with it, because you end up constructing a c_void_p whose value is that int… but on a 64-bit platform, that pointer is going to end up with the upper 32 bits missing.
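Here's a minimal sketch of what that truncation looks like, using a made-up 64-bit address just to illustrate the arithmetic:

full_ptr = 0x00007F8A40000000        # hypothetical device pointer from cudaMalloc

# With no restype set, ctypes hands the return value back as a C int,
# so only the low 32 bits survive the trip into Python.
truncated = full_ptr & 0xFFFFFFFF

print(hex(full_ptr))   # 0x7f8a40000000
print(hex(truncated))  # 0x40000000 -- the upper half of the address is gone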

So, when you pass that into CUDA, it recognizes that the pointer isn't in any range it knows about, and gives you back a cudaErrorInvalidValue (11).
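If you want to see what a numeric code like that means, you can ask the CUDA runtime itself through ctypes. This is just a sketch and assumes the runtime is visible as libcudart.so; the actual library name depends on your install:

from ctypes import CDLL, c_char_p, c_int

cudart = CDLL("libcudart.so")                  # assumed name; adjust for your system
cudart.cudaGetErrorString.argtypes = [c_int]
cudart.cudaGetErrorString.restype = c_char_p

print(cudart.cudaGetErrorString(11).decode())  # e.g. "invalid argument"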

Also, if you get everything right, this line should be unnecessary:

c_da = c_void_p(da)

You're calling a function whose argtypes specifies c_void_p, so you can pass it an int that you got from a c_void_p-returning function just fine.
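Putting the two fixes together, the Python side of your example would look something like this (an untested sketch, reusing the same dll handle, imports, and function names from your question):

pycu_alloc = dll.alloc_gpu_mem
pycu_alloc.argtypes = [c_size_t]
pycu_alloc.restype = c_void_p        # restype, not restypes

host2gpu = dll.host2gpu
host2gpu.argtypes = [c_void_p, c_void_p, c_size_t]

gpu2host = dll.gpu2host
gpu2host.argtypes = [c_void_p, c_void_p, c_size_t]

a = np.random.randn(1024).astype('float32')
c = np.zeros(1024).astype('float32')

da = pycu_alloc(1024)                # now comes back as a full 64-bit pointer value

host2gpu(c_void_p(a.ctypes.data), da, 1024)   # no need to wrap da again
gpu2host(c_void_p(c.ctypes.data), da, 1024)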


You can see the same behavior with plain old malloc and free, except that you'll probably get a segfault at free instead of a nice error:

from ctypes import CDLL, c_size_t, c_void_p
from ctypes.util import find_library

libc = CDLL(find_library('c'))  # locate the C library (name varies by platform)

malloc = libc.malloc
malloc.argtypes = [c_size_t]
malloc.restype = c_void_p  # comment this line to crash on most 64-bit platforms

free = libc.free
free.argtypes = [c_void_p]
free.restype = None

a = malloc(1024)
free(a) # commenting this line and uncommenting the next two has no effect
#c_a = c_void_p(a)
#free(c_a)