The problem is here:
pycu_alloc.restypes = [c_void_p]
This doesn't do anything useful: ctypes never looks at an attribute named restypes, so the assignment is silently ignored. What you wanted was:
pycu_alloc.restype = c_void_p
See Return types in the ctypes docs.
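If you want to convince yourself that the misspelled attribute is silently accepted, here is a minimal sketch. It uses libc's malloc as a stand-in for pycu_alloc (an assumption, since your library isn't available here) and assumes a POSIX system where find_library("c") either finds libc or returns None.

```python
import ctypes
import ctypes.util

# Load the C library. If find_library returns None, CDLL(None) opens the
# current process on POSIX, which still exposes malloc (not true on Windows).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

func = libc.malloc  # stand-in for pycu_alloc

# Misspelled attribute: stored on the Python object, never consulted by ctypes.
func.restypes = [ctypes.c_void_p]
typo_accepted = hasattr(func, "restypes")
restype_after_typo = func.restype  # still the default, c_int

# Correct attribute: this is what ctypes reads when converting the result.
func.restype = ctypes.c_void_p
restype_after_fix = func.restype

print(typo_accepted, restype_after_typo, restype_after_fix)
```

No exception is raised at any point, which is why the typo is so easy to miss.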
And without that, ctypes assumes that your function returns a C int. On a 32-bit platform, you might get away with it, because you end up constructing a c_void_p whose value is that int… but on a 64-bit platform, that pointer is going to end up with the upper 32 bits missing.
So, when you pass that into CUDA, it recognizes that the pointer isn't in any range it knows about, and gives you back a cudaErrorInvalidValue (11).
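The truncation itself can be reproduced without CUDA or any C call at all. This is only a sketch: the address is a made-up value, and it assumes a C int is 32 bits, as it is on mainstream platforms.

```python
import ctypes

# Hypothetical 64-bit address, chosen only for illustration.
real_address = 0x00007F1234567890

# ctypes integer types do no overflow checking, so this keeps only the bits
# that fit in a C int, which is what happens to an undeclared return value.
as_c_int = ctypes.c_int(real_address).value

# Wrapping the truncated value in c_void_p cannot bring the lost bits back.
rebuilt = ctypes.c_void_p(as_c_int).value

print(hex(real_address), hex(rebuilt))
```

The rebuilt pointer holds only the low 32 bits of the original address, so it no longer refers to the allocation.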
Also, if you get everything right, this line should be unnecessary:
c_da = c_void_p(da)
You're calling a function whose argtypes specifies c_void_p, so you can pass it an int that you got from a c_void_p-returning function just fine.
You can see the same behavior with plain old malloc and free, except that you'll probably get a segfault at free instead of a nice error:
from ctypes import CDLL, c_size_t, c_void_p
from ctypes.util import find_library

libc = CDLL(find_library("c")) # load the C library (POSIX)
malloc = libc.malloc
malloc.argtypes = [c_size_t]
malloc.restype = c_void_p # comment this line to crash on most 64-bit platforms
free = libc.free
free.argtypes = [c_void_p]
free.restype = None
a = malloc(1024)
free(a) # commenting this line and uncommenting the next two has no effect
#c_a = c_void_p(a)
#free(c_a)
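If you'd rather confirm that the int survives the round trip than wait for a crash, one option is to send it through memset, which returns its first argument. This is a sketch under the same POSIX libc assumption. Using memset and ctypes.string_at as the check is my choice for illustration, not something from your code.

```python
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))

malloc = libc.malloc
malloc.argtypes = [ctypes.c_size_t]
malloc.restype = ctypes.c_void_p

memset = libc.memset
memset.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_size_t]
memset.restype = ctypes.c_void_p  # memset returns its first argument

free = libc.free
free.argtypes = [ctypes.c_void_p]
free.restype = None

p = malloc(16)                        # a plain Python int (None if malloc fails)
same = memset(p, 0, 16)               # the int is accepted where c_void_p is expected
first_bytes = ctypes.string_at(p, 4)  # read back four of the zeroed bytes
free(p)

print(same == p, first_bytes)
```

With restype declared correctly, the value that comes back from memset equals the one that went in, with no explicit c_void_p wrapper anywhere.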