
When I try to run the following code, I get this error:

Traceback (most recent call last):
  File "C:\temp\GPU Program", line 28, in <module>
  File "C:\Python33\lib\site-packages\pycuda\", line 285, in get_function
    return self.module.get_function(name)
pycuda._driver.LogicError: cuModuleGetFunction failed: not found

Here's the code:

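import numpy
import pycuda.autoinit
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

mod = SourceModule("""
extern "C" {
__device__ void lol(double *a)
{
    // ... (body omitted)
}

__global__ void kernel(double *a)
{
    const int r = blockIdx.x*blockDim.x + threadIdx.x;
    a[r] = 1;
}
}
""")

max_length = 5
a = numpy.zeros(max_length)
a_gpu = cuda.mem_alloc(a.nbytes)
cuda.memcpy_htod(a_gpu, a)
# Fails with "not found" for the __device__ function; "kernel" works here.
func = mod.get_function("lol")
func(a_gpu, block=(max_length, 1, 1))
newa = numpy.empty_like(a)
cuda.memcpy_dtoh(newa, a_gpu)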


As you can probably see, this is a slight modification of the PyCUDA tutorial code. My intent is to call this device function, which will in turn launch kernels, integrate things, and generally make my life easier. After a bit of googling I learned that I had to wrap my code in extern "C" because of name mangling, and that has worked for me before when I used PyCUDA to launch a kernel rather than a device function. Likewise, if I change this code to launch the kernel instead of the device function, it does what I want. What am I missing here?


After looking a bit further into the Device Interface Reference documentation, it seems that get_function only deals with __global__ functions. Did I interpret that correctly? If so, is there a way to do what I'm trying to do?



You cannot call a __device__ function from host code. get_function can only look up __global__ functions (kernels), which is why the lookup fails with "not found". If you're saying that the PyCUDA tutorial code shows how to do this, I'd like to see that tutorial.

It's not clear to me what you're trying to accomplish by calling the __device__ function from host code that could not be done with an ordinary kernel (__global__) launch.
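
For what it's worth, the usual workaround is to keep the __device__ function and reach it through a thin __global__ wrapper that the host can launch. The following is only a minimal sketch under some assumptions: the body of lol is made up (the question does not show it), the extra length argument n and the helper name fill_with_ones are hypothetical, and it relies on SourceModule's default extern "C" wrapping instead of an explicit extern "C" block.

```python
# CUDA C source: a __global__ entry point that forwards to a __device__ helper.
# SourceModule wraps the source in extern "C" by default, so the kernel name
# is not mangled and get_function can find it by its plain name.
CUDA_SOURCE = """
__device__ void lol(double *a)
{
    // Hypothetical body: the question does not show what lol does.
    *a = 1.0;
}

__global__ void kernel(double *a, int n)
{
    const int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r < n) {        // guard against threads past the end of the array
        lol(&a[r]);     // device code is allowed to call __device__ functions
    }
}
"""


def fill_with_ones(max_length=5):
    """Launch `kernel` on a zeroed array and copy the result back to the host."""
    # Imported here so this file can still be loaded on a machine without CUDA.
    import numpy
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule

    mod = SourceModule(CUDA_SOURCE)
    func = mod.get_function("kernel")  # a __global__ function, so it is found

    a = numpy.zeros(max_length)        # float64 on the host matches double
    a_gpu = cuda.mem_alloc(a.nbytes)
    cuda.memcpy_htod(a_gpu, a)

    # One block of max_length threads; scalar arguments need explicit numpy types.
    func(a_gpu, numpy.int32(max_length), block=(max_length, 1, 1), grid=(1, 1))

    newa = numpy.empty_like(a)
    cuda.memcpy_dtoh(newa, a_gpu)
    return newa
```

On a machine with a CUDA-capable GPU and PyCUDA installed, fill_with_ones(5) should come back as an array of ones, since every thread calls lol on its own element. Asking get_function for "lol" in the same module would still fail, because only the __global__ function is visible from the host.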

Licensed under: CC-BY-SA with attribution