Have GTX Titan, having trouble with Dynamic Parallelism

https://stackoverflow.com/questions/19367281

30-06-2022

Question

I'm trying to call a CUDA kernel from another kernel, but I get the following error:

Traceback (most recent call last):
  File "C:\temp\GPU Program Shell.py", line 22, in <module>
    """)
  File "C:\Python33\lib\site-packages\pycuda\compiler.py", line 262, in __init__
    arch, code, cache_dir, include_dirs)
  File "C:\Python33\lib\site-packages\pycuda\compiler.py", line 252, in compile
    return compile_plain(source, options, keep, nvcc, cache_dir)
  File "C:\Python33\lib\site-packages\pycuda\compiler.py", line 134, in compile_plain
    cmdline, stdout=stdout.decode("utf-8"), stderr=stderr.decode("utf-8"))
pycuda.driver.CompileError: nvcc compilation of         c:\users\karste~1\appdata\local\temp\tmpgq8t45\kernel.cu failed
[command: nvcc --cubin -arch sm_35 -m64 -Ic:\python33\lib\site-packages\pycuda\cuda kernel.cu]
[stderr:
kernel.cu(14): error: kernel launch from __device__ or __global__ functions requires separate         compilation mode

My understanding is that this has to do with Dynamic Parallelism, and that the other question about this error came from a user without appropriate hardware. I have a GTX Titan, however, so it should be compatible. What am I missing?

EDIT

After adding "options=['--cubin','-rdc=true' ,'-lcudart', '-lcudadevrt,','-Ic:\python33\lib\site-packages\pycuda\cuda kernel.cu']" to SourceModule, I get the following error:

Traceback (most recent call last):
  File "C:\temp\GPU Program Shell.py", line 22, in <module>
    """, options=['--cubin','-rdc=true' ,'-lcudart', '-lcudadevrt,','-Ic:\python33\lib\site-packages\pycuda\cuda kernel.cu'])
  File "C:\Python33\lib\site-packages\pycuda\compiler.py", line 265, in __init__
    self.module = module_from_buffer(cubin)
pycuda._driver.LogicError: cuModuleLoadDataEx failed: not found -

Solution

PyCUDA compiles the CUDA code on the fly by invoking nvcc, as the error output shows:

nvcc --cubin -arch sm_35 -m64 -Ic:\python33\lib\site-packages\pycuda\cuda kernel.cu

To compile code that uses dynamic parallelism, the compile command needs additional switches: one to enable separate compilation and device code linking (-rdc=true), one to link the device runtime library (-lcudadevrt), and an appropriate architecture target (sm_35).

Some examples of valid nvcc command combinations are given in the CUDA programming guide section on dynamic parallelism.

Your command line should look something like:

nvcc --cubin -arch=sm_35 -m64 -rdc=true -Ic:\python33\lib\site-packages\pycuda\cuda kernel.cu -lcudadevrt
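To make the role of each switch explicit, here is a minimal Python sketch that only assembles the argument list for the command above; it does not invoke nvcc or PyCUDA. The helper name `dynamic_parallelism_nvcc_cmd` and its parameters are hypothetical, introduced for illustration, and the flag values are taken from the command line shown in this answer.

```python
def dynamic_parallelism_nvcc_cmd(source, arch="sm_35", include_dirs=()):
    """Build an nvcc argument list for code that launches kernels from kernels.

    Hypothetical helper: it only assembles strings and never runs nvcc.
    """
    cmd = ["nvcc", "--cubin"]
    cmd.append("-arch=" + arch)   # dynamic parallelism needs sm_35 or newer
    cmd.append("-m64")            # 64-bit build, as in the original command
    cmd.append("-rdc=true")       # relocatable device code (separate compilation)
    cmd.extend("-I" + d for d in include_dirs)
    cmd.append(source)
    cmd.append("-lcudadevrt")     # link the device runtime library
    return cmd


cmd = dynamic_parallelism_nvcc_cmd(
    "kernel.cu",
    include_dirs=[r"c:\python33\lib\site-packages\pycuda\cuda"],
)
print(" ".join(cmd))
```

Note that each switch is its own list element. The options list in the question's EDIT instead has a stray comma inside '-lcudadevrt,' and fuses the include path and kernel.cu into a single string.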

You may also wish to read the nvcc manual on separate compilation.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow