It is illegal to have templated functions with C linkage in C++, which is why you get the error in the first case.
In the second case, you get a not-found error because, as far as I can see, you haven't instantiated the template anywhere, so the compiler emits no code for the kernel.
When you do add an instantiation, you will still get the same error, because the compiled device code exports the kernel under its mangled name. You will need to pass that mangled name to the get_function call. Paradoxically, you can't know the mangled name when JIT compiling from source, because you need to see the compiler output, and that isn't known a priori (the compiler messages, PTX, cubin, or object files will each give you the mangled name).
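As a rough illustration of reading the mangled name out of compiler output, here is a minimal sketch that scans PTX text for `.entry` directives. The PTX fragment is hand-written for illustration and `entry_names` is a hypothetical helper, not part of PyCUDA; the names your compiler actually emits should be checked against your own output.

```python
import re

# Hand-written PTX fragment for illustration only (an assumption, not real
# compiler output). It mimics what two instantiations of
#   template <typename T> __global__ void kernel(T *data)
# might look like once compiled to PTX.
SAMPLE_PTX = """
.visible .entry _Z6kernelIfEvPT_(
    .param .u64 _Z6kernelIfEvPT__param_0
)
{
    ret;
}

.visible .entry _Z6kernelIdEvPT_(
    .param .u64 _Z6kernelIdEvPT__param_0
)
{
    ret;
}
"""

# A kernel entry point is introduced by the `.entry` directive, followed by
# the (mangled) symbol name.
_ENTRY_RE = re.compile(r"\.entry\s+([A-Za-z_$][\w$]*)")


def entry_names(ptx_text):
    """Return the kernel entry-point names found in a PTX string, in order."""
    return _ENTRY_RE.findall(ptx_text)


if __name__ == "__main__":
    for name in entry_names(SAMPLE_PTX):
        print(name)
```

Any of the names returned this way is what would be passed to get_function in place of the unmangled template name.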
If you want to work with templated kernels in PyCUDA, I recommend compiling them to cubin yourself with the toolchain, and then loading from cubin in PyCUDA to get known mangled names from the module.
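A minimal sketch of that workflow might look like the following. It assumes `nvcc` is on the PATH and a CUDA-capable device is available; the kernel source, the file names, the `sm_70` architecture, and the helpers `nvcc_cubin_command`, `build_cubin`, and `load_kernel` are all hypothetical, chosen only for illustration. The mangled name shown is what the Itanium C++ ABI gives for `kernel<float>(float*)`, but you should confirm it against your own compiler output.

```python
import subprocess

# Hypothetical templated kernel. The explicit instantiation on the last line
# is what makes the compiler emit code for kernel<float>.
KERNEL_SOURCE = r"""
template <typename T>
__global__ void kernel(T *data)
{
    data[threadIdx.x] *= 2;
}

template __global__ void kernel<float>(float *);
"""

# Expected mangled name for kernel<float>(float*); verify against your own
# compiler output before relying on it.
MANGLED_NAME = "_Z6kernelIfEvPT_"


def nvcc_cubin_command(source_path, cubin_path, arch="sm_70"):
    """Build the nvcc command line that compiles a .cu file to a cubin."""
    return ["nvcc", "-cubin", "-arch=" + arch, "-o", cubin_path, source_path]


def build_cubin(source_path="kernel.cu", cubin_path="kernel.cubin", arch="sm_70"):
    """Write the kernel source to disk and compile it with nvcc."""
    with open(source_path, "w") as f:
        f.write(KERNEL_SOURCE)
    subprocess.run(nvcc_cubin_command(source_path, cubin_path, arch), check=True)
    return cubin_path


def load_kernel(cubin_path, mangled_name=MANGLED_NAME):
    """Load a precompiled cubin in PyCUDA and look the kernel up by mangled name."""
    # Imported lazily so the helpers above can be used without a GPU present.
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
    import pycuda.driver as cuda

    module = cuda.module_from_file(cubin_path)
    return module.get_function(mangled_name)
```

On a machine with the toolchain and a GPU, `load_kernel(build_cubin())` would return a function object that can be launched like any other PyCUDA kernel. The `arch` value has to match your device.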