You're trying to do separate compilation, which needs some special command line options. See the NVCC manual for details, but here's how to get your example to compile. I've targeted sm_20, but you can target sm_20 or later depending on what GPU you have. Separate compilation is not possible on older devices (sm_1x).
- You don't need to declare the `__device__` function as `extern` in your header file, but if you have any static device variables they will need to be declared as `extern`.
- Generate relocatable code for the device by compiling as shown below (`-dc` is the device equivalent of `-c`; see the manual for more information):

      nvcc -arch=sm_20 -dc norm.cu -o norm.o -I.
      nvcc -arch=sm_20 -dc test.cu -o test.o -I.
- Link the device parts of the code by calling nvlink before the final host link:

      nvlink -arch=sm_20 norm.o test.o -o final.o
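
To make the first point concrete, here is a rough sketch of how the three files might be laid out. Only the file names `norm.cu` and `test.cu` come from the commands above. The header name `norm.h` and the names `vec_norm`, `scale`, and `kernel` are hypothetical, made up for illustration, so substitute your own. The three files are shown in one block, separated by comments.

```cuda
// ---- norm.h (assumed header name, found via -I.) ----
#ifndef NORM_H
#define NORM_H

// A plain prototype is enough for a __device__ function; no 'extern' needed.
__device__ float vec_norm(float x, float y);   // hypothetical name

// A device variable used from several files is declared 'extern' here
// and defined exactly once in a .cu file.
extern __device__ float scale;                 // hypothetical name

#endif

// ---- norm.cu ----
#include "norm.h"

__device__ float scale = 1.0f;                 // the single definition

__device__ float vec_norm(float x, float y)
{
    return scale * sqrtf(x * x + y * y);
}

// ---- test.cu ----
#include <cstdio>
#include "norm.h"

__global__ void kernel(float *out)
{
    // vec_norm is defined in norm.o, so this call is resolved by the device link.
    *out = vec_norm(3.0f, 4.0f);
}

int main()
{
    float *d_out = NULL;
    float h_out = 0.0f;

    if (cudaMalloc(&d_out, sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    kernel<<<1, 1>>>(d_out);

    // cudaMemcpy waits for the kernel to finish and reports launch errors too.
    cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("norm = %f\n", h_out);
    return 0;
}
```

Without `-dc`, the call to `vec_norm` in `test.cu` would fail to compile, because the device function's definition lives in a different translation unit.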