Separating kernels
In a project, create two files (I refactored the default Runtime Project template and created device.cu and host.cu).
device.cu:
__device__ unsigned int bitreverse(unsigned int number) {
    number = ((0xf0f0f0f0 & number) >> 4) | ((0x0f0f0f0f & number) << 4);
    number = ((0xcccccccc & number) >> 2) | ((0x33333333 & number) << 2);
    number = ((0xaaaaaaaa & number) >> 1) | ((0x55555555 & number) << 1);
    return number;
}

__global__ void bitreverse(void *data) {
    unsigned int *idata = (unsigned int*) data;
    idata[threadIdx.x] = bitreverse(idata[threadIdx.x]);
}
host.cu:
extern __global__ void bitreverse(void *data);
...
bitreverse<<<1, WORK_SIZE, WORK_SIZE * sizeof(int)>>>(d);
Separate Compilation
- Right-click the project and open Properties.
- Go to Build/Settings.
- Set up the build for SM 2.0 or newer.
- Select the "Separate compilation" radio button.
device.cu:
__device__ unsigned int bitreverse(unsigned int number) {
    number = ((0xf0f0f0f0 & number) >> 4) | ((0x0f0f0f0f & number) << 4);
    number = ((0xcccccccc & number) >> 2) | ((0x33333333 & number) << 2);
    number = ((0xaaaaaaaa & number) >> 1) | ((0x55555555 & number) << 1);
    return number;
}
host.cu:
extern __device__ unsigned int bitreverse(unsigned int number);

__global__ void bitreverse(void *data) {
    unsigned int *idata = (unsigned int*) data;
    idata[threadIdx.x] = bitreverse(idata[threadIdx.x]);
}
...
bitreverse<<<1, WORK_SIZE, WORK_SIZE * sizeof(int)>>>(d);
Isolate CUDA code
One common pattern is to isolate CUDA code in .cu files, each exposing a host function that wraps the kernel invocation. This way you can link the object file produced from such a .cu file with host code written in .cpp or .c files. Keep in mind that the exported host function must be qualified with extern "C" to be usable from .c files. The extern declarations can be put in a .h file. Note that a .h file containing CUDA C syntax (__global__ is CUDA C-specific) cannot be included in .cpp or .c files.
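A minimal sketch of this wrapper pattern, reusing the bitreverse code from above. The file name wrapper.cu, the kernel name bitreverse_kernel, and the wrapper name bitreverse_on_device are made up for illustration; the block size of 256 and the return-code convention are assumptions too.

```cuda
// wrapper.cu (hypothetical file name): compiled by nvcc, linked with .c/.cpp code.
#include <cuda_runtime.h>

__device__ unsigned int bitreverse(unsigned int number) {
    number = ((0xf0f0f0f0 & number) >> 4) | ((0x0f0f0f0f & number) << 4);
    number = ((0xcccccccc & number) >> 2) | ((0x33333333 & number) << 2);
    number = ((0xaaaaaaaa & number) >> 1) | ((0x55555555 & number) << 1);
    return number;
}

// The kernel stays private to this file; callers only see the wrapper below.
__global__ void bitreverse_kernel(unsigned int *data, int count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count)
        data[i] = bitreverse(data[i]);
}

// Host wrapper with C linkage. The matching declaration for a plain .h file
// contains no CUDA syntax:
//   int bitreverse_on_device(unsigned int *host_data, int count);
// (wrap it in extern "C" { ... } under #ifdef __cplusplus for .cpp callers).
// Returns 0 on success and -1 on any CUDA error (assumed convention).
extern "C" int bitreverse_on_device(unsigned int *host_data, int count) {
    if (count <= 0)
        return 0;

    unsigned int *d = nullptr;
    size_t bytes = (size_t)count * sizeof(unsigned int);
    if (cudaMalloc((void **)&d, bytes) != cudaSuccess)
        return -1;

    cudaError_t err = cudaMemcpy(d, host_data, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 256;  // assumed block size
        int blocks = (count + threads - 1) / threads;
        bitreverse_kernel<<<blocks, threads>>>(d, count);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(host_data, d, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d);
    return err == cudaSuccess ? 0 : -1;
}
```

A .c file would then only need the one-line declaration and a call such as bitreverse_on_device(values, n); no CUDA headers or nvcc are required on that side.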
Adding files to projects
Usually I just copy the files into the project folder, right-click the project and choose "Refresh". Nsight will index them and include them in the build.
Excluding files from the build
If you absolutely need to, you can copy device code to headers and include the headers (the convention is to use the .cuh extension for such header files, though .h works just the same). You can also include a .cu file directly; the problem is that Nsight treats such files as source files and tries to compile them. You can exclude a .cu file from the build by checking the "Exclude resource from the build" checkbox at the top of any property page in the Build subtree of the build properties.
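A rough sketch of what such a header could look like, shown as a single listing so it stays self-contained. The names bitreverse.cuh and bitreverse_kernel are hypothetical, and marking the function inline and __host__ __device__ is my assumption here, not something Nsight requires.

```cuda
// ---- bitreverse.cuh (hypothetical header name) ----
#ifndef BITREVERSE_CUH
#define BITREVERSE_CUH

// inline lets the definition appear in every .cu file that includes this header.
// __host__ __device__ (an assumption) makes it callable from host code as well.
__host__ __device__ inline unsigned int bitreverse(unsigned int number) {
    number = ((0xf0f0f0f0 & number) >> 4) | ((0x0f0f0f0f & number) << 4);
    number = ((0xcccccccc & number) >> 2) | ((0x33333333 & number) << 2);
    number = ((0xaaaaaaaa & number) >> 1) | ((0x55555555 & number) << 1);
    return number;
}

#endif // BITREVERSE_CUH

// ---- host.cu: in a real project this file would start with
//      #include "bitreverse.cuh"
__global__ void bitreverse_kernel(unsigned int *data) {
    // Each thread reverses the bits within every byte of its own element.
    data[threadIdx.x] = bitreverse(data[threadIdx.x]);
}
```

Since the header is only pulled in through #include and never compiled on its own, nothing needs to be excluded from the build in this layout.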
CUDA multi-file samples
Pretty much any non-trivial sample is broken up into multiple files. Just create an Nsight project from, e.g., the "Particles" sample.