mex linking of cuda code in separate compilation mode

Question 1

OK, I figured out the solution. Here are the complete steps for compiling mex programs with "separate compilation mode" in Nsight:

Create a CUDA project.
At the project level, change the following build options:
- Switch on -fPIC in the compiler options of "NVCC Compiler".
- Add -dlink -Xcompiler '-fPIC' to "Expert Settings" -> "Command Line Pattern" of the linker "NVCC Linker".
- Add the letter o to "Build Artifact" -> "Artifact Extension", since -dlink in the previous step makes the output a .o file.
- Add mex -cxx -o path_to_mex_bin/mex_bin_filename ./*.o ./src/*.o -lcudadevrt to "Post Build Steps" (add any other libraries you need).
UPDATE: In my actual project I moved the last step to a .m file in MATLAB, because running it while my mex program is running could crash MATLAB.
For each file that needs to be compiled with mex, change these build options:
- Change the compiler to GCC C++ Compiler in Tool Chain Editor.
- Go back to compiler setting of GCC C++ Compiler and change Command to mex
- Change command line pattern to ${COMMAND} -c -outdir "src" ${INPUTS}

Several additional notes:

(1) CUDA-specific details (such as kernel functions and kernel launches) must be hidden from the mex compiler, so they belong in the .cu files rather than in the header files. Here is a trick for moving templates that involve CUDA details into .cu files.

In the header file (e.g., f.h), you put only the declaration of the function like this:

template<typename ValueType>
void func(ValueType x);

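Add a new file named f.inc, which holds the definition. It is written against the macro ValueType, so each inclusion below produces one explicit specialization: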

template<>
void func(ValueType x) {
  // possible kernel launches which should be hidden from mex
}

In the source code file (e.g., f.cu), you put this

#define ValueType float
#include "f.inc"
#undef ValueType

#define ValueType double
#include "f.inc"
#undef ValueType

// Add other types you want.

This trick can be easily generalized for templated classes to hide details.
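
For comparison, the same separation can be sketched with standard explicit template instantiation instead of the f.inc include trick. This is only an illustration under assumptions: the name scale_in_place and the file split marked in the comments are hypothetical, and a plain host loop stands in for the kernel launch so that the sketch compiles without nvcc.

```cpp
#include <cassert>
#include <cstddef>

// ---- f.h (hypothetical): declaration only, safe for files compiled by mex ----
template <typename ValueType>
void scale_in_place(ValueType* data, std::size_t n, ValueType factor);

// ---- f.cu (hypothetical): compiled by nvcc only ----
// The definition lives here, so anything CUDA-specific inside it is never
// seen by the mex compiler.
template <typename ValueType>
void scale_in_place(ValueType* data, std::size_t n, ValueType factor) {
    assert(data != nullptr || n == 0);
    // Assumption: host loop used as a stand-in for a kernel launch.
    for (std::size_t i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}

// Explicit instantiations: emit object code for exactly these types.
// Callers that see only f.h can then link against them.
template void scale_in_place<float>(float*, std::size_t, float);
template void scale_in_place<double>(double*, std::size_t, double);
```

With this layout, a call with a type that was not instantiated in f.cu fails at link time, which is the same limitation the f.inc approach has.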

(2) mex-specific details should likewise be hidden from CUDA source files, since mex.h alters the definitions of some system functions, such as printf. So "mex.h" should not be included in any header file that could end up being included by CUDA source files.

(3) In the mex source file containing the entry point mexFunction, you can use the preprocessor macro MATLAB_MEX_FILE to compile code sections selectively. This way the source file can be built either as a mex executable or as an ordinary executable, which allows debugging under Nsight without MATLAB. Here is a trick for building multiple targets under Nsight: Building multiple binaries within one Eclipse project
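
Notes (2) and (3) can be sketched together roughly as follows. The names sum_of_squares and run_standalone_demo are hypothetical, and a host loop stands in for the GPU work. The core routine uses only plain C++ types, so its declaration could sit in a header shared with .cu files, while "mex.h" is pulled in only inside the MATLAB_MEX_FILE branch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Core routine with a mex-free interface (hypothetical example).
// Assumption: host loop used as a stand-in for a call into CUDA code.
double sum_of_squares(const double* x, std::size_t n) {
    assert(x != nullptr || n == 0);
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        acc += x[i] * x[i];
    }
    return acc;
}

#ifdef MATLAB_MEX_FILE
// Built by mex: only this branch ever sees mex.h.
#include "mex.h"

void mexFunction(int nlhs, mxArray* plhs[], int nrhs, const mxArray* prhs[]) {
    (void)nlhs;
    if (nrhs != 1 || !mxIsDouble(prhs[0]) || mxIsComplex(prhs[0])) {
        mexErrMsgIdAndTxt("demo:badInput", "Expected one real double array.");
    }
    const double* x = mxGetPr(prhs[0]);
    const std::size_t n = mxGetNumberOfElements(prhs[0]);
    plhs[0] = mxCreateDoubleScalar(sum_of_squares(x, n));
}
#else
// Built without mex, for debugging outside MATLAB. In a real project this
// body would normally sit in main(); it is a named function here so the
// sketch can be dropped into an existing program.
int run_standalone_demo() {
    const std::vector<double> x = {1.0, 2.0, 3.0};
    std::printf("sum of squares = %g\n", sum_of_squares(x.data(), x.size()));
    return 0;
}
#endif
```

The mex command defines MATLAB_MEX_FILE itself, so no extra flag is needed for the mex build; the standalone build simply leaves the macro undefined.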

Question 2

First of all, it should be possible to set up Nsight to use a custom Makefile rather than generating one automatically. See Setting Nsight to run with existing Makefile project.

Once we have a custom Makefile, it may be possible to automate (1), (4), and (5). The advantage of a custom Makefile is that you know exactly what compilation commands will take place.

A bare-bones example:

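# Note: recipe lines must begin with a tab character.
all: mx.mexa64

# The -dlink output (mx.o) holds only the linked device code, so the host
# objects have to be passed to the final mex link as well.
mx.mexa64: mx.o mxfunc.o helper.o
	mex -o mx.mexa64 mx.o mxfunc.o helper.o -L/usr/local/cuda/lib64 -lcudart -lcudadevrt

mx.o: mxfunc.o helper.o
	nvcc -arch=sm_35 -Xcompiler -fPIC -o mx.o -dlink helper.o mxfunc.o -lcudadevrt

mxfunc.o: mxfunc.c
	mex -c -o mxfunc.o mxfunc.c

# If helper contains device code, it should be a .cu file compiled with -dc
# (relocatable device code) instead of -c.
helper.o: helper.c
	nvcc -arch=sm_35 -Xcompiler -fPIC -c -o helper.o helper.c

.PHONY: all clean
clean:
	rm -fv mx.mexa64 *.o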

... where mxfunc.c contains the mexFunction but helper.c does not.

EDIT: You may be able to achieve the same effect in the automatic build system. Right-click each source file and select Properties, and you'll get a window where you can add compilation options for that individual file. For linking options, open the Properties of the project. Experiment a bit and pay attention to the actual compilation commands that show up in the console. In my experience, custom options sometimes interact with the automatic system in weird ways. If this method proves too troublesome, I suggest writing a custom Makefile; that way, at least you are not caught by unexpected side effects.