Differences between Nsight debug launch and normal OS launch

Question 1

The driver crash followed by API errors occurs when your app hits a windows TDR event due to kernel execution taking too long. You can work around this by modifying the system registry, or putting a Quadro or Tesla GPU in TCC mode, or reducing the run-time of your kernel(s).

When you debug with nsight, your kernel execution may get halted for various reasons (single step, breakpoints, and other reasons), and then restarted, depending on what you are doing exactly in your debug session. The halting of the kernel execution allows the windows watchdog to be satisfied without a TDR event.

Question 2

CUDA nSight debugger allows you to debug the CUDA kernels line by line, you can't do this with the standard Visual Studio debugger.

Presumably nSight performs some code injection to enable it to detect the runtime of kernels, its also possible on your settings that when debugging with nSight your kernels may not be executing on the GPU. These could be the cause of errors coming and going between debuggers. I know when I used them I had similar inconsistencies.

If you run your program through the nSight profiler it should be able to clearly log the memCpy errors for you.