CUDA Performance Profiling with NVIDIA Nsight in VS2010 - segment on the timeline

Question

In Nsight Visual Studio Edition each range in the Process\CUDA\Context\Compute row is a kernel launch.

The range start timestamp is the time at which the first instruction of flatThreadIdx = 0 executed.
The range end timestamp is the completion of the kernel.
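The term flatThreadIdx is not defined above. Assuming it means the usual linearized index of a thread within its block (x varying fastest, then y, then z), it can be sketched on the host as follows. The `Dim3` struct and the `flatThreadIdx` function are hypothetical stand-ins for illustration, not part of the Nsight or CUDA API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side stand-in for CUDA's dim3 / uint3 types.
struct Dim3 {
    std::uint32_t x, y, z;
};

// Linearize a thread's (x, y, z) position within its block.
// Assumption: x varies fastest, then y, then z.
inline std::uint32_t flatThreadIdx(Dim3 threadIdx, Dim3 blockDim) {
    // The thread position must lie inside the block.
    assert(threadIdx.x < blockDim.x);
    assert(threadIdx.y < blockDim.y);
    assert(threadIdx.z < blockDim.z);
    return threadIdx.x
         + threadIdx.y * blockDim.x
         + threadIdx.z * blockDim.x * blockDim.y;
}
```

Under this assumption, flatThreadIdx = 0 is the thread at position (0, 0, 0) in a block, whatever the block dimensions are.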

The Nsight timeline does not show information below the kernel level (thread blocks, warps, or threads), as this would be an enormous amount of data. The Grid Dimensions and Block Dimensions of the launch are available in the tool tip, in the correlation pane at the bottom of the timeline, and in the CUDA Launches report page.
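To get a feel for how much data a per-thread timeline would involve, the Grid Dimensions and Block Dimensions reported for a launch can be multiplied out. The sketch below is a host-side illustration only. `LaunchDims`, `LaunchSize`, and `launchSize` are hypothetical names, and the default warp size of 32 is an assumption.

```cpp
#include <cstdint>

// Hypothetical container for the grid or block dimensions of a launch.
struct LaunchDims {
    std::uint32_t x, y, z;
};

// Derived sizes for one kernel launch.
struct LaunchSize {
    std::uint64_t blocks;          // thread blocks in the grid
    std::uint64_t threadsPerBlock; // threads in each block
    std::uint64_t warpsPerBlock;   // warps per block, rounded up
    std::uint64_t totalThreads;    // threads in the whole launch
};

// Multiply out the dimensions of a launch.
// Assumption: warpSize is 32 unless the caller passes another value.
inline LaunchSize launchSize(LaunchDims grid, LaunchDims block,
                             std::uint32_t warpSize = 32) {
    LaunchSize s{};
    s.blocks = std::uint64_t(grid.x) * grid.y * grid.z;
    s.threadsPerBlock = std::uint64_t(block.x) * block.y * block.z;
    s.warpsPerBlock = (s.threadsPerBlock + warpSize - 1) / warpSize;
    s.totalThreads = s.blocks * s.threadsPerBlock;
    return s;
}
```

For example, a 1024 x 1024 x 1 grid of 256 x 1 x 1 blocks contains over a million blocks and more than 268 million threads, so per-thread ranges for a single launch would dwarf everything else on the timeline.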

The Nsight VSE Profile CUDA Application activity can collect per-instruction counters that show the number of threads that executed each instruction in the application. To collect them, set the Experiments to Run drop-down to All, or set it to Custom and select the Instruction Count experiment. The Source View report page will then have columns for Instructions Executed and Thread Instructions Executed. Optionally, you can configure the Instruction Count experiment to collect active mask histograms and predicate histograms per instruction.
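The relationship between the two columns can be illustrated with a simplified host-side model. It assumes that Instructions Executed counts one per warp that executes the instruction, that Thread Instructions Executed counts one per active thread in that warp, and that predication is ignored. This is not Nsight's implementation, and `InstructionCounters` and `countInstruction` are hypothetical names.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified per-instruction counters (model only, not Nsight's data format).
struct InstructionCounters {
    std::uint64_t instructionsExecuted = 0;       // one per warp execution
    std::uint64_t threadInstructionsExecuted = 0; // one per active thread
    // activeHistogram[n] = number of warp executions with n active threads.
    std::array<std::uint64_t, 33> activeHistogram{};
};

// Accumulate counters for a single instruction, given the 32-bit active mask
// of every warp execution of that instruction.
// Assumption: 32 threads per warp, predication ignored.
inline InstructionCounters countInstruction(
        const std::vector<std::uint32_t>& activeMasks) {
    InstructionCounters c;
    for (std::uint32_t mask : activeMasks) {
        const std::size_t active = std::bitset<32>(mask).count();
        c.instructionsExecuted += 1;
        c.threadInstructionsExecuted += active;
        c.activeHistogram[active] += 1;
    }
    return c;
}
```

In this model, a gap between Thread Instructions Executed and 32 times Instructions Executed indicates warps that ran the instruction with some threads inactive, which is the kind of divergence an active mask histogram makes visible.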