Question

Assuming my application launches only one kernel, in the timeline view of the application tracing report, under

Process -> CUDA -> Compute

does each segment I can see in the timeline window represent a warp carrying out an operation? If so, is there a way to see exactly how many threads participated in that operation? (I assume it is not necessarily the number I specify when launching the kernel.)


Solution

In Nsight Visual Studio Edition, each range in the Process\CUDA\Context\Compute row represents a kernel launch.

  • The range start timestamp is the time at which the first instruction of flatThreadIdx = 0 executed.
  • The range end timestamp is the completion of the kernel.

The Nsight timeline does not show information below the kernel level (thread blocks, warps, or threads), as this would be an enormous amount of data. The Grid Dimensions and Block Dimensions of the launch are available in the tooltip, in the correlation pane at the bottom of the timeline, and in the CUDA Launches report page.
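Since the timeline only reports the launch configuration, the number of launched threads has to be derived from the Grid Dimensions and Block Dimensions yourself. Here is a minimal host-side sketch of that arithmetic in plain C++. `Dim3` is a hypothetical stand-in for CUDA's `dim3`, and the helper names are made up for illustration; a warp size of 32 is assumed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for CUDA's dim3 so the sketch builds without the CUDA toolkit.
// Unspecified dimensions default to 1, as they do for dim3.
struct Dim3 {
    std::uint32_t x, y, z;
    explicit Dim3(std::uint32_t x_ = 1, std::uint32_t y_ = 1, std::uint32_t z_ = 1)
        : x(x_), y(y_), z(z_) {}
};

// Assumed warp size (32 on current NVIDIA GPUs).
constexpr std::uint32_t kWarpSize = 32;

// Threads in one block: product of the Block Dimensions.
std::uint64_t threadsPerBlock(const Dim3& block) {
    return std::uint64_t{block.x} * block.y * block.z;
}

// Threads launched by the whole grid: blocks in the grid times threads per block.
std::uint64_t launchedThreads(const Dim3& grid, const Dim3& block) {
    return std::uint64_t{grid.x} * grid.y * grid.z * threadsPerBlock(block);
}

// Warps per block, rounded up: a partially filled last warp still counts as a warp.
std::uint64_t warpsPerBlock(const Dim3& block) {
    return (threadsPerBlock(block) + kWarpSize - 1) / kWarpSize;
}
```

For example, a launch with Grid Dimensions (4, 1, 1) and Block Dimensions (100, 1, 1) gives `launchedThreads(Dim3(4), Dim3(100))` equal to 400 threads, and each block occupies 4 warps, the last of which is only partly filled.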

The Nsight VSE Profile CUDA Application Activity can collect per-instruction counters that show the number of threads that executed each instruction in the application. To collect them, set the Experiments to Run drop-down to All, or set it to Custom and select the Instruction Count experiment. The Source View report page will then have columns for Instructions Executed and Thread Instructions Executed. Optionally, you can configure the Instruction Count experiment to collect active mask histograms and predicate histograms per instruction.
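To see why the two columns can differ, here is a rough host-side model in plain C++, not a use of any Nsight API. It assumes that Instructions Executed is counted once per warp and Thread Instructions Executed once per active thread, and that a guard such as `if (idx < n)` compiles to a branch, so warps with no active thread skip the guarded instruction entirely. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed warp size (32 on current NVIDIA GPUs).
constexpr std::uint32_t kWarpSize = 32;

// Hypothetical container mirroring the two Source View columns.
struct InstructionCounts {
    std::uint64_t instructionsExecuted = 0;        // assumed: counted once per warp
    std::uint64_t threadInstructionsExecuted = 0;  // assumed: counted once per active thread
};

// Models one instruction guarded by `if (idx < n)` in a 1D launch of
// `blocks` blocks with `threadsPerBlock` threads each.
InstructionCounts countGuardedInstruction(std::uint32_t blocks,
                                          std::uint32_t threadsPerBlock,
                                          std::uint64_t n) {
    InstructionCounts counts;
    for (std::uint32_t b = 0; b < blocks; ++b) {
        // Walk the block one warp at a time.
        for (std::uint32_t warpStart = 0; warpStart < threadsPerBlock; warpStart += kWarpSize) {
            std::uint32_t active = 0;  // threads of this warp that pass the guard
            for (std::uint32_t lane = 0; lane < kWarpSize; ++lane) {
                const std::uint32_t t = warpStart + lane;
                if (t >= threadsPerBlock) break;  // lane does not exist in a partial warp
                const std::uint64_t idx = std::uint64_t{b} * threadsPerBlock + t;
                if (idx < n) ++active;
            }
            if (active > 0) {  // a warp with no active thread skips the instruction
                counts.instructionsExecuted += 1;
                counts.threadInstructionsExecuted += active;
            }
        }
    }
    return counts;
}
```

With 4 blocks of 128 threads (512 launched threads) and `n = 400`, this model gives 13 warp-level executions and 400 thread-level executions of the guarded instruction: 12 full warps plus one warp with only 16 active threads. That gap between launched threads and participating threads is what the per-instruction counters and active mask histograms let you observe on real hardware.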

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow