Cuda-gdb not stopping at breakpoints inside kernel

https://stackoverflow.com/questions/21723664

10-10-2022
|

Frage

Cuda-gdb was obeying all the breakpoints I would set, before adding '-arch sm_20' flag while compiling. I had to add this to avoid error being thrown : 'atomicAdd is undefined' (as pointed here). Here is my current statement to compile the code:

nvcc -g -G --maxrregcount=32 Main.cu -o SW_exe (..including header files...) -arch sm_20

and when I set a breakpoint inside kernel, cuda-gdb stops once at the last line of the kernel, and then the program continues.

(cuda-gdb) b SW_kernel_1.cu:49
Breakpoint 1 at 0x4114a0: file ./SW_kernel_1.cu, line 49.
...
[Launch of CUDA Kernel 5 (diagonalComputation<<<(1024,1,1),(128,1,1)>>>) on Device 0]

Breakpoint 1, diagonalComputation (__cuda_0=15386, __cuda_1=128, __cuda_2=0xf00400000, __cuda_3=0xf00200000, 
__cuda_4=100, __cuda_5=0xf03fa0000, __cuda_6=0xf04004000, __cuda_7=0xf040a0000, __cuda_8=0xf00200200, 
__cuda_9=15258, __cuda_10=5, __cuda_11=-3, __cuda_12=8, __cuda_13=1) at ./SW_kernel_1.cu:183
183     }
(cuda-gdb) c
Continuing.

But as I said, if I remove the 'atomicAdd()' call and the flag '-arch sm_20' which though makes my code incorrect, but now the cuda-gdb stops at the breakpoint I specify. Please tell me the reasons of this behaviour.
I am using CUDA 5.5 on Tesla M2070 (Compute Capability = 2.0).
Thanks!

Lösung

From the CUDA DEBUGGER User Manual, Section 3.3.1:

NVCC, the NVIDIA CUDA compiler driver, provides a mechanism for generating the debugging information necessary for CUDA-GDB to work properly. The -g -G option pair must be passed to NVCC when an application is compiled in order to debug with CUDA-GDB; for example,

nvcc -g -G foo.cu -o foo

Using this line to compile the CUDA application foo.cu

forces -O0 compilation, with the exception of very limited dead-code eliminations and register-spilling optimizations.

makes the compiler include debug information in the executable

This means that, in principle, breakpoints could not be hit in kernel functions even when the code is compiled in debug mode since the CUDA compiler can perform some code optimizations and so the disassembled code could not correspond to the CUDA instructions.

When breakpoints are not hit, a workaround is to put a printf statement immediately after the variable one wants to check, as suggested by Robert Crovella at

CUDA debugging with VS - can't examine restrict pointers (Operation is not valid)

The OP has chosen here a different workaround, i.e., to compile for a different architecture. Indeed, the optimization the compiler does can change from architecture to architecture.

Lizenziert unter: CC-BY-SA mit Zuschreibung

Nicht verbunden mit StackOverflow