Question

I have a multithreaded C program, which consistently generates a segmentation fault at a specific point in the program. When I run it with gdb, no fault is shown. Can you think of any reason why the fault might occur only when not using the debugger? It's pretty annoying not being able to use it to find the problem!

Was it helpful?

Solution

Classic Heisenbug. From Wikipedia:

Time can also be a factor in heisenbugs. Executing a program under control of a debugger can change the execution timing of the program as compared to normal execution. Time-sensitive bugs such as race conditions may not reproduce when the program is slowed down by single-stepping source lines in the debugger. This is particularly true when the behavior involves interaction with an entity not under the control of a debugger, such as when debugging network packet processing between two machines and only one is under debugger control.

The debugger may be changing timing, and hiding a race condition.

On Linux, GDB also disables address space randomization, and your crash may be specific to address space layout. Try (gdb) set disable-randomization off.

Finally, ulimit -c unlimited and post-mortem debugging (already suggested by Robie) may work.

OTHER TIPS

Perhaps when using gdb memory is mapped in a location which your over/under flow doesn't trample on memory that causes a crash. Or it could be a race condition that is no longer getting tripped. Although it sounds unintuitive, you should be happy your program was nice enough to crash on you.

Some suggestions

  1. Try a static code analyzer such as the free cppcheck
  2. Try a malloc() debugger like libefence
  3. Try running it through valgrind

By debugging it you are changing the environment that it is running in. It sounds like you are dealing with some sort of race condition, and by debugging it things are scheduled slightly differently so you don't encounter the issue. That, or things are being stored in a slightly different way so it doesn't occur. Are you able to put some debugging output in the code to assist in figuring out the problem? That may have less of an impact and allow you to find your issue.

I have totally had this problem before! It was a race condition, and when I was stepping though the code with a debugger the thread i was in was slow enough to not trigger the race condition. Pretty awful.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top