> As the question states, is it useful to always collect a software-based backtrace?
Yes, it is generally very useful to have a crash stack trace when:
- your code runs in your own environment, and you are not worried about the stack trace revealing any secrets.
- the crash handler does not further corrupt the coredump, does not hang, etc.
The last condition is less trivial than it may appear: the obvious approach, like using libc `backtrace`, does not satisfy it. Glibc `backtrace` calls `calloc` under certain conditions, and is not safe in a crash handler; it can cause both the hang and the further corruption mentioned above. Writing a crash handler that will reliably print a stack trace in an async-signal-safe manner is quite non-trivial.
> Why then do error functions in "standard" applications not call `backtrace`?
Consider `cat /no/such/file`. Currently it produces:

```
cat: /no/such/file: No such file or directory
```

which is all you really need to know. Making this print anything else is useless. If you had many such files, and `cat` printed a full stack trace for each, you'd get many pages of error output, and that would only make finding the real problem harder.
For fatal signal handlers (e.g. `SIGSEGV`), the answer is that most "standard" applications don't actually handle such signals; they simply use the default action, which produces a core dump.
But if they did catch the signal, calling `backtrace`, `backtrace_symbols`, or `backtrace_symbols_fd` from the signal handler would be equally unsafe and could deadlock, which is much worse than simply dumping core. Consider what happens if you have a long-running script with 1000 commands in it. You start it, and a week later discover that it made no progress, because the second command crashed and deadlocked trying to print the crash stack trace.