You've shipped, you get a rare seg fault. Pointer checking or let it go?

https://softwareengineering.stackexchange.com/questions/3482

pointers
bug

16-10-2019

Question

You've shipped, asserts are turned off, you receive a rare crash report indicating that a null pointer violation occurred in your code. In a development environment, the problem would have been caught by an assert.

All you have is a crash report, so reproducing the issue is near impossible. Following the backtrace doesn't give any clues as to why the crash happened in the first place.

Options:
- Add pointer checking to prevent the crash. This will prevent the crash, but you probably won't even find out why it happened in the first place.
- Let it fly, and hope that it happens again with a repro scenario.

Let's say the application isn't intended for a guided missile or automatic braking system...

Which would you choose?

Solution

I chose the second approach. There is no point in hiding the crash if the NULL pointer was unexpected at the point where the crash occurred. In most cases this NULL pointer is just one symptom of something else being wrong. If we hide it with a NULL pointer check, it is almost certain that something else will break. I feel you have a better chance of catching the scenario if you know the point where it crashes every time, rather than having it fail at some random place.

OTHER TIPS

How often does the crash occur? Does it happen for just one of many customers, in some obscure case? What are the consequences (data loss, system crash)? If it happens in 1 in a million cases, and the customer just has to restart the application with no data lost, then you probably don't need to fix it; leave it as it is.
How expensive (in money and time) is it to add the asserts and ship them to all customers? (If only some of the customers get the new version, the rest can still hit the unchecked null problem.) What are the chances of finding the problem? Putting random checks in the code in the hope of catching the error is bad practice.
Can the problem be reproduced on the customer's machine? Can you get access to that machine? This might be really valuable.
Review your crash reports and make sure the information they provide is useful and can help you diagnose the problem.

"In a development environment, the problem would have been caught by an assert."

With a specific order of events it would have been caught and fixed, but the path shown in the current backtrace has never been caught.

You should be able to see what went wrong from the crash dump. Have you checked the parameters, etc.?

Extra steps you can take, depending on how much time you want to put into this:

Archive the crash dump and refer to it in the code with a comment at the line where it crashed.
This lets anyone who examines a very similar crash dump know that it has happened before.
[time spent: short]

Add additional checks, logging, and so on. You want to prevent the crash and get more information next time.
[time spent: medium]

"Null pointer violation occurred in your code."
Check that it is impossible to call the application in a way that lets this violation happen.
[time spent: long]
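
The "additional checks, logging" step could look roughly like the sketch below. It is only an illustration: `report_null`, `REPORT_IF_NULL`, `null_report_count`, `struct record` and `record_length` are hypothetical names, and the `-1` error value is an assumption made for the example. The point is that the check records where the unexpected NULL showed up instead of swallowing it silently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Counts reports so that tests or telemetry can see how often this fires. */
static int null_report_count = 0;

/* Hypothetical helper: log where an unexpected NULL was seen. */
static void report_null(const char *expr, const char *file, int line)
{
    assert(expr != NULL && file != NULL); /* guards misuse of the helper itself */
    ++null_report_count;
    fprintf(stderr, "unexpected NULL: %s (%s:%d)\n", expr, file, line);
}

/* Evaluates to 1 (and logs) if p is NULL, otherwise evaluates to 0. */
#define REPORT_IF_NULL(p) \
    ((p) == NULL ? (report_null(#p, __FILE__, __LINE__), 1) : 0)

struct record { int length; };

/* Hypothetical stand-in for the function named in the crash report. */
int record_length(const struct record *rec)
{
    /* A comment pointing at the archived crash dump would go here,
       next to the dereference that crashed in the field. */
    if (REPORT_IF_NULL(rec)) {
        return -1; /* error value chosen for this sketch */
    }
    return rec->length;
}
```

With this in place, `record_length(NULL)` returns `-1` and writes the expression, file and line to `stderr`, so the next report carries more than a bare backtrace. Whether to return an error or still abort after logging is a separate decision, and it is the trade-off the question is about.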

These days, I ship with assert() turned on. It doesn't cost much and it can make life a lot easier in hostile situations (i.e., your customers' environments are often more hostile than your dev or QA environments).
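
One way to keep `assert()` active in a shipped build, sketched here in C, is simply not to define `NDEBUG` for release. If the build system defines it anyway, a single source file can opt back in by undefining it before including `<assert.h>`, because the header redefines `assert` according to the state of `NDEBUG` each time it is included. The `struct packet` type and `packet_first_byte` function are hypothetical, used only to give the asserts something to guard.

```c
/* Keep assert() active in this file even if the build defines NDEBUG. */
#undef NDEBUG
#include <assert.h>
#include <stddef.h>

/* Hypothetical data type for the example. */
struct packet {
    size_t len;
    const unsigned char *data;
};

/* Hypothetical function: the asserts stay on in shipped builds, so a bad
   pointer stops the program here with a file and line number. */
unsigned char packet_first_byte(const struct packet *p)
{
    assert(p != NULL);
    assert(p->data != NULL && p->len > 0);
    return p->data[0];
}
```

A failed assert calls `abort()` after printing the failing expression and its location, so the crash report points at the violated assumption rather than at whatever dereference happens to fault later.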

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange