Question

Apologies if this isn't the right place for this question; please direct me elsewhere if that is the case :)

I was having a discussion with my boss, who has experience (but no formal education) in software engineering, about a member field in an object that indicates whether the object is active.

The objects in question are built by employees (imagine something like antivirus rules), who specify the object's parameters, attributes, and goal.

This object is then compiled into a binary format and distributed to our customers' software installations via mirrors and the software updater.

This object contains a field which indicates whether it is active or inactive. In the early design and development phases this was added because we theorized that we might need to turn the object off at the endpoint of distribution.

The objects are stored in separate files on customer software installations, and turning an object off would mean actively rewriting its binary file to adjust the field inside. There was no real expected use case for this internal 'active' field, and the customers aren't expected to know anything about these objects. So it was expected that nobody and nothing would ever turn these objects off -- especially because you could just delete the file to turn it off, and updates of the file would turn it back on regardless of whether you had deleted it or toggled the field.

The idea that we would need to toggle the object at the endpoint was quickly dismissed once the interface reached completion: it is connected to a database which houses the objects and records whether each object is active or inactive.

The interface simply does not push an object out for distribution if it is listed as inactive in the database; it's a very simple mechanism.

Now fast forward to our conversation: we were discussing the purpose of this field in the compiled object file.

I stated that we could simply remove the field from the compiled object because if the object is being pushed out then it must be enabled in the interface anyway.

My boss said that I am assuming UI programmers don't make mistakes, and that having a second line of defense would prevent use of an object that wasn't intended to be pushed out but was pushed out somehow in error.

I said I didn't really consider 'developer mistakes' to be a reason to implement a feature (at least in this case), and that mistakes should generally be caught by testing/debugging/QA rather than worked around with side features.

After all, it is the interface which populates the 'active' field of the compiled object anyway, so if the interface erroneously pushed out an object that wasn't supposed to be active, there's a large possibility the compiled object would also indicate it was active in the internal field, because the interface populated that field. (Unless, of course, the bug is elsewhere, such as in the interface's comparison of the active field.)
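
To make that concrete, here is a rough sketch of what I mean (every name below is invented for illustration, not our actual code): the push decision and the compiled-in 'active' field are both derived from the same database value, so a bug upstream of this point would likely corrupt both at once.

#include <string>

struct RuleObject
{
    std::string id;
    bool active = false;   // the compiled-in field under discussion
    // ... other compiled attributes
};

struct Database
{
    bool is_active(const std::string& /*rule_id*/) const { return true; }   // stub
};

RuleObject compile_rule(const Database&, const std::string& id)   // stub
{
    RuleObject obj;
    obj.id = id;
    return obj;
}

void distribute(const RuleObject&) {}   // stub

void push_if_active(const Database& db, const std::string& rule_id)
{
    const bool db_active = db.is_active(rule_id);   // single source of truth

    if (!db_active)
        return;                   // the gate: inactive objects are never pushed

    RuleObject obj = compile_rule(db, rule_id);
    obj.active = db_active;       // the same value populates the internal field
    distribute(obj);
}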

My boss said "don't think of it as a 'feature' but a safety" while also suggesting that my stance on the matter was naive and the statement 'mistakes should be caught by testing/debugging/QA' certainly isn't true in the real world.

Ultimately this is such a small issue that I'm not aiming to argue whether or not the field is retained; I'm just curious about the principle behind the stance my boss took in the discussion, and about what other professionals in the industry have to say on this.

My reasoning for this being a bad approach is:

If you assume that testing/debugging/QA cannot catch even the simplest of bugs, and that you must code in extra features to protect against these simple bugs -- isn't that indicative that there is a deeper issue in the development process?

Furthermore, if you have to code in extra features to protect against bugs, what if those extra features have bugs? Do you program in even more extra features to protect against possible bugs in the features that protect against bugs?

Solution

You ask:

If you assume that testing/debugging/QA cannot catch even the simplest of bugs, and that you must code in extra features to protect against these simple bugs -- isn't that indicative that there is a deeper issue in the development process?

Who's suggesting that these processes or people "cannot catch even the simplest of bugs"? That sounds like a straw man; your boss didn't actually say that. I have seen extremely simple bugs make it past many layers of high-quality testing and into shipped products. It definitely happens, and it's not because people are incompetent or processes are useless. It's because software is extremely complex, and there literally isn't enough time in the lifetime of the universe to cover even a fraction of the possible inputs a program can have.

Furthermore, if you have to code in extra features to protect against bugs, what if those extra features have bugs? Do you program in even more extra features to protect against possible bugs in the features that protect against bugs?

That sounds like a slippery slope argument. We can also take it in the other direction. If coding extra features to help protect against or find bugs doesn't work, can even coding regular features work? Can software work at all?

The answer is of course that yes software can work, testing can work, and defensive programming practices can work. But they're not perfect and never will be. You have to find the right balance between these things so that you can stay in business and accomplish a task that users need accomplished. Sometimes that means writing suboptimal code or using a gross workaround to get something out the door.

There are a number of ways to automate parts of these tasks to improve your chances of finding bugs and to keep your code from regressing and having previously fixed bugs pop up again. If you aren't already using them, you should look into compiler warnings, static analysis of your code, test-driven development, and other automated techniques. They are a great complement to manual testing and are often inexpensive to implement.
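
As a rough illustration (the names below are made up, not taken from your project), even a trivial automated check around the push decision would catch the "inactive object gets pushed out" mistake at build time, instead of relying on a flag baked into the shipped file:

#include <cassert>

// Hypothetical stand-in for the real gate in the interface.
bool should_push(bool active_in_database)
{
    return active_in_database;
}

int main()
{
    assert(should_push(true));      // active objects are distributed
    assert(!should_push(false));    // inactive objects never leave the interface
    return 0;
}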

Other tips

I got a bit lost in the minutiae of your particular details, but I found some interesting questions in there:

If you assume that testing/debugging/QA cannot catch even the simplest of bugs, and that you must code in extra features to protect against these simple bugs -- isn't that indicative that there is a deeper issue in the development process?

I don't know about simple bugs, but certainly some bugs in my field do ship after flying under the radar of unit and integration testing and QA, unfortunately, and sometimes even internal testing (I'm hoping this doesn't happen so often at NASA).

In my case the most valuable safeguard against that is logging. We actually don't need logging as a feature at all except for the sole purpose of helping us narrow down bugs which unfortunately managed to ship, because it's not practical to expect to be able to run debuggers on the user's machine. Our testing procedure is reasonably tight, but the logger has been a lifesaver, particularly in helping us narrow down hardware incompatibility issues which sometimes occur in ways no one on our entire team can reproduce, since we do some fancy cutting-edge stuff with SIMD and our GPU shaders. That also tends to be where we find the most bugs that managed to ship, though sometimes it was because the users didn't read our minimum hardware requirements, and at least we discovered that we weren't detecting that properly in the software.
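
As a very rough sketch (my own invention here, not our actual logger, and "app.log" is a made-up path), the kind of last-resort logging I mean doesn't need to be sophisticated; it just has to leave a trail the user can send back:

#include <fstream>
#include <string>

// Minimal always-on logger: append a line and flush eagerly so the tail survives a crash.
void log_line(const std::string& message)
{
    static std::ofstream log("app.log", std::ios::app);
    log << message << '\n';
    log.flush();
}

int main()
{
    // Breadcrumbs like these are what let us narrow down issues we can't reproduce in-house.
    log_line("startup: SIMD path selected");
    log_line("gpu: compiling shader variant 3");
    return 0;
}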

Furthermore, if you have to code in extra features to protect against bugs, what if those extra features have bugs? Do you program in even more extra features to protect against possible bugs in the features that protect against bugs?

If I go back to logging as an example, it's a last-resort defense. If, somehow, our logging is also fubar (though we do test it to some degree), then we're kind of SOL in those cases where users encounter a bug that managed to ship (we might have to send them a build with fixed logging just to narrow it down). But as a last bastion of defense, it has held up pretty well. That said, we do actually make use of it, test it to some degree, and look at it from time to time just to see what the program has been doing outside of debugging.

Now last resort defense is different from hiding bugs. I've seen ghastly code like this:

// Pre-condition: foo should never be null.
void do_something(Foo* foo)
{
    if (!foo)
        return;
    ...
}

Which is absolutely horrid, given that it's not even generating an error. Unfortunately, ages ago when I was younger, I worked on a team where the senior devs thought this was good practice, and with the way they "fixed" bugs they turned the mythical idea that a programmer fixing one bug creates two new bugs into something of a reality. But that's very different from something like logging: what they were doing was deliberately hiding bugs, as opposed to fixing them. I sometimes have an obnoxious and argumentative side that I'd blame on the experience of being on that team. :-D
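
By contrast, here is a sketch of what I'd rather see (one option among several, not the only correct way): keep the pre-condition, but fail loudly instead of silently swallowing the bug.

#include <cassert>
#include <cstdio>

struct Foo { /* ... */ };

// Pre-condition: foo should never be null.
void do_something(Foo* foo)
{
    assert(foo != nullptr && "do_something: pre-condition violated");   // crashes noisily in debug builds
    if (!foo)
    {
        std::fprintf(stderr, "do_something: foo is null\n");   // in release, at least leave a trace
        return;
    }
    // ... actual work
}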

In your case I don't know the nuances and details well enough to judge how stupid (or not) your boss's argument is for having some safety feature to turn off/disable these objects. But I'd just try to remember that it's all about making money and running a business, and if this thing is causing real productivity issues for you, talk about it that way rather than tackling it in terms of "correctness" or "optimal practice", because I think bosses will relate to that more. But you know, there are lots of things in life, like pretty girls (who are so awesome) or good-looking guys (like me, for the ladies), and I've found it helpful over the years to chillax a bit about the stuff that might be contested, unless it's really, really getting in the way of getting your work done.

I'm not sure I understand your specific problem exactly, but the general theme itself is very common, and it has a number of aspects.

Requirements

Without clear, documented, and agreed targets, the function of any system is up for grabs. In the absence of requirements, there are likely to be a lot of differences of opinion about how a system should work, with the more senior members of staff in a position to push their view through, whether it is good for the code base or not.

The process spiral

When a manual process goes wrong, the knee-jerk reaction is to add yet another process to check the first. This is a fool's errand, as you now have two things that can go wrong, and the entire process takes longer due to the increased checks and measures.

This isn't to say this should never happen for critical functions, but it certainly shouldn't be considered as a silver bullet that can be deployed everywhere.

The business process

There is a concept (in management especially) that software should just flow, show no errors, aid the user, and be intuitive. Sometimes these unrealistic demands are orthogonal to the requirements themselves. A lot of development time can be spent here if these lofty aspirations go unchecked.


Given where you are, I suspect documented requirements are somewhat lacking, but that shouldn't stop you from getting started on them now. You'd be astonished how many new 'critical' features fall by the wayside when you ask the powers that be to fully document and justify the software changes. Many managers (with regard to software) are winging it a lot of the time, as they don't understand the code base as well as the developers do.

Licensed under: CC-BY-SA with attribution