Good code style to introduce data checks everywhere?

https://softwareengineering.stackexchange.com/questions/369680

05-02-2021
|

Question

I have a project that is sufficiently large in size that I can't keep every aspect in my head any more. I'm dealing with a number of classes and functions in it, and I'm passing data around.

With time I noticed that I kept getting errors, because I forgot what precise form the data has to have when I pass it to different functions (e.g., one function accepts and outputs an array of strings, another function, that I wrote much later, accepts strings that are kept in a dictionary etc., so I have to transform the strings I'm working with from having them in an array to having them in a dictionary).

To avoid always having to figure out what broke where, I started treating each function and class as being an "isolated entity" in the sense that it cannot rely on outside code giving it the correct input and has to perform input checks itself (or, in some cases, recast the data, if the data is given in the wrong form).

This has greatly reduced the time I spend making sure that the data that I pass around "fits" into every function, because the classes and functions themselves now warn me when some input is bad (and sometimes even correct that) and I don't have to go with a debugger through the whole code anymore to figure out where something went haywire.

One the other hand this has also increased the overall code.
My question is, if this code style appropriate for solving this problem?
Of course, the best solution would be to completely refactor the project and make sure the data has a uniform structure for all functions - but since this project is growing constantly, I would end up spending more and worrying about clean code than actually adding new stuff.

(FYI: I'm still a beginner, so please excuse if this question was naive; my project is in Python.)

Solution

A better solution is to take more advantage of Python language features and tooling.

For example, in function 1, the expected input is an array of strings, where the first string denotes the title of something and the second a bibliographical reference. In function 2 the expected input is still an array of strings, but now the roles of the strings are inversed.

This problem is mitigated with a namedtuple. It's lightweight and gives easy semantic meaning to the members of your array.

To grab the benefit of some automatic type checking without switching languages, you can take advantage of type hinting. A good IDE can use this to let you know when you do something dumb.

You also seem worried about functions going stale when requirements change. This can be caught by automated testing.

While I am not saying that manually checking is never appropriate, a better use of available language features can help you solve this problem in a more maintainable way.

OTHER TIPS

OK, the actual problem is described in a comment below this answer:

For example, in function 1, the expected input is an array of strings, where the first string denotes the title of something and the second a bibliographical reference. In function 2 the expected input is still an array of strings, but now the roles of the strings are inversed

The problem here is the use of a list of strings where the order signifies semantics. This is a really error prone approach. Instead you should create a custom class with two fields named title and bibliographical_reference. That way you are not going to mix them up, and you avoid this problem in the future. Of course this requires some refactoring if you already use lists of strings in a lot of places, but believe me, it will be cheaper in the long run.

The common approach in dynamically types languages is "duck typing", which mean you dont really care about the "type" of the object passed, you only care if it supports the methods you call on it. In your case, you will simply read the field called bibliographical_reference when you need it. If this field does not exist on the object passed, you will get an error, and this indicates the wrong type is passed to the function. This is as good a type check as any.

First of all, what you are experiencing right now is code smell - try to remember what lead to you becoming conscious of the smell and try to hone your "mental" nose, as the sooner you notice a code smell the sooner - and easier - you are able to fix the underlying issue.

To avoid always having to figure out what broke where, I started treating each function and class as being an "isolated entity" in the sense that it cannot rely on outside code giving it the correct input and has to perform input checks itself.

Defensive programming - as this technique is called - is a valid and often used tool. However, as with all things it is important to use the right amount, too little checks and you will not catch issues, too many and your code will be over-bloated.

(or, in some cases, recast the data, if the data is given in the wrong form).

That might be a less-good idea. If you notice a part of your program is calling a function with incorrectly formatted data, FIX THAT PART, not change the called function to be able to digest bad data anyway.

This has greatly reduced the time I spend making sure that the data that I pass around "fits" into every function, because the classes and functions themselves now warn me when some input is bad (and sometimes even correct that) and I don't have to go with a debugger through the whole code anymore to figure out where something went haywire.

Improving the quality & maintainability of your code is a time saver in the long run (in that sense I must again warn against the self-correction functionality you built into some of your functions - they might be an insidious source for bugs. Just because your program does not crash & burn does not mean it works right...)

To finally answer your question: Yes, defensive programming (i.e. verifying the validity of the provided parameters) is - in a healthy degree - a good strategy. That said, as you said yourself, your code is inconsitent, and I'd strongly reccommend that you spend some time to refactor the parts that smell - you said you do not want to worry about clean code all the time, spending more time on "cleaning" than on new features... If you don't keep your code clean you might spend twice the time "save" from not keeping a clean code on squashing bugs AND will have a hard time implementing new features - technical debt can crush you.

That's okay. I used to code in FoxPro, where I had a TRY..CATCH blocks almost in every big function. Now, I code in JavaScript/LiveScript and rarely check parameters in "internal" or "private" functions.

"How much to check" depends on the chosen project/language more than it depends on your code skill.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange