How to best handle valid but impossible input?

https://softwareengineering.stackexchange.com/questions/401965

programming-practices

05-03-2021
|

Question

Say I write a function which expects the input to be A or B. The input may be C, which is technically valid but should be impossible if the logic of the rest of the program is working correctly.

Should I:

Add an if-else clause to handle all the possible inputs and throw an exception / return null if necessary.
Assume the input can not be C and only handle the rest.

Which of those 2 variants are better? The first variant let us catch the error as soon as possible, but leads to very cluttered and hard to read code. The second variant leads to cleaner code, but if the rest of the program is working incorrectly, the bug may propagate far from its origin.

This is to address the internal logic of a program, not the user input, which we obviously should be able to handle all the possible inputs.

Solution

Defensive programming

The input may be C, which is technically valid but should be impossible if the logic of the rest of the program is working correctly.

The keyword here is if. This is where we delve into the realm of defensive programming:

Defensive programming is a form of defensive design intended to ensure the continuing function of a piece of software under unforeseen circumstances. Defensive programming practices are often used where high availability, safety, or security is needed.

Most importantly, take note of the last sentence. Defensive programming is useful only when defending against mistakes is important. If this application is a small personal tool you use, it matters little.

Assuming that this application is important enough to ensure that only a valid value is entered, then defensive programming applies.

Implementing defenses

Say I write a function which expects the input to be A or B.

Right off the bat, enum came to mind. When you're dealing with a restricted list of options, and there's no existing primitive that covers only those restricted options, then an enum allows you to create a closed list.

public enum MyOptions { A, B }

This is, in my opinion, the cleanest approach, but it is also not a perfect defense. Enums are really just ints under the hood, and you can force any int value into it, e.g.:

public enum MyOptions { A = 0, B = 1 }

MyOptions valid   = (MyOptions) 1;
MyOptions invalid = (MyOptions) 2;  // No compiler or runtime error!

This brings us back to considering how much you need to defend. If abusing the enum causes a security breach, then you shouldn't just rely on enums to ensure security. If abusing the enum yields no benefits to the abuser, then it's acceptable to keep it at that.

As for your suggestions:

Assume the input can not be C and only handle the rest.

Assuming there is no big risk involved (e.g. security breaches), this is the preferred approach.

That being said, I do still urge you to implement the enum, as it will help future developers (consumers of your code) to realize which values are expected/allowed.

Add an if-else clause to handle all the possible inputs and throw an exception / return null if necessary.

Regardless of using the enum or not, when there is a risk involved with not checking it (e.g. security breach), then the if (or similar validation structure) becomes the desired approach.

OTHER TIPS

Assuming you are writing in a language that is OO first rather than functional first.

Run time checks

Hard to read code? Why? You can start the method by expressing that certain values are not valid. That should make it easier to read/understand.

Validation is a different concern anyway, you should know where it starts and where it ends, refactor your code so that is easy to see, thus improving readability. If the code to validate the input is actually complicated enough to make it hard to read, that is a clue that you should be extracting the validation to another method.

I understand there are some situations where it is non-obvious how do to that without a compromise in efficiency. For example, validating a list could mean to iterate over it. And then iterate over it again to actually use it... Well, if you have a solution such as Linq or similar, you should be able to write the validation code separate from the use of the list, while iterating over it only once, thanks to Linq being lazily evaluated.

You might also want to avoid duplicated validation in the internals of the code... you could use conditional compilations, debug assertions, code contracts, or similar solutions to make the extra validation only on development and testing environments.

Compile time checks

If you have a good code contracts solution, it would be validated on build.

Speaking of tools validating the code... can you get or write one? I'm thinking of a lint or similar, which could check preconditions on the code and be added to build integration. I know such tool is not always cost effective, however if your situation is specific enough and important enough, then such tool would be of value.

However, it is better to express validations as part of the type system, so they are checked at compile time without the need of additional tools.

You could – for example – have a type T, and a type ValidatedT, that can only be initialized with valid values of T. Then your internal code will only accept ValidatedT (of course, with a better name), making sure that the validation already happened, and regardless of how many method the ValidatedT value is passed down, the validation only happened once. This should also improve readability, as you could clearly see where do you have T and where do you have ValidatedT. Not only that, but you could have ValidatedT for different situations...

I mean, if I say the type string, what is it? A username? a product description? what? However, if I say the type Username, then you know what it is in the system, and you know it is a valid value.

Oh, by the way, you can still get the benefits of easier to read code, although without the compile time checking, if you use type aliases.

One thing to consider is that if you start building checks to stop valid but unexpected input (C), you essentially coupling the function to the application. In other words, you are making the function less likely to be reusable in another context. If you look at what's going on in a typical application, there are many parts of it that can accept input that you will never be providing to it.

For example, you might have a limit on the length of postal codes which you store in a string. The String's constructor allows many more characters as well as characters that aren't valid in postal codes.

Should you then define a new String class that only allows the valid characters? You could but that's not a very common solution. Instead, we build checks in the code that creates the postal code string, perhaps in a PostalCode class that is composed of a string (and maybe some other things.)

If this function is already inherently coupled to this application as it is designed, it's probably not something to worry about. But if this function is useful on it's own, you might want to avoid building such assumptions into it. Even within an application, as it grows and is refined over time, you might find that this function could be reused elsewhere. If the constraints are built into it but don't apply in the new context, you will then need to address moving them somewhere else first.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange