Question

I could write an endless amount of debugging code to handle the various components of input for a function: check that the correct data types are used, make sure things are descendants of the proper prototype objects, make sure the data is within a certain value range or set... The list could go on and on.

But, of course, I can also use documentation comment APIs to generate good documentation on the expected input and usage.

My question is, how far/in-depth should I go in debugging/checking input for code users versus expecting them to read the function/method's documentation for proper use?

We could say "That's up to whoever's in charge" but I'm interested in best practice for maximum efficiency.

Solution

Focusing on writing "efficient" code without actually defining efficiency is a common trap developers fall into.

Efficiency isn't always reducing the number of CPU cycles required to accomplish a task. In fact, for most programmers, in most cases, obsessing over shaving CPU cycles is a complete waste of time.

Where Efficiency = minimizing CPU cycles

The majority of business software contains trivial logic.

Imagine spending 30 minutes shaving a few CPU cycles off your code. Let's just say you had a good day and shaved off 2,000 cycles. Your typical 2 GHz CPU goes through 2,000,000,000 cycles every second, which means your 2,000 saved cycles amount to 0.000001 seconds, or 1 microsecond, saved per run.

Your application would need to run that optimized code about 1,800,000,000 times before you break even on your 30 minutes of time investment.
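
To make the arithmetic concrete, here is a back-of-the-envelope sketch of the break-even calculation, using just the numbers from the example above (written as C# top-level statements):

    // Back-of-the-envelope arithmetic for the example above.
    double cyclesSaved = 2000;                 // cycles shaved off per call
    double cyclesPerSecond = 2_000_000_000;    // a 2 GHz CPU
    double developerSeconds = 30 * 60;         // the 30 minutes spent optimizing

    double secondsSavedPerCall = cyclesSaved / cyclesPerSecond;        // 0.000001 s, i.e. 1 microsecond
    double callsToBreakEven = developerSeconds / secondsSavedPerCall;  // 1,800,000,000 calls

    System.Console.WriteLine($"Break even after {callsToBreakEven:N0} calls.");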

In all my time in the business software world, I can't think of one instance where that initial 30-minute investment would have been paid off before I left that role for another job.

Where Efficiency = time saved IN GENERAL

On the other hand, if you define efficiency in terms of time saved in general, then your ROI skyrockets.

So, in this scenario, time saved includes (but is not limited to) the following:

  • Your time testing your API
  • API Consumer's time debugging issues with your API
  • Your time debugging issues with your API Consumer
  • Your time handling calls about a brittle and broken API (no, the consumer didn't read your documentation)

With this definition of efficiency, a little bit of defensive programming goes a long way. Making your API fail early and fail hard on bad input data requires very little upfront time and saves you and your consumers a boatload of downstream time.
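
As a rough sketch of what "fail early and fail hard" can look like in practice (the OrderService class and its parameters are made up for illustration; they are not from the question):

    using System;

    public class OrderService
    {
        // Hypothetical API method: reject bad input at the boundary instead of
        // letting it leak into the business logic.
        public void PlaceOrder(string customerId, int quantity, decimal price)
        {
            if (string.IsNullOrWhiteSpace(customerId))
                throw new ArgumentException("A customer id is required.", nameof(customerId));
            if (quantity < 1)
                throw new ArgumentOutOfRangeException(nameof(quantity), "Quantity must be at least 1.");
            if (price < 0m)
                throw new ArgumentOutOfRangeException(nameof(price), "The price cannot be negative.");

            // ... the actual order logic runs only on input that is known to be valid ...
        }
    }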

The big thing to remember is that your time and your API consumer's time are much more valuable (read: expensive) than CPU time.

Other domains

There are, however, domains where efficiency really does mean minimizing CPU cycles. But keep in mind that those domains are becoming fewer and fewer nowadays.

Two that come to mind are coding for real-time systems and high-volume data analysis.

TL;DR

So, unless you know for sure that you need to shave off CPU cycles, a better approach is to code in such a way that you save yourself and others as much time as possible. In the API world you can accomplish this with clear code contracts and defensive programming.

OTHER TIPS

There are two sides of the problem.

  • As a developer, you have to sanitize your inputs. Don't trust the users to read your documentation in depth, and don't trust the users to follow the documentation at all. Security-wise, if it's a public API and you expect the users to sanitize the input, things will go very wrong very quickly. For non-public APIs, trusting the inputs is problematic as well; if your code makes bad things happen when given invalid input, it's your code that will be blamed, not the caller's.

    If the input is invalid, throw an exception.

  • As an API provider, you may want to simplify the work of callers who may not be familiar with your API (and find your documentation too difficult, impractical or boring to read). One way to simplify their lives is to provide explicit error messages. This is the difference between:

    throw new Exception("Invalid input.");
    

    which simply means RTFM, and:

    throw new ArgumentOutOfRangeException("price", "The price cannot be negative.");
    

    which gives a convenient way to know exactly what is wrong with the caller's code.

    If the input is invalid and you have enough time to explain why it is invalid, do that in the exception.

    If a given input error is easy to make and comes up often, focus on it: either make it impossible for callers to make that error, or provide a very detailed error message for it.

Of course it depends on the way things are done in your organization (a policy of quality work vs. a policy of quick-and-dirty throwaway work, the presence or absence of a culture of respect shown by middleware programmers towards application programmers, etc.), but in general:

The "best" thing is considered to be to write self-documenting APIs whose usage is as clear as possible with just a brief skim through the interface, and to never expect callers of your API to have memorized, or even to have bothered in any great length with, the documentation. Callers of your API will invoke it in weird ways while examining "what if" scenarios. Callers of your API will invoke it in ways that you never even imagined.

APIs are generally expected to be bulletproof and idiotproof.

And this is a good thing. Why? Because:

A "Null Pointer Exception" or a "Division by Zero" exception thrown somewhere within your code is always considered to be your fault. So, for every single bug in the caller's code, the bug gets nonetheless assigned to you. You have to reproduce it, you have to troubleshoot it, you have to discover that it is their fault instead, and you have to convince them that it is so.

An "Invalid Argument" exception thrown by your code is always considered to be the caller's fault. The bug report goes directly to them. Peace of mind.

Excessive arguments between application programmers and middleware programmers, where each is trying to prove that it's the other one's fault, are generally indicative of sloppy work on the part of the middleware programmer.

Edit:

I just noticed the term "maximum efficiency" in the question. It is not clear what you mean by efficiency. Inefficiencies in the development process (quarrels about whose fault it is) can often waste huge amounts of time. Now, inefficiencies in the runtime are generally not an issue, because you can always use assertions, which are only enabled in the debug build of the application. But if you do not want to use assertions, the inefficiencies that stem from hard-coded parameter checking are generally not an issue on modern, multi-core, multi-pipelined, multi-level-cached, multi-gigahertz machines. If someone has an issue, tell them to buy better hardware. This is often considered to also be a best practice.
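
In C#, for instance, Debug.Assert calls are compiled in only when the DEBUG symbol is defined, so release builds pay nothing for the check (the Average method below is a made-up example):

    using System.Diagnostics;

    public static class Statistics
    {
        public static double Average(double[] values)
        {
            // Compiled only when the DEBUG symbol is defined; release builds
            // contain no trace of this check.
            Debug.Assert(values != null && values.Length > 0,
                         "Average() requires a non-empty array.");

            double sum = 0;
            foreach (var v in values)
                sum += v;
            return sum / values.Length;
        }
    }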

As the developer of an API you have to decide and document what valid inputs are, and you have to decide and document what you do with invalid inputs. You also need to decide which cases are things that simply don't work and which are programmer errors. For example, a path to a non-existent file may "not work", while a null pointer or an empty string as the path would be a "programming error". And finally, this should be discussed with the users of the API (unless you are experienced enough to decide without their input).

If possible, detect programming errors (that is, errors where you expect someone to change their code) and stop your program. If possible, detect other problems, avoid crashing or doing something bad, and report the problem so that it can be fixed or handled by the caller.
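
Continuing the file-path example from above, a minimal sketch of that distinction might look like this (the TryLoadConfig name is hypothetical):

    using System;
    using System.IO;

    public static class ConfigLoader
    {
        public static bool TryLoadConfig(string path, out string contents)
        {
            // A null or empty path is a programming error: stop loudly so the
            // caller fixes their code.
            if (string.IsNullOrWhiteSpace(path))
                throw new ArgumentException("A configuration path is required.", nameof(path));

            // A missing file merely "doesn't work": report it so the caller can
            // handle the situation instead of crashing.
            if (!File.Exists(path))
            {
                contents = null;
                return false;
            }

            contents = File.ReadAllText(path);
            return true;
        }
    }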

Readable documentation helps make users actually read your documentation :-)

For a public-facing API, here are my rules of thumb:

  • Validate all inputs that can cause unexpected behavior.
  • Validate inputs when you first see them.
    • Don't re-validate inputs in internal helper functions (see the sketch after this list).
  • For inputs that could conceivably come from user input, return helpful error codes / messages.
    • if (date < today) return err_NEED_FUTURE_DATE;
    • Note that it's not just err_INVALID_DATE; that alone doesn't provide enough help.
  • For inputs that likely come from the programmer, raise assertions/exceptions:
    • if (buffer_size < 1) RAISE("Buffer must be at least one byte");
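
Here is a sketch of the "validate once, at the boundary" rule in practice (the BookingApi names are made up for illustration):

    using System;

    public class BookingApi
    {
        // Public entry point: validate here, when the input is first seen.
        public void Book(DateTime date, int seats)
        {
            if (date < DateTime.Today)
                throw new ArgumentOutOfRangeException(nameof(date), "The booking date cannot be in the past.");
            if (seats < 1)
                throw new ArgumentOutOfRangeException(nameof(seats), "At least one seat must be booked.");

            Reserve(date, seats);
        }

        // Internal helper: trusts its caller and does not re-validate.
        private void Reserve(DateTime date, int seats)
        {
            // ... reservation logic ...
        }
    }
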
Licensed under: CC-BY-SA with attribution