Question

Rightly or wrongly, I'm currently of the belief that I should always try to make my code as robust as possible, even if this means adding in redundant code / checks that I know won't be of any use right now, but they might be x amount of years down the line.

For example, I'm currently working on a mobile application that has this piece of code:

public static CalendarRow AssignAppointmentToRow(Appointment app, List<CalendarRow> rows)
{
    //1. Is rows equal to null? - This will be the case if this is the first appointment.
    if (rows == null) {
        rows = new List<CalendarRow> ();
    }

    //2. Is rows empty? - This will be the case if this is the first appointment / some other unknown reason.
    if(rows.Count == 0)
    {
        rows.Add (new CalendarRow (0));
        rows [0].Appointments.Add (app);
    }

    //blah...
}

Looking specifically at section two, I know that if section one is true, then section two will also be true. I can't think of any reason as to why section one would be false and section two evaluate true, which makes the second if statement redundant.

However, there may be a case in the future where this second if statement is actually needed, and for a known reason.

Some people may look at this initially and think I'm programming with the future in mind, which is obviously a good thing. But I know of a few instances where this kind of code has "hidden" bugs from me, meaning it's taken me even longer to figure out why function xyz is doing abc when it should actually be doing def.

On the other hand, there's also been numerous instances where this kind of code has made it much, much easier to enhance the code with new behaviours, because I don't have to go back and ensure that all the relevant checks are in place.

Are there any general rule-of-thumb guidelines for this kind of code? (I'd also be interested to hear whether this would be considered good or bad practice.)

N.B: This could be considered similar to this question; however, unlike that question, I'd like an answer that assumes there are no deadlines.

TLDR: Should I go so far as to add redundant code in order to make it potentially more robust in the future?

Solution

As an exercise, let's first verify your logic, though as we'll see, you have bigger problems than any logical problem.

Call the first condition A and the second condition B.

You first say:

Looking specifically at section two, I know that if section one is true, then section two will also be true.

That is: A implies B, or, in more basic terms (NOT A) OR B

And then:

I can't think of any reason as to why section one would be false and section two evaluate true, which makes the second if statement redundant.

That is: NOT((NOT A) AND B). Apply De Morgan's law to get (NOT B) OR A, which is B implies A.

Therefore, if both your statements are true then A implies B and B implies A, which means the two conditions must be logically equivalent.

Therefore yes, the checks are redundant. You appear to have four code paths through the program but in fact you only have two.
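If you want to double-check that reasoning with something executable, a throwaway sketch (not part of the original code) can enumerate all four truth assignments and confirm that the two implications together force A and B to agree:

```csharp
using System;

static class LogicCheck
{
    // (A => B) && (B => A), written with material implication: X => Y is !X || Y.
    public static bool BothImplicationsHold(bool a, bool b) => (!a || b) && (!b || a);

    static void Main()
    {
        // Enumerate all four truth assignments and confirm the two
        // implications together hold exactly when A == B.
        foreach (var a in new[] { false, true })
        foreach (var b in new[] { false, true })
        {
            Console.WriteLine($"A={a}, B={b}: equivalent={BothImplicationsHold(a, b)}");
            if (BothImplicationsHold(a, b) != (a == b))
                throw new Exception("logic check failed");
        }
    }
}
```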

So now the question is: how to write the code? The real question is: what is the stated contract of the method? The question of whether the conditions are redundant is a red herring. The real question is "have I designed a sensible contract, and does my method clearly implement that contract?"

Let's look at the declaration:

public static CalendarRow AssignAppointmentToRow(
    Appointment app,    
    List<CalendarRow> rows)

It's public, so it has to be robust to bad data from arbitrary callers.

It returns a value, so it should be useful for its return value, not for its side effect.

And yet the name of the method is a verb, suggesting that it is useful for its side effect.

The contract of the list parameter is:

  • A null list is OK
  • A list with one or more elements in it is OK
  • A list with no elements in it is wrong and should not be possible.

This contract is insane. Imagine writing the documentation for this! Imagine writing test cases!

My advice: start over. This API has candy machine interface written all over it. (The expression is from an old story about the candy machines at Microsoft, where both the prices and the selections are two-digit numbers, and it is super easy to type in "85" which is the price of item 75, and you get the wrong item. Fun fact: yes, I have actually done that by mistake when I was trying to get gum out of a vending machine at Microsoft!)

Here's how to design a sensible contract:

Make your method useful for either its side effect or its return value, not both.

Do not accept mutable types as inputs, like lists; if you need a sequence of information, take an IEnumerable. Only read the sequence; do not write to a passed-in collection unless it is very clear that this is the contract of the method. By taking IEnumerable you send a message to the caller that you are not going to mutate their collection.

Never accept nulls; a null sequence is an abomination. Require the caller to pass an empty sequence if that makes sense, never ever null.

Crash immediately if the caller violates your contract, to teach them that you mean business, and so that they catch their bugs in testing, not production.

Design the contract first to be as sensible as possible, and then clearly implement the contract. That is the way to future-proof a design.
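A sketch of what such a contract might look like, with placeholder types standing in for the app's real classes (the name RowFor and the shapes of Appointment and CalendarRow here are illustrative, not from the original code):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Placeholder types standing in for the original app's classes.
public class Appointment { }

public class CalendarRow
{
    public CalendarRow(int index) { Index = index; }
    public int Index { get; }
    public List<Appointment> Appointments { get; } = new List<Appointment>();
}

public static class CalendarLayout
{
    // Useful for its return value only: it reads the sequence, never
    // mutates the caller's collection, and rejects null outright.
    // An empty sequence is a legitimate input - it simply means this is
    // the first appointment, so the answer is a fresh first row.
    public static CalendarRow RowFor(IEnumerable<CalendarRow> rows)
    {
        if (rows == null) throw new ArgumentNullException(nameof(rows));
        return rows.FirstOrDefault() ?? new CalendarRow(0);
    }
}
```

The caller then performs any mutation explicitly, e.g. `CalendarLayout.RowFor(rows).Appointments.Add(app);`, which keeps the contract trivially documentable: any non-null sequence is valid input.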

Now, I've talked only about your specific case, and you asked a general question. So here is some additional general advice:

  • If there is a fact that you as a developer can deduce but the compiler cannot, then use an assertion to document that fact. If another developer, like future you or one of your coworkers, violates that assumption then the assertion will tell you.

  • Get a test coverage tool. Make sure your tests cover every line of code. If there is uncovered code then either you have a missing test, or you have dead code. Dead code is surprisingly dangerous because usually it is not intended to be dead! The incredibly awful Apple "goto fail" security flaw of a couple years back comes immediately to mind.

  • Get a static analysis tool. Heck, get several; every tool has its own particular specialty and none is a superset of the others. Pay attention to when it is telling you that there is unreachable or redundant code. Again, those are likely bugs.
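The first bullet - documenting deduced facts with an assertion - looks like this in C# with Debug.Assert, which fires in debug builds and compiles away in release builds (the method and list type here are only illustrative):

```csharp
using System.Collections.Generic;
using System.Diagnostics;

static class Schedule
{
    public static int FirstRowIndex(List<int> rows)
    {
        // A fact the developer can deduce but the compiler cannot:
        // callers pass either null or a nonempty list, never an empty
        // one. If a future caller violates that, the assertion fires
        // during testing.
        Debug.Assert(rows == null || rows.Count > 0,
            "rows must be null or nonempty");

        return rows == null ? 0 : rows[0];
    }
}
```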

If it sounds like I'm saying: first, design the code well, and second, test the heck out of it to make sure it is correct today, well, that's what I'm saying. Doing those things will make dealing with the future much easier; the hardest part about the future is dealing with all the buggy candy machine code people wrote in the past. Get it right today and the costs will be lower in the future.

OTHER TIPS

What you are doing in the code you're showing above is not so much future-proofing as it is defensive coding.

Both if statements test for different things. Both are appropriate tests depending on your needs.

Section 1 tests for and corrects a null object. Side note: Creating the list doesn't create any child items (e.g., CalendarRow).

Section 2 tests for and corrects user and/or implementation errors. Just because you have a List<CalendarRow> doesn't mean that you have any items in the list. Users and implementors will do things that you can't imagine just because they are allowed to do it, whether it makes sense to you or not.

I guess this question basically comes down to taste. Yes, it is a good idea to write robust code, yet the code in your example is a slight violation of the KISS principle (as a lot of such "future proof" code will be).

I personally would likely not bother making code bullet-proof for the future. I don't know the future, and as such, any such "future bullet-proof" code is doomed to fail miserably when the future arrives anyway.

Instead, I would prefer a different approach: make the assumptions that you make explicit using the assert() macro or similar facilities. That way, when the future comes knocking, it will tell you precisely where your assumptions no longer hold.

Another principle you might want to think about is the idea of failing fast. The idea is that when something goes wrong in your program, you want to stop the program altogether immediately, at least while you're developing it, before releasing it. Under this principle, you want to write lots of checks to make sure your assumptions hold, but seriously consider having your program stop dead in its tracks whenever those assumptions are violated.

To put it bluntly, if there's even a small error in your program, you want it to crash completely while you watch!

This might sound counterintuitive, but it lets you discover bugs as quickly as possible during routine development. If you're writing a piece of code, and you think it's finished, but it crashes when you test it, there's no question that you're not finished yet. Furthermore, most programming languages give you excellent debugging tools that are easiest to use when your program crashes completely instead of trying to do its best after an error. The biggest, most common example is that if you crash your program by throwing an unhandled exception, the exception message will tell you an incredible amount of information about the bug, including which line of code failed and the path through your code the program took on its way to that line of code (the stack trace).

For more thoughts, read this short essay: Don't Nail Your Program in the Upright Position


This is relevant to you because it's possible that sometimes, the checks you're writing are there because you want your program to try to keep running even after something's gone wrong. For example, consider this brief implementation of the Fibonacci sequence:

// Calculates the nth Fibonacci number
int fibonacci(int n) {
    int a = 0;
    int b = 1;

    for(int i = 0; i < n; i++) {
        int temp = b;
        b = a + b;
        a = temp;
    }

    return b;
}

This works, but what if someone passes a negative number to your function? It won't work then! So you'll want to add a check to make sure that the function is called with a nonnegative input.

It might be tempting to write the function like this:

// Calculates the nth Fibonacci number
int fibonacci(int n) {
    int a = 0;
    int b = 1;

    // Make sure the input is nonnegative
    if(n < 0) {
        n = 1; // Replace the negative input with an input that will work
    }

    for(int i = 0; i < n; i++) {
        int temp = b;
        b = a + b;
        a = temp;
    }

    return b;
}

However, if you do this and then later accidentally call your Fibonacci function with a negative input, you'll never realize it! Worse, your program will probably keep on running but start producing nonsensical results without giving you any clues as to where something went wrong. These are the hardest types of bugs to fix.

Instead, it's better to write a check like this:

// Calculates the nth Fibonacci number
int fibonacci(int n) {
    int a = 0;
    int b = 1;

    // Make sure the input is nonnegative
    if(n < 0) {
        throw new ArgumentException("Can't have negative inputs to Fibonacci");
    }

    for(int i = 0; i < n; i++) {
        int temp = b;
        b = a + b;
        a = temp;
    }

    return b;
}

Now, if you ever accidentally call the Fibonacci function with a negative input, your program will immediately stop and let you know that something's wrong. Furthermore, by giving you a stack trace, the program will let you know which part of your program tried to execute the Fibonacci function incorrectly, giving you an excellent starting point for debugging what's wrong.

Should you add redundant code? No.

But, what you describe is not redundant code.

What you describe is programming defensively against calling code violating the preconditions of your function. Whether you do this or simply leave it up to users to read the documentation and avoid those violations themselves, is entirely subjective.

Personally, I am a big believer in this methodology, but as with everything you have to be cautious. Take, for example, C++'s std::vector::operator[]. Putting aside VS's debug-mode implementation for a moment, this function does not perform bounds checking. If you request an element that doesn't exist, the result is undefined. It is up to the user to provide a valid vector index. This is quite deliberate: you can "opt in" to bounds checking by adding it at the callsite, but if the operator[] implementation were to perform it then you would not be able to "opt out". As a fairly low-level function, this makes sense.

But if you were writing an AddEmployee(string name) function for some higher-level interface, I'd fully expect this function to at least throw an exception if you provided an empty name, as well as this precondition being documented immediately above the function declaration. You may not be providing unsanitised user input to this function today, but making it "safe" in this manner means any precondition violations that pop up in the future can be easily diagnosed, rather than potentially triggering a chain of dominoes of hard-to-detect bugs. This is not redundancy: it's diligence.

If I had to come up with a general rule (although, as a general rule, I try to avoid those), I'd say that a function which satisfies any of the following:

  • lives in a super-high-level language (say, JavaScript rather than C)
  • sits at an interface boundary
  • is not performance-critical
  • directly accepts user input

… can benefit from defensive programming. In other cases, you can still write assertions that fire during testing but are disabled in release builds, to further increase your ability to find bugs.

This topic is explored further on Wikipedia (https://en.wikipedia.org/wiki/Defensive_programming).

Two of the ten programming commandments are relevant here:

  • Thou shall not assume that input is correct

  • Thou shall not make code for future use

Here, checking for null is not "making code for future use". Making code for future use is things like adding interfaces because you think they might be useful "some day". In other words, the commandment is to not add layers of abstraction unless they are needed right now.

Checking for null has nothing to do with future use. It has to do with Commandment #1: do not assume input will be correct. Never assume your function will receive some subset of input. A function should respond in a logical way no matter how bogus and messed up the inputs are.

The definitions of 'redundant code' and 'YAGNI', I find, often depend on how far ahead you look.

If you experience a problem, then you tend to write future code in such a way as to avoid that problem. Another programmer who hadn't experienced that particular issue might consider your code redundant overkill.

My suggestion is to keep track of how much time you spend on 'stuff that hasn't erred yet'. If it's a lot, and your peers are banging out features faster than you, then decrease it.

However, if you are like me, I expect you will have just typed it all out 'by default' and it hasn't really taken you any longer.

It is a good idea to document any assumptions about parameters. And it is a good idea to check that your client code does not violate those assumptions. I would do this:

/** ...
*   Precondition: rows is null or nonempty
*/
public static CalendarRow AssignAppointmentToRow(Appointment app, List<CalendarRow> rows)
{
    Assert(rows == null || rows.Count > 0);
    //1. Is rows equal to null? - This will be the case if this is the first appointment.
    if (rows == null) {
        rows = new List<CalendarRow> ();
        rows.Add (new CalendarRow (0));
        rows [0].Appointments.Add (app);
    }

    //blah...
}

[Assuming this is C#, Assert might not be the best tool for the job, as it is not compiled into release builds. But that is a debate for another day.]

Why is this better than what you wrote? Your code makes sense if, in a future where your client has changed, when the client passes in an empty list, the right thing to do will be to add a first row and add app to its appointments. But how do you know that that will be the case? It is better to make fewer assumptions now about the future.

Estimate the cost of adding that code now. It will be relatively cheap because it's all fresh in your mind, so you will be able to do it quickly. Adding unit tests is necessary - there's nothing worse than using some method a year later, finding that it doesn't work, and figuring out that it was broken from the start and never actually worked.

Estimate the cost of adding that code when it is needed. It will be more expensive, because you have to come back to the code, remember it all, and it is a lot harder.

Estimate the probability that the additional code will actually be needed. Then do the maths.

On the other hand, code full of "X will never happen" assumptions is awful to debug. If something doesn't work as intended, it means either a stupid mistake or a wrong assumption. Your "X will never happen" is an assumption, and in the presence of a bug it is suspect, which forces the next developer to waste time on it. It's usually better not to rely on any such assumptions.

The primary question here is "what will happen if you do/don't"?

As others have pointed out, this kind of defensive programming is good, but it's also occasionally dangerous.

For example, if you supply a default value, then you're keeping the program up, but the program may now not be doing what you want it to do. If it writes that blank array to a file, you may have turned your bug from "crashes because I supplied null by accident" into "clears out the calendar rows because I supplied null by accident" (say, if the part that reads //blah removes anything that doesn't appear in the list).

The key for me is: never corrupt data. Let me repeat that: NEVER. CORRUPT. DATA. If your program throws an exception, you get a bug report you can patch; if it writes bad data to a file that it will later read back, you have to sow the ground with salt.

All your "unnecessary" decisions should be made with that premise in mind.

What you're dealing with here is basically an interface. By adding the behaviour "when input is null, initialize input", you've effectively extended the method's interface - now, instead of always operating on a valid list, you've made it "fix up" the input. Whether this is an official or unofficial part of the interface, you can bet somebody (most likely including you) is going to use this behaviour.

Interfaces should be kept simple, and they should be relatively stable - especially in something like a public static method. You get a bit of leeway in private methods, especially private instance methods. By implicitly extending the interface, you've made your code more complex in practice. Now, imagine that you don't actually want to use that code path - so you avoid it. Now you've got a bit of untested code that pretends to be part of the method's behaviour. And I can tell you right now that it probably has a bug: when you pass in a list, that list is mutated by the method, but when you pass null, the method creates a local list and throws it away later. This is the kind of inconsistent behaviour that will make you cry in half a year as you try to track down an obscure bug.
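That inconsistency is easy to demonstrate with a stripped-down sketch (a `List<string>` stands in for the row list; all names here are illustrative):

```csharp
using System;
using System.Collections.Generic;

static class MutationDemo
{
    // Mirrors the shape of the original method: a null argument is
    // replaced by a fresh local list, so that "repair" is invisible
    // to the caller, while a non-null list is mutated in place.
    public static void AddFirst(List<string> rows)
    {
        if (rows == null) rows = new List<string>();  // reassigns the local only
        if (rows.Count == 0) rows.Add("row 0");
    }

    static void Main()
    {
        var mine = new List<string>();
        AddFirst(mine);
        Console.WriteLine(mine.Count);       // 1 - the caller's list was mutated

        List<string> nothing = null;
        AddFirst(nothing);
        Console.WriteLine(nothing == null);  // True - the new list was discarded
    }
}
```

Two callers hitting the two branches see completely different contracts, which is exactly the kind of behaviour that hides until it bites.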

In general, defensive programming is a pretty useful thing. But the code paths for the defensive checks must be tested, just like any other code. In a case like this, they are complicating your code with no reason, and I'd opt for an alternative like this instead:

if (rows == null) throw new ArgumentNullException(nameof(rows));

You don't want an input where rows are null, and you want to make the error obvious to all your callers as soon as possible.

There are many values you need to juggle when developing software. Even robustness itself is a very complex quality - for example, I wouldn't consider your defensive check more robust than throwing an exception. Exceptions are pretty handy for giving you a safe place from which to recover and try again - data corruption issues are usually much harder to track down than a problem recognized early and handled safely. In the end, such checks tend to give you only an illusion of robustness, and then a month later you notice that a tenth of your appointments are gone, because you never noticed that a different list got updated. Ouch.

Make sure to distinguish between the two. Defensive programming is a useful technique to catch errors at a place where they are the most relevant, significantly helping your debugging efforts, and with good exception handling, preventing "sneaky corruption". Fail early, fail fast. On the other hand, what you're doing is more like "error hiding" - you're juggling inputs and making assumptions about what the caller meant. This is very important for user-facing code (e.g. spell-checking), but you should be wary when seeing this with developer-facing code.

The main problem is that whatever abstraction you make, it's going to leak ("I wanted to type ofre, not fore! Stupid spell-checker!"), and the code to handle all the special cases and fix-ups is still code you need to maintain and understand, and code you need to test. Compare the effort of making sure a non-null list is passed with fixing the bug you have a year later in production - it's not a good trade-off. In an ideal world, you'd want every method to work exclusively with its own input, returning a result and modifying no global state. Of course, in the real world, you'll find plenty of cases where that's not the simplest and clearest solution (e.g. when saving a file), but I find that keeping methods "pure" when there's no reason for them to read or manipulate global state makes code much easier to reason about. It also tends to give you more natural points for splitting your methods :)

This doesn't mean that everything unexpected should make your application crash, on the contrary. If you use exceptions well, they naturally form safe error handling points, where you can restore stable application state and allow the user to continue what they're doing (ideally while avoiding any data loss for the user). At those handling points, you'll see opportunities for fixing the issues ("No order number 2212 found. Did you mean 2212b?") or giving the user control ("Error connecting to database. Try again?"). Even when no such option is available, at least it will give you a chance that nothing got corrupted - I've started appreciating code that uses using and try...finally a lot more than try...catch, it gives you a lot of opportunities to maintain invariants even under exceptional conditions.

Users shouldn't lose their data and work. This still has to be balanced with development costs etc., but it's a pretty good general guideline (if the users decide whether to buy your software or not - internal software usually doesn't have that luxury). Even the whole application crashing becomes a lot less of a problem if the user can just restart and go back to what they were doing. This is real robustness - Word saving your work all the time without corrupting your document on disk, and giving you an option to restore those changes after restarting Word after a crash. Is it better than no bug in the first place? Probably not - though don't forget that the work spent catching a rare bug might be better spent elsewhere. But it's much better than the alternatives - e.g. a corrupted document on disk, all the work since the last save lost, or the document auto-replaced with the changes made just before the crash, which happened to be Ctrl+A and Delete.

I'm going to answer this based on your assumption that robust code will benefit you "years" from now. If long-term benefits are your goal, I would prioritize design and maintainability over robustness.

The trade-off between design and robustness is time and focus. Most developers would rather have well-designed code, even if it means working through some trouble spots and doing some additional condition or error handling. After a few years of use, the places where you really need robustness have probably been identified by users.

Assuming the design is about the same quality, less code is easier to maintain. This doesn't mean we're better off if you let known problems go for a few years, but adding things you know you don't need makes maintenance harder. We've all looked at legacy code and found unnecessary parts. You need a high level of confidence to change code that has worked for years.

So if you feel your app is designed as well as it can be, is easy to maintain and doesn't have any bugs, find something better to do than add code you don't need. It's the least you can do out of respect for all the other developers working long hours on pointless features.

No you should not. And you are actually answering your own question when you state that this way of coding may hide bugs. This will not make the code more robust - rather it will make it more prone to bugs and make debugging harder.

You state your current expectations about the rows argument: Either it is null or otherwise it has at least one item. So the question is: Is it a good idea to write the code to additionally handle the unexpected third case where rows has zero items?

The answer is no. You should always throw an exception in the case of unexpected input. Consider this: If some other parts of the code breaks the expectation (i.e. the contract) of you method it means there is a bug. If there is a bug you want to know it as early as possible so you can fix it, and an exception will help you do that.

What the code currently does is guess how to recover from a bug which may or may not exist in the code. But even if there is a bug, you can't know how to fully recover from it. Bugs by definition have unknown consequences. Maybe some initialization code did not run as expected, and that may have many more consequences than just the missing row.

So you code should look like this:

public static CalendarRow AssignAppointmentToRow(Appointment app, List<CalendarRow> rows)
{
    if (rows != null && rows.Count == 0) throw new ArgumentException("Rows is empty."); 

    //1. Is rows equal to null? - This will be the case if this is the first appointment.
    if (rows == null) {
        rows = new List<CalendarRow> ();
        rows.Add (new CalendarRow (0));
        rows [0].Appointments.Add (app);
    }

    //blah...
}

Note: There are some specific cases where it makes sense to "guess" how to handle invalid input rather than just throwing an exception. For example if you handle external input you have no control over. Web browsers are an infamous example because they try to gracefully handle any kind of malformed and invalid input. But this only makes sense with external input, not with calls from other parts of the program.


Edit: Some other answers state that you are doing defensive programming. I disagree. Defensive programming means you don't automatically trust input to be valid. So validation of parameters (as above) is a defensive programming technique, but it does not mean you should alter unexpected or invalid input by guessing. The robust defensive approach is to validate input and then throw an exception in case of unexpected or invalid input.

Should I add redundant code now just in case it may be needed in the future?

You should not add redundant code at any time.

You should not add code that is only needed in the future.

You should make sure that your code behaves well no matter what happens.

The definition of "behave well" is up to your needs. One technique I like to use is "paranoia" exceptions. If I am 100% sure that a certain case can never happen, I still program an exception, but I do it in a way that a) clearly tells everyone that I never expect this to happen, and b) is clearly displayed and logged and thus does not lead to creeping corruption later on.

Example pseudo-code:

file = File.open(">", "bla")  or raise "Paranoia: cannot open file 'bla'"
file.write("xytz")            or raise "Paranoia: disk full?"
file.close()                  or raise "Paranoia: huh?!?!?"

This clearly communicates that I am 100% sure that I can always open, write or close the file, i.e., I do not go to the extents of creating an elaborate error handling. But if (no: when) I cannot open the file, my program will still fail in a controlled way.

The user interface will of course not display such messages to the user; they will be logged internally together with the stack trace. Again, these are internal "Paranoia" exceptions which just make sure that the code "stops" when something unexpected happens. This example is a bit contrived; in practice I would of course implement real error handling for errors while opening files, as these happen regularly (wrong file name, USB stick mounted read-only, whatever).

A very important related search term would be "fail fast", as noted in other answers and is a very useful to create robust software.

There are lots of over-complicated answers here. You probably asked this question because you didn't feel right about that piece of code but weren't sure why or how to fix it. So my answer is that the problem is very likely in the code structure (as it usually is).

First, the method header:

public static CalendarRow AssignAppointmentToRow(Appointment app, List<CalendarRow> rows)

Assign appointment to what row? That should be clear immediately from the parameter list. Without any further knowledge, I would expect the method params to look like this: (Appointment app, CalendarRow row).

Next, the "input checks":

//1. Is rows equal to null? - This will be the case if this is the first appointment.
if (rows == null) {
    rows = new List<CalendarRow> ();
}

//2. Is rows empty? - This will be the case if this is the first appointment / some other unknown reason.
if(rows.Count == 0)
{
    rows.Add (new CalendarRow (0));
    rows [0].Appointments.Add (app);
}

This is bullshit.

  1. check) The method caller should just make sure it does not pass uninitialized values into the method. It's a programmer's responsibility (to try) not to be stupid.
  2. check) If I do not take into account that passing rows into the method is probably wrong (see the comment above), then it should not be the responsibility of a method called AssignAppointmentToRow to manipulate rows in any way other than assigning the appointment somewhere.

But the whole concept of assigning appointments somewhere is strange (unless this is a GUI part of the code). Your code seems to contain (or at least attempts to build) an explicit data structure representing a calendar (that is, List<CalendarRows> - which should be defined as Calendar somewhere if you want to go this way; then you would be passing Calendar calendar to your method). If you go this way, I would expect the calendar to be prefilled with slots where you place (assign) the appointments afterwards (e.g. calendar[month][day] = appointment would be the appropriate code). But then you can also choose to ditch the calendar structure from the main logic altogether and just have a List<Appointment> where the Appointment objects contain a date attribute. And then, if you need to render a calendar somewhere in the GUI, you can create this 'explicit calendar' structure just before rendering.
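One possible shape of that alternative, with Appointment carrying its own date and the 'explicit calendar' built only at render time (all type names and fields here are hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Appointment
{
    public DateTime Date { get; set; }
    public string Title { get; set; }
}

public static class CalendarView
{
    // Build the 'explicit calendar' only when it is needed for rendering:
    // group the flat appointment list by day.
    public static Dictionary<DateTime, List<Appointment>> Build(IEnumerable<Appointment> appointments)
        => appointments.GroupBy(a => a.Date.Date)
                       .ToDictionary(g => g.Key, g => g.ToList());
}

class Program
{
    static void Main()
    {
        // The main logic keeps a flat list of appointments...
        var appointments = new List<Appointment>
        {
            new Appointment { Date = new DateTime(2024, 5, 3), Title = "Dentist" },
            new Appointment { Date = new DateTime(2024, 5, 3), Title = "Lunch" },
            new Appointment { Date = new DateTime(2024, 5, 7), Title = "Review" },
        };

        // ...and the calendar structure exists only at the GUI boundary.
        var calendar = CalendarView.Build(appointments);
        Console.WriteLine(calendar[new DateTime(2024, 5, 3)].Count);  // 2
    }
}
```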

I don't know the details of your app so some of this is perhaps not applicable to you but both of these checks (mainly the second one) tell me that something somewhere is wrong with separation of concerns in your code.

For simplicity, let's assume that you will either eventually need this piece of code in N days (not later or earlier) or never need it at all.

Pseudocode:

let C_now   = cost of implementing the piece of code now
let C_later = ... later
let C_maint = cost of maintaining the piece of code one day
              (note that it can be negative)
let P_need  = probability that you will need the code after N days

if C_now + C_maint * N < P_need * C_later then implement the code, else omit it.

Factors for C_maint:

  • Does it improve the code in general, making it more self-documenting, easier to test? If yes, negative C_maint expected
  • Does it make code bigger (hence harder to read, longer to compile, implement tests, etc)?
  • Are any refactorings/redesigns pending? If yes, bump C_maint. This case requires more complex formula, with more times variables like N.

So a big thing that just weighs down the code and might be needed only in two years with low probability should be left out; but a little thing that also produces useful assertions, with a 50% chance that it will be needed in three months, should be implemented.
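Plugging made-up numbers into the formula above shows how the decision falls out (the figures here are purely illustrative, not from the answer):

```csharp
using System;

static class CostModel
{
    // Implement now iff the cost of writing it today plus carrying it
    // for N days is less than the expected cost of writing it later.
    public static bool ImplementNow(double cNow, double cLater, double cMaint,
                                    double pNeed, int n)
        => cNow + cMaint * n < pNeed * cLater;

    static void Main()
    {
        // Illustrative numbers: implementing now costs 1 hour, later costs 4,
        // carrying the code costs 0.05 hours/day in reading and maintenance
        // overhead, and there is a 20% chance it is needed after 30 days.
        Console.WriteLine(CostModel.ImplementNow(1.0, 4.0, 0.05, 0.2, 30)
            ? "implement now"
            : "omit it");  // 1 + 1.5 = 2.5 vs 0.2 * 4 = 0.8, so: omit it
    }
}
```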

Licensed under: CC-BY-SA with attribution