Question

Today I had an interesting discussion with a colleague.

I am a defensive programmer. I believe that the rule "a class must ensure that its objects have a valid state when interacted with from outside the class" must always be adhered to. The reason for this rule is that the class does not know who its users are, and it should fail predictably when it is interacted with in an illegal manner. In my opinion that rule applies to all classes.

In the specific situation we discussed today, I wrote code that validates that the arguments to my constructor are correct (e.g. an integer parameter must be > 0); if the precondition is not met, an exception is thrown. My colleague, on the other hand, believes that such a check is redundant, because unit tests should catch any incorrect uses of the class. He also believes that defensive programming validations should themselves be unit tested, so defensive programming adds a lot of work and is therefore not optimal for TDD.
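For concreteness, a minimal sketch of the kind of check under discussion (the class name and field are made up for illustration; the real code only requires that an integer parameter be > 0):

public class OrderBatch {
    private final int size;

    public OrderBatch(int size) {
        // Defensive precondition: fail fast with a clear message instead of
        // letting an invalid value propagate through the system.
        if (size <= 0) {
            throw new IllegalArgumentException("size must be > 0, but was " + size);
        }
        this.size = size;
    }

    public int size() {
        return size;
    }
}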

Is it true that TDD is able to replace defensive programming? Is parameter validation (and I don't mean user input) unnecessary as a consequence? Or do the two techniques complement each other?


Solution

That's ridiculous. TDD forces code to pass tests and forces all code to have some tests around it. It doesn't prevent your consumers from incorrectly calling code, nor does it magically prevent programmers from missing test cases.

No methodology can force users to use code correctly.

There is a slight argument to be made that if you did TDD perfectly, the > 0 requirement would have surfaced in a test case before you implemented the check, and you would have addressed it -- most likely by adding the check. In TDD, the requirement (> 0 in the constructor) first appears as a failing test case, and adding the check is what makes that test pass.

It is also reasonable to test some of the defensive conditions (you added logic, why wouldn't you want to test something so easily testable?). I'm not sure why you seem to disagree with this.

Or do the two techniques complement each other?

TDD will develop the tests. Implementing parameter validation will make them pass.
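As a sketch of that, assuming JUnit 5 and the hypothetical OrderBatch class sketched in the question: the test is written first and fails, and the constructor check is what makes it pass.

import static org.junit.jupiter.api.Assertions.assertThrows;

import org.junit.jupiter.api.Test;

class OrderBatchTest {

    // Written before the check exists (red); it passes once the constructor validates (green).
    @Test
    void constructorRejectsNonPositiveSize() {
        assertThrows(IllegalArgumentException.class, () -> new OrderBatch(0));
    }
}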

OTHER TIPS

Defensive programming and unit tests are two different ways of catching errors and each have different strengths. Using only one way of detecting errors makes your error detection mechanisms fragile. Using both will catch errors that might have been missed by one or the other, even in code that isn't a public facing API; for example, someone may have forgotten to add a unit test for invalid data passed into the public API. Checking everything at appropriate places means more chances to catch the error.

In information security, this is called Defense In Depth. Having multiple layers of defense ensures that if one fails, there are still others to catch it.

Your colleague is right about one thing: you should test your validations, but this isn't "unnecessary work". It is the same as testing any other code: you want to make sure all usages, even invalid ones, have an expected outcome.

TDD absolutely does not replace defensive programming. Instead, you can use TDD to make sure all defences are in place and work as expected.

In TDD, you are not supposed to write code without writing a test first – follow the red–green–refactor cycle religiously. That means that if you want to add validation, you first write a test that requires this validation. Call the method in question with negative numbers and with zero, and expect it to throw an exception.
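For example, a single parameterized test can cover zero and a few negative values (a sketch, assuming JUnit 5 with its parameterized-tests module and the hypothetical OrderBatch class from the question):

import static org.junit.jupiter.api.Assertions.assertThrows;

import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;

class OrderBatchValidationTest {

    // One red test covering zero and several negative values; the validation turns it green.
    @ParameterizedTest
    @ValueSource(ints = {0, -1, -100})
    void constructorRejectsInvalidSizes(int invalidSize) {
        assertThrows(IllegalArgumentException.class, () -> new OrderBatch(invalidSize));
    }
}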

Also, do not forget the “refactor” step. While TDD is test-driven, this does not mean test-only. You should still apply proper design and write sensible code. Writing defensive code is sensible, because it makes expectations more explicit and your code overall more robust – spotting possible errors early makes them easier to debug.

But aren't we supposed to use tests to locate errors? Assertions and tests are complementary. A good test strategy will mix various approaches to make sure the software is robust. Unit testing alone, integration testing alone, or assertions in the code alone are all unsatisfactory; you need a good combination to reach a sufficient degree of confidence in your software with acceptable effort.

Then there is a very big conceptual misunderstanding on the part of your co-worker: unit tests can never test uses of your class, only that the class itself works as expected in isolation. You would use integration tests to check that the interaction between various components works, but the combinatorial explosion of possible test cases makes it impossible to test everything. Integration tests should therefore restrict themselves to a couple of important cases. More detailed tests that also cover edge cases and error cases are better suited to unit tests.

Tests are there to support and ensure defensive programming

Defensive programming protects the integrity of the system at runtime.

Tests are (mostly static) diagnostic tools. At runtime, your tests are nowhere in sight. They're like scaffolding used to put up a high brick wall or a rock dome. You don't leave important parts out of the structure because you have a scaffolding holding it up during construction. You have a scaffolding holding it up during construction to facilitate putting all the important pieces in.

EDIT: An analogy

What about an analogy to comments in code?

Comments have their purpose, but can be redundant or even harmful. For example, if you put intrinsic knowledge about the code into the comments, then change the code, the comments become irrelevant at best and harmful at worst.

So say you put a lot of intrinsic knowledge of your code base into the tests, such as MethodA can't take a null and MethodB's argument has to be > 0. Then the code changes. Null is okay for A now, and B can take values as small as -10. The existing tests are now functionally wrong, but will continue to pass.

Yes, you should be updating the tests at the same time you update the code. You should also be updating (or removing) comments at the same time you update the code. But we all know these things don't always happen, and that mistakes are made.

The tests verify the behavior of the system. That actual behavior is intrinsic to the system itself, not intrinsic to the tests.

What could possibly go wrong?

The goal with regard to the tests is to think up everything that could go wrong, write a test for it that checks for the right behavior, then craft the runtime code so it passes all of the tests.

Which means that defensive programming is the point.

TDD drives defensive programming, if the tests are comprehensive.

More tests, driving more defensive programming

When bugs are inevitably found, more tests are written to model the conditions that manifest the bug. Then the code is fixed, with code to make those tests pass, and the new tests remain in the test suite.

A good set of tests is going to pass both good and bad arguments to a function/method, and expect consistent results. This, in turn, means the tested component is going to use precondition checks (defensive programming) to confirm the arguments passed to it.

Generically speaking...

For example, if a null argument to a particular procedure is invalid, then at least one test is going to pass a null, and it's going to expect an "invalid null argument" exception/error of some kind.

At least one other test is going to pass a valid argument, of course--or loop through a big array and pass umpteen valid arguments--and confirm that the resulting state is appropriate.
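A minimal sketch of those two kinds of tests, assuming JUnit 5 and a made-up WidgetRegistry component:

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;

import java.util.ArrayList;
import java.util.List;

import org.junit.jupiter.api.Test;

class WidgetRegistryTest {

    // Minimal hypothetical component: add() defensively rejects null.
    static class WidgetRegistry {
        private final List<String> widgets = new ArrayList<>();

        void add(String widget) {
            if (widget == null) {
                throw new IllegalArgumentException("widget must not be null");
            }
            widgets.add(widget);
        }

        int size() {
            return widgets.size();
        }
    }

    // The bad-argument test: a null is rejected up front instead of being stored.
    @Test
    void addRejectsNull() {
        assertThrows(IllegalArgumentException.class, () -> new WidgetRegistry().add(null));
    }

    // The good-argument test: umpteen valid values go in and the resulting state is checked.
    @Test
    void addAcceptsManyValidWidgets() {
        WidgetRegistry registry = new WidgetRegistry();
        for (int i = 0; i < 1000; i++) {
            registry.add("widget-" + i);
        }
        assertEquals(1000, registry.size());
    }
}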

If a test doesn't pass that null argument and get slapped with the expected exception (and that exception was thrown because the code defensively checked the state passed to it), then the null may end up assigned to a property of a class or buried in a collection of some kind where it shouldn't be.

This might cause unexpected behavior in some entirely different part of the system that the class instance gets passed to, in some distant geographic locale after the software has shipped. And that's the sort of thing we're actually trying to avoid, right?

It could even be worse. The class instance with the invalid state could be serialized and stored, only to cause a failure when it is reconstituted for use later on. Geez, I don't know, maybe it's a mechanical control system of some kind that can't restart after a shutdown because it can't deserialize its own persistent configuration state. Or the class instance could be serialized and passed to some entirely different system created by some other entity, and that system might crash.

Especially if that other system's programmers didn't code defensively.

Instead of TDD, let's talk about "software testing" in general; and instead of "defensive programming" in general, let's talk about my favorite way of doing defensive programming, which is using assertions.


So, since we do software testing, we should quit placing assert statements in production code, right? Let me count the ways in which this is wrong:

  1. Assertions are optional, so if you don't like them, just run your system with assertions disabled (a short sketch follows below).

  2. Assertions check things that testing cannot (and should not), because testing is supposed to have a black-box view of your system, while assertions have a white-box view (after all, they live inside it).

  3. Assertions are an excellent documentation tool. No comment ever was, or will ever be, as unambiguous as a piece of code asserting the same thing. Also, documentation tends to become outdated as the code evolves, and it is not in any way enforceable by the compiler.

  4. Assertions can catch errors in the testing code. Have you ever run into a situation where a test fails and you don't know who's wrong -- the production code, or the test?

  5. Assertions can be more pertinent than testing. The tests will check for what is prescribed by the functional requirements, but the code often has to make certain assumptions that are far more technical than that. People who write functional requirements documents rarely think of division by zero.

  6. Assertions pinpoint errors that testing only broadly hints at. So, your test sets up some extensive preconditions, invokes some lengthy piece of code, gathers the results, and finds that they are not as expected. Given enough troubleshooting you will eventually find exactly where things went wrong, but assertions will usually find it first.

  7. Assertions reduce program complexity. Every single line of code that you write increases program complexity. Assertions and the final (readonly) keyword are the only two constructs that I know of that actually reduce program complexity. That's priceless.

  8. Assertions help the compiler make better sense of your code. Please do try this at home: void foo( Object x ) { assert x != null; if( x == null ) { } }; your compiler, or your IDE's static analysis, should issue a warning telling you that the condition x == null is always false. That can be very useful.

The above was a summary of a post from my blog, 2014-09-21 "Assertions and Testing"
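To illustrate points 1 and 3 concretely, here is a minimal Java sketch (the method is made up; assertions are disabled by default and only run when the JVM is started with -ea):

public class Stats {
    // The assert both documents and enforces a precondition (point 3).
    // It costs nothing in production unless assertions are enabled with -ea (point 1).
    public static double averageOf(int[] samples) {
        assert samples != null && samples.length > 0 : "samples must be a non-empty array";
        long sum = 0;
        for (int s : samples) {
            sum += s;
        }
        return (double) sum / samples.length;
    }
}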

I believe most answers are missing a critical distinction: It depends on how your code is going to be used.

Is the module in question going to be used by other clients, independent of the application you are testing? If you are providing a library or API for use by third parties, you have no way of ensuring they only call your code with valid input. You have to validate all input.

But if the module in question is only used by code you control, then your friend may have a point. You can use unit tests to verify that the module in question is only called with valid input. Precondition checks could still be considered good practice, but they are a trade-off: if you litter the code with checks for conditions which you know can never arise, it just obscures the intent of the code.

I disagree that precondition checks require more unit tests. If you decide you don't need to test some forms of invalid input, then it shouldn't matter whether the function contains precondition checks or not. Remember, tests should verify behavior, not implementation details.

This argument sort of baffles me, because when I started practicing TDD, my unit tests of the form "object responds <certain way> when <invalid input>" increased 2 or 3 times. I'm wondering how your colleague is managing to successfully pass those kinds of unit tests without his functions doing validation.

The converse case, that unit tests show you are never producing bad outputs that will be passed to other functions' arguments, is much more difficult to prove. Like the first case, it depends heavily on thorough coverage of edge cases, but you have the additional requirement that all your function inputs must come from the outputs of other functions whose outputs you've unit tested and not from, say, user input or third party modules.

In other words, what TDD does isn't prevent you from needing validation code as much as help you avoid forgetting it.

I think I interpret your colleague's remarks differently from most of the rest of the answers.

It seems to me that the argument is:

  • All of our code is unit tested.
  • All the code that uses your component is our code, or, if it is not, it is unit tested by someone else (not explicitly stated, but it's what I understand from "unit tests should catch any incorrect uses of the class").
  • Therefore, for each caller of your function there is a unit test somewhere that mocks your component, and the test fails if the caller passes an invalid value to that mock.
  • Therefore, it doesn't matter what your function does when passed an invalid value, because our tests say that can't happen.

To me, this argument has some logic to it, but places too much reliance on unit tests to cover every possible situation. The simple fact is that 100% line/branch/path coverage doesn't necessarily exercise every value that the caller might pass in, whereas 100% coverage of all possible states of the caller (that is to say, all possible values of its inputs and variables) is computationally infeasible.

Therefore I would tend to prefer to unit-test the callers to ensure that (so far as the tests go) they never pass in bad values, and additionally to require that your component fails in some recognisable way when a bad value is passed in (at least as far as it's possible to recognise bad values in your language of choice). This will aid debugging when problems occur in integration tests, and will likewise aid any users of your class who are less than rigorous in isolating their code unit from that dependency.

Do be aware though, that if you document and test the behaviour of your function when a value <= 0 is passed in, then negative values are no longer invalid (at least, no more invalid than is any argument at all to throw, since that too is documented to throw an exception!). Callers are entitled to rely on that defensive behaviour. Language permitting, it may be that this is in any case the best scenario -- the function has no "invalid inputs", but callers who expect not to provoke the function into throwing an exception should be unit-tested sufficiently to ensure they don't pass any values which cause that.

Despite thinking that your colleague is somewhat less thoroughly wrong than most answers, I reach the same conclusion, which is that the two techniques complement each other. Program defensively, document your defensive checks, and test them. The work is only "unnecessary" if users of your code cannot benefit from useful error messages when they make mistakes. In theory if they thoroughly unit test all their code before integrating it with yours, and there are never any errors in their tests, then they'll never see the error messages. In practice even if they are doing TDD and total dependency injection, they still might explore during development or there might be a lapse in their testing. The result is they call your code before their code is perfect!

Public interfaces can and will be misused

The claim of your coworker that "unit tests should catch any incorrect uses of the class" is strictly false for any interface that's not private. If a public function can be called with integer arguments, then it can and will be called with any integer arguments, and the code should behave appropriately. If a public function signature accepts, say, Java's Double type, then null, NaN, MAX_VALUE and negative infinity are all possible values. Your unit tests cannot catch incorrect uses of the class, because those tests cannot test the code that will use this class; that code isn't written yet, might not be written by you, and will definitely be outside the scope of your unit tests.
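A sketch of what "behave appropriately" can mean for such a signature (made-up method; the point is that the defensive checks handle exactly the values no unit test of unknown future callers can rule out):

public class Pricing {
    // Public API method: callers outside our control may pass anything a Double
    // can hold, so the method validates the value before using it.
    public static double applyDiscount(double price, Double discountRate) {
        if (discountRate == null) {
            throw new IllegalArgumentException("discountRate must not be null");
        }
        if (Double.isNaN(discountRate) || Double.isInfinite(discountRate)
                || discountRate < 0.0 || discountRate > 1.0) {
            throw new IllegalArgumentException("discountRate must be between 0 and 1, but was " + discountRate);
        }
        return price * (1.0 - discountRate);
    }
}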

On the other hand, this approach may be valid for the (hopefully much more numerous) private properties - if a class can ensure that some fact always is true (e.g. property X cannot ever be null, integer position doesn't exceed the maximum length, when function A is called, all the prerequisite data structures are well formed) then it can be appropriate to avoid verifying this again and again for performance reasons, and rely on unit tests instead.

Defense against misuse is a feature, developed due to a requirement for it. (Not all interfaces require rigorous checks against misuse; for instance very narrowly used internal ones.)

The feature requires testing: does the defense against misuse actually work? The goal of testing this feature is to try to show that it doesn't: to contrive some mis-use of the module which isn't caught by its checks.

If specific checks are a required feature, it is indeed nonsensical to assert that the existence of some tests makes them unnecessary. If it is a feature of some function that (say) it throws an exception when parameter three is negative, then that is not negotiable; it shall do that.

However, I suspect that your colleague is actually making sense from the point of view of a situation in which there is no requirement for specific checks on inputs, with specific responses to bad inputs: a situation in which there is only an understood general requirement for robustness.

Checks on entry into some top level function are there, in part, to protect some weak or badly tested internal code from unexpected combinations of parameters (such that if the code is well-tested, the checks are not necessary: the code can just "weather" the bad parameters).

There is truth in the colleague's idea, and what he likely means is this: if we build a function out of very robust lower-level pieces which are defensively coded and individually tested against all misuse, then it is possible for the higher-level function to be robust without having its own extensive self-checks.

If its contract is violated, then it will translate to some misuse of the lower-level functions, perhaps by throwing exceptions or whatever.

The only issue with that is that the lower level exceptions are not specific to the higher level interface. Whether that is a problem depends on what the requirements are. If the requirement is simply "the function shall be robust against misuse and throw some sort of exception rather than crash, or continue calculating with garbage data" then in fact it might be covered by all the robustness of the lower level pieces on which it is built.

If the function has a requirement for very specific, detailed error reporting related to its parameters, then the lower-level checks do not satisfy that requirement fully. They ensure only that the function blows up somehow (does not continue with a bad combination of parameters, producing a garbage result). If the client code is written to specifically catch certain errors and handle them, it may not work correctly. The client code might itself be obtaining, as an input, the data on which the parameters are based, and it may be expecting the function to check these and to translate bad values to the specific errors as documented (so that it can handle those errors properly), rather than some other errors which are not handled and which perhaps stop the software image.
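A sketch of that translation at the boundary (all names made up): the top level checks its own documented preconditions and converts lower-level failures into the specific error its clients are written to catch.

public class ReportService {

    public static class InvalidReportRequestException extends RuntimeException {
        public InvalidReportRequestException(String message) { super(message); }
        public InvalidReportRequestException(String message, Throwable cause) { super(message, cause); }
    }

    public String buildReport(String accountId, int monthCount) {
        // Documented, specific check on this interface's own parameters.
        if (monthCount <= 0) {
            throw new InvalidReportRequestException("monthCount must be > 0, but was " + monthCount);
        }
        try {
            return renderInternally(accountId, monthCount);
        } catch (IllegalStateException e) {
            // The lower level only guarantees "blow up somehow"; translate that
            // into the error this interface promises.
            throw new InvalidReportRequestException("could not build report for " + accountId, e);
        }
    }

    private String renderInternally(String accountId, int monthCount) {
        // Stand-in for robust lower-level code with its own defensive checks.
        if (accountId == null || accountId.isEmpty()) {
            throw new IllegalStateException("no account data available");
        }
        return "report for " + accountId + " covering " + monthCount + " months";
    }
}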

TL;DR: your colleague is probably not an idiot; you are just talking past each other with different perspectives on the same thing, because the requirements are not fully nailed down and each of you has a different idea of what the "unwritten requirements" are. You think that when there are no specific requirements on parameter checking, you should code up detailed checking anyway; the colleague thinks you should just let the robust lower-level code blow up when the parameters are wrong. It is somewhat unproductive to argue about unwritten requirements through the code: recognize that you disagree about requirements rather than code. Your way of coding reflects what you think the requirements are; the colleague's way represents his view of the requirements. If you see it that way, it's clear that what is right or wrong isn't in the code itself; the code is just a proxy for your opinion of what the specification should be.

Tests define the contract of your class.

As a corollary, the absence of a test defines a contract that includes undefined behaviour. So when you pass null to Foo::Frobnicate(Widget widget), and untold run-time havoc ensues, you are still within the contract of your class.

Later you decide, "we don't want the possibility of undefined behaviour", which is a sensible choice. That means that you have to have an expected behaviour for passing null to Foo::Frobnicate(Widget widget).

And you document that decision by including a

[Test]
public void Foo_FrobnicatesANullWidget_ThrowsInvalidArgument()
{
    // Arrange a Foo, pass a null Widget, and expect the documented exception.
    var foo = new Foo();
    Assert.Throws<ArgumentNullException>(() => foo.Frobnicate(null));
}

A good set of tests will exercise the external interface of your class and ensure that such misuses generate the correct response (an exception, or whatever you define as "correct"). In fact, the first test case that I write for a class is to call its constructor with out-of-range arguments.

The kind of defensive programming that tends to be eliminated by a fully unit-tested approach is the unnecessary validation of internal invariants that can't be violated by external code.

A useful idea I sometimes employ is to provide a method which tests the object's invariants; your tear-down method can call it to validate that your external actions on the object never break the invariants.
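A sketch of that idea, assuming JUnit 5 and a made-up Account class whose invariant is "the balance is never negative":

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.Test;

class AccountInvariantTest {

    // Minimal hypothetical class exposing an invariant-checking method.
    static class Account {
        private int balance;

        Account(int openingBalance) { balance = openingBalance; }

        void withdraw(int amount) {
            if (amount < 0 || amount > balance) {
                throw new IllegalArgumentException("invalid withdrawal: " + amount);
            }
            balance -= amount;
        }

        int balance() { return balance; }

        boolean invariantsHold() { return balance >= 0; }
    }

    private final Account account = new Account(100);

    @Test
    void withdrawReducesBalance() {
        account.withdraw(40);
        assertEquals(60, account.balance());
    }

    @AfterEach
    void invariantsStillHold() {
        // Runs after every test: no external action may have broken the invariants.
        assertTrue(account.invariantsHold(), "Account invariants violated");
    }
}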

The tests of TDD will catch mistakes during the development of the code.

The bounds checking you describe as part of defensive programming will catch mistakes during the use of the code.

If the two domains are the same, that is, if the code you are writing is only ever used internally by this specific project, then it may be true that TDD will preclude the necessity of the defensive bounds checking you describe, but only if those kinds of bounds checks are specifically exercised by the TDD tests.


As a specific example, suppose that a library of financial code was developed using TDD. One of the tests might assert that a particular value can never be negative. That ensures that the developers of the library don't accidentally misuse the classes as they implement the features.

But after the library is released and I'm using it in my own program, those TDD tests don't prevent me from assigning a negative value (assuming it's exposed). Bounds checking would.

My point is that while TDD tests could address the negative-value problem when the code is only ever used internally as part of developing a larger application (under TDD), if it's going to be a library used by other programmers without the TDD framework and tests, bounds checking matters.
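A sketch of the kind of bounds check meant here (made-up financial-library class): the check runs at the caller's runtime, regardless of which tests the library was developed with.

public class Invoice {
    private long amountInCents;

    // The library's own TDD tests ran at build time; only this runtime check
    // protects against a negative value coming from a later caller.
    public void setAmountInCents(long amountInCents) {
        if (amountInCents < 0) {
            throw new IllegalArgumentException("amount must not be negative, but was " + amountInCents);
        }
        this.amountInCents = amountInCents;
    }

    public long getAmountInCents() {
        return amountInCents;
    }
}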

TDD and defensive programming go hand in hand. Using both is not redundant, but in fact complementary. When you have a function, you want to make sure that the function works as described and write tests for it; if you do not cover what happens in the case of a bad input, bad return, bad state, etc., then you aren't writing your tests robustly enough, and your code will be fragile even if all your tests are passing.

As an embedded engineer, I like to use the example of writing a function to simply add two bytes together and return the result like this:

uint8_t AddTwoBytes(uint8_t a, uint8_t b, uint8_t *sum); 

Now if you simply did *(sum) = a + b it would work, but only for some inputs. a = 1 and b = 2 would make sum = 3; however, because the size of sum is a byte, a = 100 and b = 200 would make sum = 44 due to overflow. In C, you would return an error in this case to signify that the function failed; throwing an exception is the equivalent in your code. Not considering the failure cases, or not testing how they are handled, will not work long term, because if those conditions occur they won't be handled and could cause any number of issues.

Licensed under: CC-BY-SA with attribution