Question

As the title suggests, my question is whether unit tests should be entirely self-contained, or whether one can rely on the results of previous tests.

What I mean, in case it isn't entirely clear, is this: if an initial test sufficiently asserts that a certain module A works in a certain way, can one (or, more to the point, should one) write subsequent tests on the assumption that module A has already been tested? This would imply that the order in which the unit tests are executed matters.

Or should each individual test be self-reliant, so that if module B can only be validated when module A works, module A is tested again within the same test as module B? That would imply that the separate unit tests can be executed in any order.

To give a concrete example, consider the "stack" datatype (which we won't delve into too deeply), specifically two fundamental operations we'd need in order to reason about it in any meaningful way: isEmpty(stack) and Empty(). Now, if one wishes to test isEmpty, which takes a stack and returns True or False depending on whether the stack it receives is empty, one would first need to create an empty stack using Empty().

Then consider the scenario of evaluating isEmpty(Empty()) and checking what result we get. Either we get True, in which case Empty() could still have returned a non-empty stack that isEmpty wrongly reports as empty; or we get False, and we still don't know how the two actually interact, nor how either works on its own (assuming we can't view the source). (There is a third option, receiving something that is neither True nor False, but that is beyond the scope of this discussion; it is merely a remark.)

Finally, to tie this back to my question: if we create a test that makes us reasonably certain that isEmpty works satisfactorily, can all tests executed after it trust that it does indeed work? Or should they all incorporate this ambiguity into their own test logic (for instance, adding an else branch for the case where neither True nor False was returned)?
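To make the scenario concrete, here is a minimal sketch in Python of the two kinds of test being contrasted. The list-backed Empty/isEmpty/push functions are deliberately trivial stand-ins, not a real implementation:

import unittest

def Empty():
    # Hypothetical constructor for the stack discussed above (list-backed toy).
    return []

def isEmpty(stack):
    # Hypothetical predicate: True when the stack holds no items.
    return len(stack) == 0

def push(stack, item):
    return stack + [item]

class TestStack(unittest.TestCase):
    def test_isEmpty_of_Empty(self):
        # This can only assert a property of the pair: a broken Empty()
        # plus a complementary broken isEmpty() could still pass.
        self.assertTrue(isEmpty(Empty()))

    def test_push_makes_stack_non_empty(self):
        # A later test written on the assumption that Empty() and
        # isEmpty() already behave as the test above claims.
        self.assertFalse(isEmpty(push(Empty(), 42)))

if __name__ == "__main__":
    unittest.main()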


Solution

Each unit test should test one thing, so yes, it should assume that all other parts of the system are working. The way to do this reliably is to mock any other code which should not be tested at the same time.

For example, consider this pseudo-code:

house_json(id):
    name = get_house_name(id)
    return json.format({'id': id, 'name': name})

This depends on the functionality of two other functions: get_house_name and json.format. To unit test this you'll have to mock both of them. First the normal case, testing that with a valid house ID we end up calling json.format with the expected parameters:

test_house_json_format():
    get_house_name = mock()
    json.format = mock()
    get_house_name.return_for(5) = 'foo'
    house_json(5)
    assert_called_once_with(json.format, {'id': 5, 'name': 'foo'})

Then, to test that with an invalid house ID the lookup's exception propagates out of house_json and the formatter is never called:

test_house_json_with_invalid_house_id():
    get_house_name = mock()
    json.format = mock()
    get_house_name.return(x) = lambda: raise InvalidHouseError(x)
    assert_raises(house_json, 5, InvalidHouseError)
    assert_not_called(json.format)
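As a rough, runnable rendering of both pseudo-code tests in Python with the standard library's unittest.mock (the inline stand-ins, the format_json name, and InvalidHouseError are hypothetical placeholders mirroring the pseudo-code, not an actual API):

import unittest
from unittest.mock import patch

class InvalidHouseError(Exception):
    pass  # hypothetical error for unknown house IDs

def get_house_name(house_id):
    raise NotImplementedError  # the real lookup would live elsewhere

def format_json(data):
    raise NotImplementedError  # stand-in for the pseudo-code's json.format

def house_json(house_id):
    name = get_house_name(house_id)
    return format_json({'id': house_id, 'name': name})

class TestHouseJson(unittest.TestCase):
    def test_house_json_format(self):
        # Valid ID: the formatter is called with the expected payload.
        with patch(f"{__name__}.get_house_name", return_value='foo'), \
             patch(f"{__name__}.format_json") as fmt:
            house_json(5)
            fmt.assert_called_once_with({'id': 5, 'name': 'foo'})

    def test_house_json_with_invalid_house_id(self):
        # Invalid ID: the lookup's exception propagates and the
        # formatter is never reached.
        with patch(f"{__name__}.get_house_name", side_effect=InvalidHouseError(5)), \
             patch(f"{__name__}.format_json") as fmt:
            with self.assertRaises(InvalidHouseError):
                house_json(5)
            fmt.assert_not_called()

if __name__ == '__main__':
    unittest.main()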

Making expectations about the code explicit in your test code means it will be easy for others to understand its expected behaviour, and to change it when requirements change. For example, if the function should handle thrown errors and return an error code in JSON format instead, you'd change that in this test:

test_house_json_with_invalid_house_id():
    get_house_name = mock()
    json.format = mock()
    get_house_name.return(x) = lambda: raise InvalidHouseError(x)
    house_json(5)
    assert_called_once_with(json.format, {'invalid_house_error': 5})

Then you'd run the test to verify that it fails, change house_json to make it pass again, and refactor to the simplest possible code which passes (red-green-refactor).

Once you've tested all the parts, you should add integration and acceptance tests without mocking to ensure that they all work together.
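For contrast, an integration-style test of the same function would leave the real collaborators in place. A minimal sketch, using a trivial in-memory lookup and the standard json module purely as stand-ins:

import json
import unittest

HOUSES = {5: 'foo'}  # trivial in-memory "real" data source for the sketch

def get_house_name(house_id):
    return HOUSES[house_id]

def house_json(house_id):
    name = get_house_name(house_id)
    return json.dumps({'id': house_id, 'name': name})

class TestHouseJsonIntegration(unittest.TestCase):
    def test_end_to_end(self):
        # No mocks: lookup and formatting both run for real, so this
        # checks that the pieces actually work together.
        self.assertEqual(json.loads(house_json(5)), {'id': 5, 'name': 'foo'})

if __name__ == '__main__':
    unittest.main()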

OTHER TIPS

First, it is very good practice to make sure your automated tests are independent of the order of execution (for example, when one test fails, you want to be able to run exactly that test in your debugger without having to run 10 other tests first).

But you asked something different:

if an initial test sufficiently asserts that a certain module A works in a certain way [...] should one write subsequent tests on the assumption that module A has already been tested

Writing a test for a function X which relies on the correctness of function Y does not make your tests order-dependent. If function Y is wrong, your tests for X may fail (as well as your tests for Y), and if they fail, they fail regardless of whether you run the tests for Y before, after, or not at all; the order does not matter.

I would argue that you emphatically should not make your individual tests entirely self-contained. It is not typically useful to try to write tests for Module B which will pass even if Module A is buggy. In fact, I think it is often dangerous and unwise to attempt to do so.

A strict unit test will mock out all of the other classes besides the one being tested. Thus the test will specify the inputs and outputs of all objects/modules other than the one being tested. I do not think there is great value in writing these strict unit tests. There are cases where mocking out a module makes sense, but in general one should default to using the actual module.

Why?

1) By running the actual implementation of Module A while testing Module B, you get additional testing of Module A for free. Bugs which you didn't catch while testing Module A may surface through Module B's use of it. You are throwing away valuable checks on the accuracy of Module A if you simply mock it.

2) Mocking out all of the external calls is obnoxious. You end up having to write lots of code in your test to specify the inputs and outputs of the various objects used. This is typically tedious and makes your tests harder to read and write.

3) If you mock out the calls to Module A from Module B, you are asserting that Module B made the calls you expected. You are not checking that these were the correct calls to make. For example, suppose you have a function like:

Foobar highestScoring() {
    // getFoobars() returns the Foobars sorted by score.
    return module_a.getFoobars().getFirst();
}

It seems pretty sensible, but what order does module_a sort them by? If it sorts lowest to highest, then this function is wrong. But if you mocked module_a, you'd have missed this, because you assumed that it sorted the other way around. This is exactly the kind of bug you would have missed by trying to isolate the function from Module A.
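A sketch of that pitfall in Python (the highest_scoring function and the fake module are hypothetical translations of the snippet above): the mock-based test passes because it merely echoes the assumed ordering, while a test against a lowest-first implementation exposes the mismatch.

import unittest
from unittest.mock import Mock

def highest_scoring(module_a):
    # Same assumption as above: getFoobars() supposedly sorts highest first.
    return module_a.getFoobars()[0]

class FakeLowestFirstModuleA:
    # Stand-in for a real module_a that actually sorts lowest score first.
    def getFoobars(self):
        return [{'name': 'low', 'score': 1}, {'name': 'high', 'score': 9}]

class TestHighestScoring(unittest.TestCase):
    def test_with_mock_passes_despite_wrong_assumption(self):
        # The mock echoes back whatever order we assumed, so this passes
        # no matter how the real module sorts.
        module_a = Mock()
        module_a.getFoobars.return_value = [
            {'name': 'high', 'score': 9}, {'name': 'low', 'score': 1}]
        self.assertEqual(highest_scoring(module_a)['name'], 'high')

    @unittest.expectedFailure
    def test_against_lowest_first_order_exposes_the_bug(self):
        # Against the lowest-first ordering the same expectation fails,
        # surfacing the mismatch the mocked test hides.
        self.assertEqual(highest_scoring(FakeLowestFirstModuleA())['name'], 'high')

if __name__ == '__main__':
    unittest.main()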

What are the advantages of mocking every module? There are certainly advantages to mocking certain modules. You typically want to mock modules which are slow, rapidly changing, or would cause side effects. But what does the practice of a strict unit test which mocks everything get you? The theory is that if Module A is buggy, the failing tests will point at Module A, and not at everything that depends on it.

I don't think this is very helpful in practice. If you've actually broken Module A, it is typically because you've just modified Module A. So you probably already know Module A is broken, because you know what you were modifying.

Actually, if we really wanted that benefit, unit testing platforms could provide it by annotating or inferring dependencies between tests. Then if Module A's tests fail, we wouldn't even bother running Module B's tests.

So in short, systematically mocking every module in your test requires a lot of work, creates places for bugs to hide, and gives you marginal benefits. Just don't do it.

The goal of unit testing is to test all paths within a class. This means testing all possible inputs (or a representative sample), but also all possible behaviors of dependencies, including dependency failures: how a method in module B handles all possible correct values returned by A, all possible exceptions thrown by A, and all possible incorrect return values from A, e.g. null.

Testing module B by relying on the tests for A is valid so long as you recognize that it exercises only a subset of A's behaviors (unless you can confirm that the range of inputs to B exercises every path in A). To fully test B, you likely need to force the full range of potential return values and exceptions from A by mocking it.

It doesn't matter, from B's perspective, whether A is correct. What matters is how B handles any implementation of A, i.e. protecting against current or future buggy implementations of A (or of A's dependencies). The goal is to prove that B always does the correct thing for correct inputs and correct dependency behavior, and also does something sensible (doesn't crash or corrupt anything, informs the caller of the problem, returns control gracefully) when it receives bad input or encounters faulty behavior from a dependency.

So my answer is that unit tests can rely on real dependencies (validated by their own unit tests) to generate dependency behavior, but that mocking the dependencies is likely needed to define a complete set of unit tests, e.g. to include dependency failures. The particular implementation of the dependencies, and their correctness, are irrelevant; only their API matters.

To say it slightly differently, forcing A to throw an exception is a valid unit test of B, in that it tests an error-handling path in B that is unreachable by varying B's input.
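A minimal sketch of forcing such a failure path with a mock (the describe_house function and its fallback behavior are hypothetical, chosen only to illustrate the point):

import unittest
from unittest.mock import Mock

def describe_house(module_a, house_id):
    # "Module B": must do something sensible even when module A misbehaves.
    try:
        return module_a.get_house_name(house_id)
    except LookupError:
        return '<unknown house>'

class TestDescribeHouseHandlesDependencyFailure(unittest.TestCase):
    def test_lookup_failure_is_reported_gracefully(self):
        # Force the failure path in A; no input to B could trigger it
        # on its own if the real A happened to be bug-free.
        module_a = Mock()
        module_a.get_house_name.side_effect = LookupError(42)
        self.assertEqual(describe_house(module_a, 42), '<unknown house>')

if __name__ == '__main__':
    unittest.main()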

Unit tests should be self-contained, but need not be order-independent. It should be possible to run a test in isolation from the others, but a failure may then indicate a fault in a dependency rather than in the unit the test targets.

Take a simplistic example: a set of functions implementing a data type. Two of those functions are Parse() and Format(), converting from and to a string representation. It is quite a reasonable strategy to test the Format() function first, and then use Format() in the tests for the Parse() function. If the tests are run out of order, a fault in Format() could then appear as a test failure in Parse().
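A minimal sketch of that strategy, with a toy point type standing in for the real data type:

import unittest

class Point:
    # Toy data type with Format()/Parse()-style string conversion.
    def __init__(self, x, y):
        self.x, self.y = x, y

    def format(self):
        return f"{self.x},{self.y}"

    @staticmethod
    def parse(text):
        x, y = text.split(",")
        return Point(int(x), int(y))

class TestFormat(unittest.TestCase):
    def test_format(self):
        # Tested first, against a literal expected string.
        self.assertEqual(Point(3, 4).format(), "3,4")

class TestParse(unittest.TestCase):
    def test_parse_round_trip(self):
        # Relies on format() already being covered above; a fault in
        # format() would show up here as a Parse-test failure.
        p = Point.parse(Point(3, 4).format())
        self.assertEqual((p.x, p.y), (3, 4))

if __name__ == '__main__':
    unittest.main()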

This is simplistic, but it is common to build up suites of increasingly complex tests based on the knowledge that prior tests have passed. In some cases mocking is a better strategy, but mock components themselves have to be tested, so even then you are depending on prior successful tests.

Licensed under: CC-BY-SA with attribution