Can invariant testing replace unit testing?

https://stackoverflow.com/questions/766867

12-09-2019
|

Question

As a programmer, I have bought whole-heartedly into the TDD philosophy and take the effort to make extensive unit tests for any nontrivial code I write. Sometimes this road can be painful (behavioral changes causing cascading multiple unit test changes; high amounts of scaffolding necessary), but on the whole I refuse to program without tests that I can run after every change, and my code is much less buggy as a result.

Recently, I've been playing with Haskell, and it's resident testing library, QuickCheck. In a fashion distinctly different from TDD, QuickCheck has an emphasis on testing invariants of the code, that is, certain properties that hold over all (or substantive subsets) of inputs. A quick example: a stable sorting algorithm should give the same answer if we run it twice, should have increasing output, should be a permutation of the input, etc. Then, QuickCheck generates a variety of random data in order to test these invariants.

It seems to me, at least for pure functions (that is, functions without side effects--and if you do mocking correctly you can convert dirty functions into pure ones), that invariant testing could supplant unit testing as a strict superset of those capabilities. Each unit test consists of an input and an output (in imperative programming languages, the "output" is not just the return of the function but also any changed state, but this can be encapsulated). One could conceivably created a random input generator that is good enough to cover all of the unit test inputs that you would have manually created (and then some, because it would it would generate cases that you wouldn't have thought of); if you find a bug in your program due to some boundary condition, you improve your random input generator so that it generates that case too.

The challenge, then, is whether or not it's possible to formulate useful invariants for every problem. I'd say it is: it's a lot simpler once you have an answer to see if it's correct than it is to calculate the answer in the first place. Thinking about invariants also helps clarify the specification of a complex algorithm much better than ad hoc test cases, which encourage a kind of case-by-case thinking of the problem. You could use a previous version of your program as a model implementation, or a version of a program in another language. Etc. Eventually, you could cover all of your former test-cases without having to explicitly code an input or an output.

Have I gone insane, or am I on to something?

Solution

A year later, I now think I have an answer to this question: No! In particular, unit tests will always be necessary and useful for regression tests, in which a test is attached to a bug report and lives on in the codebase to prevent that bug from ever coming back.

However, I suspect that any unit test can be replaced with a test whose inputs are randomly generated. Even in the case of imperative code, the “input” is the order of imperative statements you need to make. Of course, whether or not it’s worth creating the random data generator, and whether or not you can make the random data generator have the right distribution is another question. Unit testing is simply a degenerate case where the random generator always gives the same result.

OTHER TIPS

What you've brought up is a very good point - when only applied to functional programming. You stated a means of accomplishing this all with imperative code, but you also touched on why it's not done - it's not particularly easy.

I think that's the very reason it won't replace unit testing: it doesn't fit for imperative code as easily.

Doubtful

I've only heard of (not used) these kinds of tests, but I see two potential issues. I would love to have comments about each.

Misleading results

I've heard of tests like:

reverse(reverse(list)) should equal list
unzip(zip(data)) should equal data

It would be great to know that these hold true for a wide range of inputs. But both these tests would pass if the functions just return their input.

It seems to me that you'd want to verify that, eg, reverse([1 2 3]) equals [3 2 1] to prove correct behavior in at least one case, then add some testing with random data.

Test complexity

An invariant test that fully describes the relationship between the input and output might be more complex than the function itself. If it's complex, it could be buggy, but you don't have tests for your tests.

A good unit test, by contrast, is too simple to screw up or misunderstand as a reader. Only a typo could create a bug in "expect reverse([1 2 3]) to equal [3 2 1]".

What you wrote in your original post, reminded me of this problem, which is an open question as to what the loop invariant is to prove the loop correct...

anyways, i am not sure how much you have read in formal spec, but you are heading down that line of thought. david gries's book is one the classics on the subject, I still haven't mastered the concept well enough to use it rapidly in my day to day programming. the usual response to formal spec is, its hard and complicated, and only worth the effort if you are working on safety critical systems. but i think there are back of envelope techniques similar to what quickcheck exposes that can be used.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow