Random data in Unit Tests?

https://stackoverflow.com/questions/32458

09-06-2019
|

Question

I have a coworker who writes unit tests for objects which fill their fields with random data. His reason is that it gives a wider range of testing, since it will test a lot of different values, whereas a normal test only uses a single static value.

I've given him a number of different reasons against this, the main ones being:

random values means the test isn't truly repeatable (which also means that if the test can randomly fail, it can do so on the build server and break the build)
if it's a random value and the test fails, we need to a) fix the object and b) force ourselves to test for that value every time, so we know it works, but since it's random we don't know what the value was

Another coworker added:

If I am testing an exception, random values will not ensure that the test ends up in the expected state
random data is used for flushing out a system and load testing, not for unit tests

Can anyone else add additional reasons I can give him to get him to stop doing this?

(Or alternately, is this an acceptable method of writing unit tests, and I and my other coworker are wrong?)

Solution

There's a compromise. Your coworker is actually onto something, but I think he's doing it wrong. I'm not sure that totally random testing is very useful, but it's certainly not invalid.

A program (or unit) specification is a hypothesis that there exists some program that meets it. The program itself is then evidence of that hypothesis. What unit testing ought to be is an attempt to provide counter-evidence to refute that the program works according to the spec.

Now, you can write the unit tests by hand, but it really is a mechanical task. It can be automated. All you have to do is write the spec, and a machine can generate lots and lots of unit tests that try to break your code.

I don't know what language you're using, but see here:

Java http://functionaljava.org/

Scala (or Java) http://github.com/rickynils/scalacheck

Haskell http://www.cs.chalmers.se/~rjmh/QuickCheck/

.NET: http://blogs.msdn.com/dsyme/archive/2008/08/09/fscheck-0-2.aspx

These tools will take your well-formed spec as input and automatically generate as many unit tests as you want, with automatically generated data. They use "shrinking" strategies (which you can tweak) to find the simplest possible test case to break your code and to make sure it covers the edge cases well.

Happy testing!

OTHER TIPS

This kind of testing is called a Monkey test. When done right, it can smoke out bugs from the really dark corners.

To address your concerns about reproducibility: the right way to approach this, is to record the failed test entries, generate a unit test, which probes for the entire family of the specific bug; and include in the unit test the one specific input (from the random data) which caused the initial failure.

There is a half-way house here which has some use, which is to seed your PRNG with a constant. That allows you to generate 'random' data which is repeatable.

Personally I do think there are places where (constant) random data is useful in testing - after you think you've done all your carefully-thought-out corners, using stimuli from a PRNG can sometimes find other things.

In the book Beautiful Code, there is a chapter called "Beautiful Tests", where he goes through a testing strategy for the Binary Search algorithm. One paragraph is called "Random Acts of Testing", in which he creates random arrays to thoroughly test the algorithm. You can read some of this online at Google Books, page 95, but it's a great book worth having.

So basically this just shows that generating random data for testing is a viable option.

One advantage for someone looking at the tests is that arbitrary data is clearly not important. I've seen too many tests that involved dozens of pieces of data and it can be difficult to tell what needs to be that way and what just happens to be that way. E.g. If an address validation method is tested with a specific zip code and all other data is random then you can be pretty sure the zip code is the only important part.

If you are doing TDD then I would argue that random data is an excellent approach. If your test is written with constants, then you can only guarantee your code works for the specific value. If your test is randomly failing the build server there is likely a problem with how the test was written.

Random data will help ensure any future refactoring will not rely on a magic constant. After all, if your tests are your documentation, then doesn't the presence of constants imply it only needs to work for those constants?

I am exaggerating however I prefer to inject random data into my test as a sign that "the value of this variable should not affect the outcome of this test".

I will say though that if you use a random variable then fork your test based on that variable, then that is a smell.

if it's a random value and the test fails, we need to a) fix the object and b) force ourselves to test for that value every time, so we know it works, but since it's random we don't know what the value was

If your test case does not accurately record what it is testing, perhaps you need to recode the test case. I always want to have logs that I can refer back to for test cases so that I know exactly what caused it to fail whether using static or random data.

Your co-worker is doing fuzz-testing, although he doesn't know about it. They are especially valuable in server systems.

I am in favor of random tests, and I write them. However, whether they are appropriate in a particular build environment and which test suites they should be included in is a more nuanced question.

Run locally (e.g., overnight on your dev box) randomized tests have found bugs both obvious and obscure. The obscure ones are arcane enough that I think random testing was really the only realistic one to flush them out. As a test, I took one tough-to-find bug discovered via randomized testing and had a half dozen crack developers review the function (about a dozen lines of code) where it occurred. None were able to detect it.

Many of your arguments against randomized data are flavors of "the test isn't reproducible". However, a well written randomized test will capture the seed used to start the randomized seed and output it on failure. In addition to allowing you to repeat the test by hand, this allows you to trivially create new test which test the specific issue by hardcoding the seed for that test. Of course, it's probably nicer to hand-code an explicit test covering that case, but laziness has its virtues, and this even allows you to essentially auto-generate new test cases from a failing seed.

The one point you make that I can't debate, however, is that it breaks the build systems. Most build and continuous integration tests expect the tests to do the same thing, every time. So a test that randomly fails will create chaos, randomly failing and pointing the fingers at changes that were harmless.

A solution then, is to still run your randomized tests as part of the build and CI tests, but run it with a fixed seed, for a fixed number of iterations. Hence the test always does the same thing, but still explores a bunch of the input space (if you run it for multiple iterations).

Locally, e.g., when changing the concerned class, you are free to run it for more iterations or with other seeds. If randomized testing ever becomes more popular, you could even imagine a specific suite of tests which are known to be random, which could be run with different seeds (hence with increasing coverage over time), and where failures wouldn't mean the same thing as deterministic CI systems (i.e., runs aren't associated 1:1 with code changes and so you don't point a finger at a particular change when things fail).

There is a lot to be said for randomized tests, especially well written ones, so don't be too quick to dismiss them!

Can you generate some random data once (I mean exactly once, not once per test run), then use it in all tests thereafter?

I can definitely see the value in creating random data to test those cases that you haven't thought of, but you're right, having unit tests that can randomly pass or fail is a bad thing.

You should ask yourselves what is the goal of your test.
Unit tests are about verifying logic, code flow and object interactions. Using random values tries to achieve a different goal, thus reduces test focus and simplicity. It is acceptable for readability reasons (generating UUID, ids, keys,etc.).
Specifically for unit tests, I cannot recall even once this method was successful finding problems. But I have seen many determinism problems (in the tests) trying to be clever with random values and mainly with random dates.
Fuzz testing is a valid approach for integration tests and end-to-end tests.

If you're using random input for your tests you need to log the inputs so you can see what the values are. This way if there is some edge case you come across, you can write the test to reproduce it. I've heard the same reasons from people for not using random input, but once you have insight into the actual values used for a particular test run then it isn't as much of an issue.

The notion of "arbitrary" data is also very useful as a way of signifying something that is not important. We have some acceptance tests that come to mind where there is a lot of noise data that is no relevance to the test at hand.

Depending on your object/app, random data would have a place in load testing. I think more important would be to use data that explicitly tests the boundary conditions of the data.

We just ran into this today. I wanted pseudo-random (so it would look like compressed audio data in terms of size). I TODO'd that I also wanted deterministic. rand() was different on OSX than on Linux. And unless I re-seeded, it could change at any time. So we changed it to be deterministic but still psuedo-random: the test is repeatable, as much as using canned data (but more conveniently written).

This was NOT testing by some random brute force through code paths. That's the difference: still deterministic, still repeatable, still using data that looks like real input to run a set of interesting checks on edge cases in complex logic. Still unit tests.

Does that still qualify is random? Let's talk over beer. :-)

I can envisage three solutions to the test data problem:

Test with fixed data
Test with random data
Generate random data once, then use it as your fixed data

I would recommend doing all of the above. That is, write repeatable unit tests with both some edge cases worked out using your brain, and some randomised data which you generate only once. Then write a set of randomised tests that you run as well.

The randomised tests should never be expected to catch something your repeatable tests miss. You should aim to cover everything with repeatable tests, and consider the randomised tests a bonus. If they find something, it should be something that you couldn't have reasonably predicted; a real oddball.

How can your guy run the test again when it has failed to see if he has fixed it? I.e. he loses repeatability of tests.

While I think there is probably some value in flinging a load of random data at tests, as mentioned in other replies it falls more under the heading of load testing than anything else. It is pretty much a "testing-by-hope" practice. I think that, in reality, your guy is simply not thinkng about what he is trying to test, and making up for that lack of thought by hoping randomness will eventually trap some mysterious error.

So the argument I would use with him is that he is being lazy. Or, to put it another way, if he doesn't take the time to understand what he is trying to test it probably shows he doesn't really understand the code he is writing.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow