Question

Is it a good idea to have an individual, unique data set for each integration test, or should all tests reuse the same data? My reasoning for individual data sets is that they give me more control over each test, which makes it easier to update old tests and to introduce new data. Instead of making sure that the data for a new test does not interfere with the old data and tests, I can just add a new data set that is used only by my new test. To me it makes sense, but when reading up on integration/service testing it seems like most (all?) people use the same data for all tests.

My biggest reason for having a unique data set for each test is that I want to write tests for a microservice architecture, and making sure you get unique IDs that are the same across all DBs turned out to be a bit messy. With individual data sets I could follow AAA:

// Arrange
Mock up the database; this could be CSV files loaded into an in-memory DB.

// Act
Make a call to the endpoint I want to test; this could be done through, for example, MSTest, Webtest or Postman.

// Assert
Make sure that the response contains the data I wanted.

In the case above, the CSV files would be the data sets. Each test would get its own CSV files, which would be used to seed the DB before the test runs.
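
The Arrange/Act/Assert flow described above could be sketched roughly as follows. This is only a minimal illustration under several assumptions: Python with an in-memory SQLite database stands in for the real stack, the CSV is inlined as a string instead of being read from a per-test file, and `seed_db`, `users_with_email`, and the `users` table are hypothetical names, with `users_with_email` standing in for the endpoint under test.

```python
import csv
import io
import sqlite3

# Hypothetical per-test data set; in practice this could be one CSV file per test.
KARL_AND_KAREN_CSV = """id,name,email
1,Karl,family@example.com
2,Karen,family@example.com
"""


def seed_db(csv_text):
    """Create a fresh in-memory DB and load a single test's data set into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(int(r["id"]), r["name"], r["email"]) for r in reader]
    conn.executemany("INSERT INTO users (id, name, email) VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def users_with_email(conn, email):
    """Hypothetical stand-in for the endpoint under test."""
    cur = conn.execute("SELECT name FROM users WHERE email = ? ORDER BY id", (email,))
    return [name for (name,) in cur]


def test_shared_email_returns_both_users():
    # Arrange: seed the DB with this test's own data set.
    conn = seed_db(KARL_AND_KAREN_CSV)
    # Act: call the code under test.
    result = users_with_email(conn, "family@example.com")
    # Assert: the response contains the expected data.
    assert result == ["Karl", "Karen"]
    conn.close()
```

Because every test builds its own database from its own data set, adding a new test means adding a new CSV, without touching the data that existing tests rely on.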


Solution

It depends entirely on the situation:

If you use unique test data sets:

  • you are less likely to be affected by previous test runs, if the tested system has persistence
  • you are also less prone to interference if several tests run in parallel
  • it's easier to trace data when results are not satisfactory
  • you need more effort to generate each test set, because you have to ensure the consistency of all the data
  • you test in a bubble, and you might miss stupid mistakes, such as some basic concurrency issues (for example, when data remains locked but shouldn't)

If you use same data:

  • the reference data might be altered in an unfavorable manner by a previous test
  • when a test fails, it's more difficult to find out whether it's due to the current test run or to some previous issue
  • forget about testing concurrency consistency
  • it's easier to prepare test sets

OTHER TIPS

I would opt for individual data per test.

Technically, it's not important. If you always start from a predefined state on each test, it does not matter to the computer.
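
As a rough sketch of what "starting from a predefined state on each test" can look like, the following rebuilds the same baseline before every test, so a test that mutates the data cannot affect another one. It assumes Python's `unittest` with an in-memory SQLite database; the `BASELINE` rows and the `users` table are hypothetical.

```python
import sqlite3
import unittest

BASELINE = [(1, "Alice"), (2, "Bob")]  # hypothetical shared data set


class PredefinedStateTest(unittest.TestCase):
    def setUp(self):
        # Rebuild the same predefined state before every test.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        self.conn.executemany("INSERT INTO users VALUES (?, ?)", BASELINE)

    def tearDown(self):
        self.conn.close()

    def count_users(self):
        return self.conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

    def test_delete_user(self):
        # Mutates the data, but only in this test's own database.
        self.conn.execute("DELETE FROM users WHERE id = 1")
        self.assertEqual(self.count_users(), 1)

    def test_baseline_is_intact(self):
        # Holds regardless of whether test_delete_user ran first.
        self.assertEqual(self.count_users(), 2)
```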

However, in the unlikely (cough) event that a test fails, unique data makes it so much easier for the developers to communicate. It's way easier to remember what the test with "Karl & Karen" was about than test #54 with Alice. "Karl & Karen" was the nice hypothetical couple with the Golden Retriever, where Karl never reads his emails on time and Karen reminds him about it, and now both of their accounts have the same email address and it should work anyway... Alice #54? No idea what that was about. It failed? Wow. I need to read the documentation.

Licensed under: CC-BY-SA with attribution