Question

For purposes of unit testing, I need to mock up a network response. The response is normally a byte stream, stored as a const vector<uint8_t>. For the unit test, however, I would like to produce the vector with data that is either hard-coded in the CPP file or read from a file in the same solution. My example data is about 6 KB. What is the general guidance on where to place data when using googletest?

Solution

Perhaps (a) you require a large sequence of data for some role in which test cases will just read it. This may as well be (class) global data, with const access.

Perhaps (b) you require a large sequence of data for some role in which test cases will read and modify or destroy it. This needs to be reinitialized per test case and have non-const access.

Perhaps both. In either case, a conventional googletest implementation would use a test fixture to encapsulate acquisition of the data, acquire it in the fixture's virtual SetUp() member function, and expose it through a getter method of the fixture.

The following program illustrates a fixture that provides both per-case mutable data and global constant data acquired from files.

#include <vector>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include "gtest/gtest.h"

class foo_test : public ::testing::Test
{
protected:
    // Per-case data is reloaded before every test; global data only once.
    virtual void SetUp() {
        std::ifstream in("path/to/case_data", std::ios::binary);
        if (!in) {
            throw std::runtime_error("Could not open \"path/to/case_data\" for input");
        }
        // istreambuf_iterator preserves every byte; istream_iterator<char>
        // would silently skip whitespace.
        _case_data.assign(
            std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
        if (_global_data.empty()) {
            std::ifstream gin("path/to/global_data", std::ios::binary);
            if (!gin) {
                throw std::runtime_error(
                    "Could not open \"path/to/global_data\" for input");
            }
            _global_data.assign(
                std::istreambuf_iterator<char>(gin), std::istreambuf_iterator<char>());
        }
    }
    // virtual void TearDown() {}   
    std::vector<char> & case_data() {
        return _case_data;
    }
    static std::vector<char> const & global_data() {
        return _global_data;
    }

private:
    std::vector<char> _case_data;
    static std::vector<char> _global_data;

};

std::vector<char> foo_test::_global_data;

TEST_F(foo_test, CaseDataWipe) {
  EXPECT_GT(case_data().size(), 0u);
  case_data().resize(0);
  EXPECT_EQ(case_data().size(), 0u);
}

TEST_F(foo_test, CaseDataTrunc) {
  EXPECT_GT(case_data().size(), 0u);
  case_data().resize(1);
  EXPECT_EQ(case_data().size(), 1u);
}

TEST_F(foo_test, HaveGlobalData) {
  EXPECT_GT(global_data().size(), 0u);
}


int main(int argc, char **argv) {
  ::testing::InitGoogleTest(&argc, argv);
  return RUN_ALL_TESTS();
}

For case (a), you might also consider acquiring the data in a global SetUp member function by subclassing ::testing::Environment, but I see no general reason for preferring to do it that way.

...Or hard-code it?

Then to the issue of whether to keep the test data in a file at all, or hard-code it in the test source. Readers who are happy at this point will only be bored from now on.

As a general issue this is a matter of judgement in the circumstances, and I don't think that the use of googletest tips the scales materially. I think the chief consideration is: Is it desirable to be able to vary an item of test data without rebuilding the test suite?

Say rebuilding the test suite to vary this item is a non-negligible cost, and you anticipate that the content of the item will vary in the future independently of the associated test code, or that it can vary, independently of the associated test code, for different configurations of the system under test. In that case, it is best to get the item from a file or other source that can be selected by runtime parameters of the test suite. In googletest, subclassing ::testing::Environment is a designed facility for parameterized acquisition of test-suite resources (sketched below).
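
As a rough sketch of that facility (the class name, file path, and command-line handling are illustrative assumptions, not part of the original program), a global environment can load a shared data file, optionally selected at runtime, before any test runs:

#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>
#include "gtest/gtest.h"

// Hypothetical global environment: loads one shared data file before any test runs.
class data_environment : public ::testing::Environment
{
public:
    explicit data_environment(std::string path) : _path(path) {}

    virtual void SetUp() {
        std::ifstream in(_path.c_str(), std::ios::binary);
        if (!in) {
            throw std::runtime_error("Could not open \"" + _path + "\" for input");
        }
        _data.assign(
            std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
    }

    static std::vector<char> const & data() {
        return _data;
    }

private:
    std::string _path;
    static std::vector<char> _data;
};

std::vector<char> data_environment::_data;

int main(int argc, char **argv) {
    ::testing::InitGoogleTest(&argc, argv);
    // A runtime parameter selects the data file; fall back to a default path.
    std::string path = argc > 1 ? argv[1] : "path/to/global_data";
    // googletest takes ownership of the environment object.
    ::testing::AddGlobalTestEnvironment(new data_environment(path));
    return RUN_ALL_TESTS();
}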

If in reality the content of a test data item is loosely coupled with the associated test code, then hard-coding it into test-cases is most unlikely to be a prudent choice. (And test files, as opposed to other kinds of runtime configurators, have the valuable property that they can be version-controlled in the same system as source-code.)

If the content of a test data item is firmly coupled with the associated test code then I am biased to hard-code it rather than extract it from a data file. Just biased, not dogmatically committed. Maybe your test suite employs robust library facilities for initializing public API test data from, say, XML files that are also hooked into test-management and defect-management systems. Fine!

I take it as plainly desirable that if a file of test data is a primary test resource - one that the test suite cannot generate - then its content had best be textual data that a competent maintainer can readily understand and manipulate. In this setting I would of course consider that a list of C/C++ hex constants, for example, is textual data - it is source code. If a test file contains binary or dauntingly machine-oriented data then the test suite had best contain the means of its production from legible primary resources. It is sometimes unavoidable for a test suite to depend upon externally sourced "archetypal" binaries but they almost inevitably entail the grim spectacle of test engineers and bug-fixers turning grey in front of hex-editors.

Given the principle that primary test data should be legible to maintainers, we can take it as a norm that primary test data will be "some sort of code": it will be logic-free, but it will be the sort of textual stuff that programmers are accustomed to surveying and editing.

Imagine that a particular sequence of 4096 64-bit unsigned integers (the Big Magic Table) is required for testing your software and is tightly wedded to the associated test code. It could be hard-coded as a huge vector or array initializer list in some source file of the test suite. It could be extracted by the test suite from a data file maintained in CSV format or in CSV-punctuated lines.

For extraction from a data file and against hard-coding, it can be urged (as per Andrew McDonell's answer) that this valuably achieves a disentangling of revisions to the BMT from revisions of other code in the same source file. Likewise it might be urged that any source code that frames enormous literal initializations tends to be unsurveyable and hence a maintenance liability.

But both of these points may be countered with the observation that the defining declaration of the BMT might be coded in a source file all of its own (see the sketch below). It could be a code-review policy for the test suite that test-data initializations must be so coded - and perhaps in files that adhere to a distinctive naming convention. A fanatical policy, to be sure, but not more fanatical than one that would insist that all test data initializers must be extracted from files. If a maintainer is obliged to survey the BMT in whatever file contains it, it will make no difference whether the file extension is .cpp, .dat or whatever: all that matters is the comprehensibility of "the code".
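
To illustrate (the file and identifier names are hypothetical), the BMT could occupy a translation unit of its own, with only a declaration visible to the test code:

// big_magic_table.h - hypothetical header exposing only a declaration
#pragma once
#include <cstdint>
#include <vector>
extern const std::vector<std::uint64_t> big_magic_table;

// big_magic_table.cpp - hypothetical source file containing nothing but the data
#include "big_magic_table.h"
const std::vector<std::uint64_t> big_magic_table = {
    0x0000000000000000, 0x123456789abcdef0, 0xfedcba9876543210,
    // ... the remaining 4093 entries, maintained here and nowhere else ...
};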

For hard-coding and against extraction from a data file, it can be urged that extraction from a data file must introduce a source of irrelevant potential failures into test cases - all the Should Not Happen errors that might defeat reading the right data from a file. This imposes an overhead on test development to effect a correct and clear distinction between genuine test failures and failures to acquire the test data from file, and to clearly diagnose all the possible causes of the latter.

In the case of googletest and comparably functional frameworks this point can be countered, to a degree, by adverting to polymorphic fixture base classes like ::testing::Test and ::testing::Environment. These facilitate the test developer in encapsulating the acquisition of test resources in test-case or test-suite initialization so that it is all over, either successfully or with a diagnosed failure, before the constituent tests of any test case are run. RAII can maintain an unproblematic divide between setup failures and real failures.

Nevertheless, there is an irreducible file-handling overhead for the data-file route and there is an operational overhead that the RAII features of the framework do nothing to reduce. In my dealings with hefty test systems that trade on data files, the data files just are more prone to operational mishaps than source files that only have to be present and correct at build-time. Data files are more likely to turn up missing or misplaced during runtime, or containing malformed stuff, or somehow to have become permission-denied, or somehow to appear at the wrong revision. Their uses in a test system are not as simple or as rigidly controlled as those of source files. Things That Should Not Happen happening to test data files is part of the operational friction of test systems that rely on them and is proportional to their number.

Since source files can encapsulate test data initializations hygienically for revision tracking, hard-coding them can be equated to extraction from a file, with the preprocessor doing the extracting as a by-product of compilation. In that light, why employ other machinery, with additional liabilities, to extract it? There may be good answers, like the suggested XML interface with test-management, defect-management systems, but "It's test data, so don't hard-code it" is not a good one.
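
One concrete way to let the preprocessor do the extracting (the file names and byte values are illustrative) is to keep the raw values in an includable fragment and splice it into the initializer:

// mock_response_data.inc - hypothetical fragment containing nothing but
// comma-separated hex bytes, one line of which might read:
//     0x17, 0x03, 0x03, 0x00, 0x2a, 0x4f, 0x8c, 0x01,

// mock_response_data.cpp
#include <cstdint>
#include <vector>

const std::vector<std::uint8_t> mock_response = {
    #include "mock_response_data.inc"
};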

Even if a test suite must support various configurations of the system under test that call for various instantiations of a test data item, if the data item is covariant with build configurations of the test suite, you may as well still (hygienically) hard-code it and let conditional compilation select the right hard-coding.
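
A minimal sketch of that selection, assuming hypothetical configuration macros and byte values:

#include <cstdint>
#include <vector>

const std::vector<std::uint8_t> mock_response = {
#if defined(CONFIG_PROTOCOL_V2)   // hypothetical build-configuration macro
    0x02, 0x00, 0x10, 0x7f,       // illustrative v2 response bytes
#else
    0x01, 0x00, 0x08, 0x3e,       // illustrative v1 response bytes
#endif
};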

So far I haven't challenged the revision-tracking-hygiene argument for file-based segregation of test data initializers. I've just made the point that regular source files in which the initializers are hard-coded can accomplish this segregation. And I don't want to trounce that argument, but I want to stop it short of the fanatical conclusion that test data initializers should on principle always be extracted from dedicated files - whether source files or data files.

There's no need to belabour the reasons for resisting this conclusion. That way lies test code that is locally less intelligible than the average pizza-eating programmer would write, and organisations of test-suite files that become mind-boggling far more quickly than is necessary or healthy. Normatively, all the primary resources of the test suite are "some sort of code". A programmer's skillset includes the skill of partitioning code into files with appropriate granularity to secure the appropriate revision-tracking hygiene. It's not a mechanical procedure; it's an expertise, and one for code review to cover. Code review can and should ensure that test data initializations, however they are accomplished, are well designed and crafted vis-a-vis revision tracking, just as in all the other routine respects.

Bottom line: If you want to be able to run the same build of your test suite for a variety of these mock network responses, read the data from a file. If on the other hand it is invariant or covariant with build configurations of the test suite, why not hard-code it?

OTHER TIPS

(Caveat - this answer is generic to any unit test framework)

I prefer to keep test data files as separate objects in a revision control system. This provides the following benefits:

  • you could code the unit test to accept any or multiple data files to test a variety of situations
  • you can track changes in the data as needed

If you don't want the unit test to read a data file at execution time - which can be a necessary condition in some situations - you might choose to write a program or script that generates C++ code that initialises the vector at fixture setup (see the sketch below).
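
A rough sketch of such a generator (the output layout and file handling are assumptions, not prescribed by this answer): a small C++ program that reads a captured response and emits an initializer fragment, which the test suite can then compile in:

#include <cstdio>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

// Reads the binary file named on the command line and writes a C++ initializer
// fragment (comma-separated hex bytes) to standard output, suitable for
// redirecting into a file that the test suite compiles.
int main(int argc, char **argv) {
    if (argc != 2) {
        std::cerr << "usage: " << argv[0] << " <binary-file>\n";
        return 1;
    }
    std::ifstream in(argv[1], std::ios::binary);
    if (!in) {
        std::cerr << "could not open " << argv[1] << "\n";
        return 1;
    }
    std::vector<unsigned char> bytes(
        (std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        std::printf("0x%02x,%c", static_cast<unsigned>(bytes[i]),
                    (i + 1) % 12 == 0 ? '\n' : ' ');
    }
    std::printf("\n");
    return 0;
}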

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow