Numerical regression testing

https://stackoverflow.com/questions/1055471

20-08-2019
|

Question

I'm working on a scientific computing code (written in C++), and in addition to performing unit tests for the smaller components, I'd like to do regression testing on some of the numerical output by comparing to a "known-good" answer from previous revisions. There are a few features I'd like:

Allow comparing numbers to a specified tolerance (for both roundoff error and looser expectations)
Ability to distinguish between ints, doubles, etc, and to ignore text if necessary
Well-formatted output to tell what went wrong and where: in a multi-column table of data, only show the column entry that differs
Return EXIT_SUCCESS or EXIT_FAILURE depending on whether the files match

Are there any good scripts or applications out there that do this, or will I have to roll my own in Python to read and compare output files? Surely I'm not the first person with these kind of requirements.

[The following is not strictly relevant, but it may factor into the decision of what to do. I use CMake and its embedded CTest functionality to drive unit tests that use the Google Test framework. I imagine that it shouldn't be hard to add a few add_custom_command statements in my CMakeLists.txt to call whatever regression software I need.]

Solution 3

I ended up writing a Python script to do more or less what I wanted.

#!/usr/bin/env python

import sys
import re
from optparse import OptionParser
from math import fabs

splitPattern = re.compile(r',|\s+|;')

class FailObject(object):
    def __init__(self, options):
        self.options = options
        self.failure = False

    def fail(self, brief, full = ""):
        print ">>>> ", brief
        if options.verbose and full != "":
            print "     ", full
        self.failure = True


    def exit(self):
        if (self.failure):
            print "FAILURE"
            sys.exit(1)
        else:
            print "SUCCESS"
            sys.exit(0)

def numSplit(line):
    list = splitPattern.split(line)
    if list[-1] == "":
        del list[-1]

    numList = [float(a) for a in list]
    return numList

def softEquiv(ref, target, tolerance):
    if (fabs(target - ref) <= fabs(ref) * tolerance):
        return True

    #if the reference number is zero, allow tolerance
    if (ref == 0.0):
        return (fabs(target) <= tolerance)

    #if reference is non-zero and it failed the first test
    return False

def compareStrings(f, options, expLine, actLine, lineNum):
    ### check that they're a bunch of numbers
    try:
        exp = numSplit(expLine)
        act = numSplit(actLine)
    except ValueError, e:
#        print "It looks like line %d is made of strings (exp=%s, act=%s)." \
#                % (lineNum, expLine, actLine)
        if (expLine != actLine and options.checkText):
            f.fail( "Text did not match in line %d" % lineNum )
        return

    ### check the ranges
    if len(exp) != len(act):
        f.fail( "Wrong number of columns in line %d" % lineNum )
        return

    ### soft equiv on each value
    for col in range(0, len(exp)):
        expVal = exp[col]
        actVal = act[col]
        if not softEquiv(expVal, actVal, options.tol):
            f.fail( "Non-equivalence in line %d, column %d" 
                    % (lineNum, col) )
    return

def run(expectedFileName, actualFileName, options):
    # message reporter
    f = FailObject(options)

    expected  = open(expectedFileName)
    actual    = open(actualFileName)
    lineNum   = 0

    while True:
        lineNum += 1
        expLine = expected.readline().rstrip()
        actLine = actual.readline().rstrip()

        ## check that the files haven't ended,
        #  or that they ended at the same time
        if expLine == "":
            if actLine != "":
                f.fail("Tested file ended too late.")
            break
        if actLine == "":
            f.fail("Tested file ended too early.")
            break

        compareStrings(f, options, expLine, actLine, lineNum)

        #print "%3d: %s|%s" % (lineNum, expLine[0:10], actLine[0:10])

    f.exit()

################################################################################
if __name__ == '__main__':
    parser = OptionParser(usage = "%prog [options] ExpectedFile NewFile")
    parser.add_option("-q", "--quiet",
                      action="store_false", dest="verbose", default=True,
                      help="Don't print status messages to stdout")

    parser.add_option("--check-text",
                      action="store_true", dest="checkText", default=False,
                      help="Verify that lines of text match exactly")

    parser.add_option("-t", "--tolerance",
                      action="store", type="float", dest="tol", default=1.e-15,
                      help="Relative error when comparing doubles")

    (options, args) = parser.parse_args()

    if len(args) != 2:
        print "Usage: numdiff.py EXPECTED ACTUAL"
        sys.exit(1)

    run(args[0], args[1], options)

OTHER TIPS

You should go for PyUnit, which is now part of the standard lib under the name unittest. It supports everything you asked for. The tolerance check, e.g., is done with assertAlmostEqual().

The ndiff utility may be close to what you're looking for: it's like diff, but it will compare text files of numbers to a desired tolerance.

I know I'm quite late to the party, but a few months ago I wrote the nrtest utility in an attempt to make this workflow easier. It sounds like it might help you too.

Here's a quick overview. Each test is defined by its input files and its expected output files. Following execution, output files are stored in a portable benchmark directory. A second step then compares this benchmark to a reference benchmark. A recent update has enabled user extensions, so you can define comparison functions for your custom data.

I hope it helps.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow