How to effectively test a fixed length flat file parser using MSpec?

https://stackoverflow.com/questions/7078652

23-12-2020
|

题

I have this method signature: List<ITMData> Parse(string[] lines)

ITMData has 35 properties.

How would you effectively test such a parser?

Questions:

Should I load the whole file (May I use System.IO)?
Should I put a line from the file into a string constant?
Should I test one or more lines
Should I test each property of ITMData or should I test the whole object?
What about the naming of my test?

EDIT

I changed the method signature to ITMData Parse(string line).

Test Code:

[Subject(typeof(ITMFileParser))]
public class When_parsing_from_index_59_to_79
{
    private const string Line = ".........";
    private static ITMFileParser _parser;
    private static ITMData _data;

    private Establish context = () => { _parser = new ITMFileParser(); };

    private Because of = () => { _data = _parser.Parse(Line); };

    private It should_get_fldName = () => _data.FldName.ShouldBeEqualIgnoringCase("HUMMELDUMM");
}

EDIT 2

I am still not sure if I should test only one property per class. In my opinion this allows me to give more information for the specification namely that when I parse a single line from index 59 to index 79 I get fldName. If I test all properties within one class I loss this information. Am I overspecifying my tests?

My Tests now looks like this:

[Subject(typeof(ITMFileParser))]
public class When_parsing_single_line_from_ITM_file
{
    const string Line = ""

    static ITMFileParser _parser;
    static ITMData _data;

    Establish context = () => { _parser = new ITMFileParser(); };

    private Because of = () => { _data = _parser.Parse(Line); };

    It should_get_fld??? = () => _data.Fld???.ShouldEqual(???);
    It should_get_fld??? = () => _data.Fld???.ShouldEqual(???);
    It should_get_fld??? = () => _data.Fld???.ShouldEqual(???);
    It should_get_fld??? = () => _data.Fld???.ShouldEqual(???);
    It should_get_fld??? = () => _data.Fld???.ShouldEqual(???);
    It should_get_fld??? = () => _data.Fld???.ShouldEqual(???);
    It should_get_fld??? = () => _data.Fld???.ShouldEqual(???);
    ...

}

解决方案

Here's what I would normally do if I'm facing such a problem:

One short disclaimer in advance: I think I would more go down the "integration testing" or "testing the parser as a whole" route rather than testing individual lines. In the past I've more than once faced the situation where lots of implementation details leaked into my tests and forced me to change the tests often when I changed implementation details. Typical case of overspecification I guess ;-/

I wouldn't include file loading in the parser. As @mquander suggested I would rather go with a TextReader or an IEnumerable as the input parameter instead. This will result in way faster tests since you're able to specify the parser input in-memory and don't have to touch the file system.
I'm not a big fan of hand rolling test data, so in most cases I'm using embedded resources and the ResourceManager to load test data directly from the specification assembly via assembly.GetManifestResource(). I typically have a bunch of extension methods in my solution to streamline the reading of resources (something like TextReader TextResource.Load("NAME_OF_SOME_RESOURCE")).
Regarding MSpec: I'm using one class per file to parse. For each property that is tested in the parsed result I've a separate (It)assertion. These are normally one liners, so the additional amount of coding isn't that big. In terms of documentation and diagnostics imho it's a huge plus since when a property isn't parsed correctly you can see directly which assertion failed without having to look into the source or searching for line numbers. It also appears in your MSpec result file. Besides, you don't hide other failed assertions (the situation where you fix one assertion only to see the spec fail on the next line with the next assertion). This of course forces you to think more about the wording you use in your specifications but for me that's also a huge plus since I'm a proponent of the idea that language forms thinking. In other words, if you've no clue how to frackin name your assertion there's probably something fishy either about your specification or your implementation.
Regarding your method signature for the parser: I wouldn't return a concrete type like List<T> or an array and I would also suggest not to return the mutable List<T> type. What you're basically saying here is: "Hey, you can muck around with the parsing result after I've finished" which in most cases is probably what you don't want. I would suggest to return IEnumerable<T> instead (or ICollection<T> if you REALLY need to modify it afterwards)

其他提示

Should I load the whole file (May I use System.IO)?

If you do this, it's no longer a unit test -- it becomes an integration or regression test. You can do this, if you expect it to show possible bugs that a unit test wouldn't. But that's not too likely.

You're probably better off with unit tests, at least to start.

Should I put a line from the file into a string constant?

If you plan to write more than one test that uses the same input line, then sure. But personally, I would probably tend to write a bunch of different tests, with each one passing a different input string. At that point, there's not much reason to make a constant (unless it's a local constant, declared inside the test method).

Should I test one or more lines?

You didn't specify, but I'm going to assume that your output is one-for-one with your input -- that is, if you pass in three strings, you'll get three ITMDatas returned. In that case, the need for multi-line tests would be limited.

It's almost always worth testing the degenerate case, which in this case would be an empty string array (zero lines). And it's probably worth having at least one test that has more than one line, just so you can make sure there are no silly mistakes in your iteration.

However, if your output is one-for-one with your input, then you really have another method wanting to get out -- you should have a ParseSingleLine method. Then your Parse would be nothing more than iterating lines and calling ParseSingleLine. You would still want a handful of tests for Parse, but most of your testing would focus around ParseSingleLine.

I typically try to consider common success and fail scenarios, along with edge cases. Requirements are also helpful for setting up appropriate use cases. Consider Pex for enumerating various scenarios.

Regarding your newer questions:

Should I test each property of ITMData or should I test the whole object?

If you want to be on the safe side, you should probably have at least one test which checks that each property was matched.

What about the naming of my test?

There are quite a few discussions on this topic, such as this one. The general rule is that you would have multiple methods in your unit test class, each aimed at testing something specific. In your case, it might be things like:

public void Check_All_Properties_Parsed_Correctly(){.....}

public void Exception_Thrown_If_Lines_Is_Null(){.....}

public void Exception_Thrown_If_Lines_Is_Wrong_Length(){.....}

So, in other words, testing for the exact behaviour that you consider "correct" for the parser. Once this is done, you will feel much more at ease when making changes to the parser code, because you will have a comprehensive test suite to check that you didn't break anything. Remember to actually test often, and to keep your tests updated when you make changes! There's a fairly good guide about unit testing and Test Driven Development on MSDN.

In general, I think you can find answers to most of your questions by googling a bit. There are also several excellent books on Test Driven Development, which will drive home not only the how of TDD, but the why. If you are relatively programming language agnostic, I would recommend Kent Beck's Test Driven Development By Example, otherwise something like Test-Driven Development in Microsoft .NET. These should get you on the right track very quickly.

EDIT:

Am I overspecifying my tests?

In my opinion, yes. Specifically, I don't agree with your next line:

If I test all properties within one class I loss this information.

In what way do you lose information exactly? Let's say there are 2 ways to do this test, other than having a new class per test:

Have different methods for each property. Your test methods could be called CheckPropertyX, CheckPropertyY, etc. When you run your tests, you will see exactly which fields passed and which fields failed. This clearly satisfies your requirements, although I would say it's still overkill. I would go with option 2:
Have a few different methods, each testing one specific aspect. This is what I originally recommended, and I think what you are referring to. When one of the tests fails, you will only get information about the first thing that failed, per method, but if you coded your Assert nicely, you will know exactly which property is incorrect. Consider the following code:

Assert.AreEqual("test1", myObject.PropertyX, "Property X was incorrectly parsed"); Assert.AreEqual("test2", myObject.PropertyY, "Property Y was incorrectly parsed");

When one of those fails, you will know which line failed. When you have fixed the relevant error, and re-run your tests, you will see if any other properties have failed. This is generally the approach that most people take, because creating a class or even method per property results in too much code, and too much work to keep up to date.

许可以下： CC-BY-SA 和归因

不隶属于 StackOverflow