Problem

If I have some code that has 80% test coverage (all tests pass), is it fair to say that it's of higher quality than code with no test coverage?

Or is it fair to say it's more maintainable?


Solution

In a strict sense, it is not fair to make any claims until the quality of the test suite is established. Passing 100% of the tests isn't meaningful if most of the tests are trivial or redundant with one another.

The question is: in the history of the project, did any of those tests uncover bugs? The goal of a test is to find bugs, and if the tests never did, they failed as tests. Instead of improving code quality, they may only be giving you a false sense of security.

To improve your test designs, you can use (1) whitebox techniques, (2) blackbox techniques, and (3) mutation testing.

(1) Here are some good whitebox techniques to apply to your test designs; a short code sketch follows the list below. A whitebox test is constructed with specific source code in mind. One important aspect of whitebox testing is code coverage:

  • Is every function called? [Functional coverage]
  • Is every statement executed? [Statement coverage. Both functional coverage and statement coverage are very basic, but better than nothing]
  • For every decision (like if or while), do you have a test that forces it to be true, and another that forces it to be false? [Decision coverage]
  • For every condition that is a conjunction (uses &&) or disjunction (uses ||), does each subexpression have a test where it is true/false? [Condition coverage]
  • Do you have tests that force 0 iterations, 1 iteration, and 2 or more iterations? [Loop coverage]
  • Is each break from a loop covered?
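
As a rough illustration of decision, condition, and loop coverage, here is a minimal pytest-style sketch. The function classify is made up for the example, not code from the question:

    def classify(values, lo, hi):
        """Count how many values fall in the inclusive range [lo, hi]."""
        count = 0
        for v in values:                # loop: needs 0-, 1-, and 2-iteration tests
            if lo <= v and v <= hi:     # decision with a conjunction (two conditions)
                count += 1
        return count

    def test_loop_zero_iterations():
        assert classify([], 0, 10) == 0

    def test_loop_one_iteration_decision_true():
        assert classify([5], 0, 10) == 1        # both subconditions true

    def test_loop_two_iterations_decision_false():
        assert classify([-1, 11], 0, 10) == 0   # decision false on both iterations

    def test_condition_coverage_each_subexpression():
        assert classify([-1], 0, 10) == 0       # lo <= v is false
        assert classify([11], 0, 10) == 0       # v <= hi is false

Together these tests force the if both ways, drive each subcondition of the conjunction both ways, and push the loop through 0, 1, and 2 iterations.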

(2) Blackbox techniques are used when the requirements are available but the code itself is not. These can lead to high-quality tests (a sketch follows this list):

  • Do your blackbox tests cover multiple testing goals? You'll want your tests to be "fat": Not only do they test feature X, but they also test Y and Z. The interaction of different features is a great way to find bugs.
  • The only case where you don't want "fat" tests is when you are testing an error condition, for example invalid user input. If you tried to achieve multiple invalid-input testing goals at once (for example, an invalid zip code and an invalid street address), it's likely that one case would mask the other.
  • Consider the input types and form an "equivalence class" for the types of inputs. For example, if your code tests to see if a triangle is equilateral, the test that uses a triangle with sides (1, 1, 1) will probably find the same kinds of errors that the test data (2, 2, 2) and (3, 3, 3) will find. It's better to spend your time thinking of other classes of input. For example, if your program handles taxes, you'll want a test for each tax bracket. [This is called equivalence partitioning.]
  • Special cases are often associated with defects. Your test data should also include boundary values, such as those on, above, or below the edges of an equivalence class. For example, in testing a sorting algorithm, you'll want to test with an empty array, a single-element array, an array with two elements, and then a very large array. You should consider boundary cases not just for input, but for output as well. [This is called boundary-value analysis.]
  • Another technique is "error guessing". Do you have a feeling that some special combination of inputs might break your program? Then just try it! Remember: your goal is to find bugs, not to confirm that the program is valid. Some people have a knack for error guessing.
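
Here is the promised sketch of boundary-value analysis, equivalence partitioning, and a "fat" test, using Python's built-in sorted() as a stand-in for the unit under test:

    import random

    def test_sort_boundary_values():
        # boundary cases: 0, 1, and 2 elements, then a very large array
        assert sorted([]) == []
        assert sorted([7]) == [7]
        assert sorted([2, 1]) == [1, 2]
        big = [random.randint(-1000, 1000) for _ in range(10_000)]
        result = sorted(big)
        assert all(a <= b for a, b in zip(result, result[1:]))

    def test_sort_equivalence_classes():
        # one representative per class: already sorted, reverse sorted, duplicates
        assert sorted([1, 2, 3]) == [1, 2, 3]
        assert sorted([3, 2, 1]) == [1, 2, 3]
        assert sorted([2, 2, 1]) == [1, 2, 2]

    def test_sort_fat_ordering_and_stability():
        # a "fat" test: checks ordering and stability (two goals) at once
        pairs = [(2, "a"), (1, "b"), (2, "c"), (1, "d")]
        assert sorted(pairs, key=lambda p: p[0]) == [(1, "b"), (1, "d"), (2, "a"), (2, "c")]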

(3) Finally, suppose you already have plenty of good tests for whitebox coverage and have applied the blackbox techniques. What else can you do? It's time to test your tests. One technique you can use is mutation testing.

Under mutation testing, you make a modification to (a copy of) your program, in the hopes of creating a bug. A mutation might be:

  • Change a reference of one variable to another variable
  • Insert the abs() function
  • Change less-than to greater-than
  • Delete a statement
  • Replace a variable with a constant
  • Delete an overriding method
  • Delete a reference to a super method
  • Change argument order

Create several dozen mutants in various places in your program [the program must still compile in order to run the tests]. If your tests do not find these bugs, then you need to write a test that can find the bug in the mutated version of your program. Once a test finds the bug, you have killed the mutant and can try another.
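
As a concrete, hand-rolled illustration (tools such as mutmut for Python or PIT for Java automate the mutant generation); the names below are made up for the example:

    def max_of(a, b):
        return a if a > b else b        # original program

    def max_of_mutant(a, b):
        return a if a < b else b        # mutant: "<" swapped in for ">"

    def test_max_of():
        # A weak assertion such as max_of(3, 3) == 3 would let this mutant
        # survive, since both versions return 3. Unequal arguments kill it.
        assert max_of(3, 1) == 3

    if __name__ == "__main__":
        test_max_of()               # passes against the original
        print(max_of_mutant(3, 1))  # prints 1: the same input exposes the mutant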


Addendum: I forgot to mention this effect: bugs tend to cluster. That means the more bugs you find in one module, the higher the probability that you'll find more bugs there. So, if you have a test that fails (which is to say, the test succeeds, since its goal is to find bugs), not only should you fix the bug, you should also write more tests for that module, using the techniques above.

So long as you are finding bugs at a steady rate, testing efforts must continue. Only when there is a decline in the rate of new bugs found should you have confidence that you've made good testing efforts for that phase of development.

Other tips

By one definition it's more maintainable, as any breaking change is more likely to be caught by the tests.

However, the fact that code passes the unit tests doesn't mean it's intrinsically of higher quality. The code might still be badly formatted with irrelevant comments and inappropriate data structures, but it can still pass the tests.

I know which code I'd prefer to maintain and extend.

Code with absolutely no tests can be extremely high quality, readable, beautiful and efficient (or total junk), so no, it's not fair to say that code with 80% test coverage is of higher quality than code with no test coverage.

It could be fair to say that code that is 80% covered by good tests is probably of acceptable quality, and probably relatively maintainable. But it guarantees little, really.

I would call it more refactorable. Refactoring becomes much easier when code is covered by a lot of tests.

It would be fair to call it more maintainable.

I would agree about the maintainable part. Michael Feathers recently posted a video of an excellent talk of his called "The deep synergy between testability and good design", in which he discusses this topic. In the talk he says that the relationship is one-way: code that is well designed is testable, but testable code is not necessarily well designed.

It's worth noting that the streaming quality of the video is not great, so it might be worth downloading it if you want to watch it in full.

I have been asking myself this question for some time now in relation to "condition coverage". Consider this passage from the atollic.com page "Why code coverage analysis?":

More technically, code coverage analysis finds areas in your program that are not covered by your test cases, enabling you to create additional tests that cover otherwise untested parts of your program. It is thus important to understand that code coverage helps you understand the quality of your test procedures, not the quality of the code itself.

This seems to be quite relevant here. If you have a test case set that manages to attain a certain level of (code or otherwise) coverage, then you are quite likely invoking the code under test with a rather exhaustive set of input values! This will not tell you much about the code under test (unless the code blows up or generates detectable faults) but gives you confidence in your test case set.

In an interesting Necker Cube change-of-view, the test code is now being tested by the code under test!
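
As a concrete illustration of such an analysis, here is a minimal sketch assuming the coverage.py package (installed with pip install coverage); my_tests is a hypothetical module standing in for your test suite:

    import coverage

    cov = coverage.Coverage()
    cov.start()

    import my_tests         # hypothetical: whatever imports and runs your tests
    my_tests.run_all()      # hypothetical helper

    cov.stop()
    cov.save()
    cov.report(show_missing=True)   # per-file coverage, with uncovered line numbers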

There are many ways to guarantee that a program does what you intend, and to ensure that modifications will carry no unintended effects.

Testing is one. Avoiding mutation of data is another. So is a type system. Or formal verification.
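
To illustrate the type-system point, a minimal sketch assuming mypy as the static checker; the checker rejects the bad call below without a single test being run:

    def total_cents(prices: list[int]) -> int:
        return sum(prices)

    total_cents([199, 250])           # fine: list[int]

    # Uncommenting the next line makes mypy report an incompatible argument
    # type (exact wording varies by version); no test was needed to catch it:
    # total_cents(["1.99", "2.50"])   # list[str] where list[int] is expected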

So, while I agree that testing is generally a good thing, a given percentage of test coverage might not mean much. I would rather rely on something written in Haskell with no tests than on a well-tested PHP library.

License: CC-BY-SA with attribution. Not affiliated with softwareengineering.stackexchange.