Question

I am curious whether there are any metrics showing that code coverage actually improves code quality. Are there any research studies?

If so, at what percent does it become a case of diminishing returns?
If not, why do so many people treat it as a religious doctrine?

My skepticism is anecdotal and was brought on by two projects I was involved with, both of which implemented the same reasonably complex product. The first one just used targeted unit tests here and there. The second one had a mandated 70% code coverage. If I compare the number of defects, the second one has almost an order of magnitude more of them. The two products used different technologies and had different sets of developers, but I am still surprised.


Solution

I'm assuming you are referring to a Code Coverage metric in the context of unit testing. If so, I think you indirectly have already answered your question here:

First project just used targeted unit tests here and there. Second one has a mandated 70% code coverage. If I compare the amount of defects, the 2nd one has almost an order of magnitude more of them.

In short, no: a Code Coverage metric does not by itself improve the quality of a project at all.

There's also a common belief that Code Coverage reflects the quality of the unit tests, but it doesn't. It doesn't tell you which parts of your system are properly tested either; it only tells you which code has been executed by your test suite. The only thing code coverage tells you for sure is which parts of your system are not tested.

However, the Code Coverage metric may relate to overall code quality if you are sure of the quality of your unit tests. The quality of a unit test can be defined as its ability to detect a change in your code base that breaks a business requirement. In other words, every change that breaks a particular requirement (acceptance criterion) should be detected by good-quality tests (such tests should simply fail). One of the simplest automated approaches to measuring the quality of your test suite, which does not require much additional effort on your side, is mutation testing.
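
To illustrate the idea, here is a minimal, hypothetical sketch (the is_adult function and the hand-written mutant are invented for this example; a real mutation testing tool generates and runs such mutants automatically):

# Hypothetical production code: the boundary at 18 is a business requirement.
def is_adult(age):
    return age >= 18

# A "mutant" of the kind a mutation testing tool would generate:
# the >= operator has been replaced with >.
def is_adult_mutant(age):
    return age > 18

# A test that encodes the acceptance criterion (the boundary value).
def boundary_test_passes(impl):
    return impl(18) is True

# The test passes against the original code but fails against the mutant,
# i.e. the mutant is "killed" -- evidence that the test can detect the change.
print(boundary_test_passes(is_adult))         # True
print(boundary_test_passes(is_adult_mutant))  # False -> mutant killed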

UPDATE:

http://martinfowler.com/bliki/TestCoverage.html

OTHER TIPS

Code coverage tells you how much of your code is covered by tests. It does not tell you much about the quality of those tests. For example, a code coverage of, say, 70% might be obtained by automated tests exercising trivial functionality like getters and setters, while leaving out more important things like verifying that some complex computation delivers correct results, handles corner cases, and so on. Even with 100% code coverage, your tests might not consider special inputs that cause your code to fail. So a relatively high code coverage does not necessarily imply that the code is well tested, and important defects may still go undetected by the tests.
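
As a hypothetical illustration of that last point: the single test below yields 100% line coverage of an invented average function, yet it never exercises the empty-list input that makes the function fail.

def average(values):
    # Executed by the single test below, so line coverage reports 100%.
    return sum(values) / len(values)

def test_average_happy_path():
    assert average([2, 4, 6]) == 4

test_average_happy_path()

# Never exercised: average([]) raises ZeroDivisionError,
# a defect the 100% coverage figure says nothing about.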

On the other hand, a low code coverage means that a lot of the code is not tested at all, so some important modules may not be properly verified. Sometimes it makes sense to have a relatively low code coverage for automated tests; e.g. it can be more effective to click a GUI button and verify that the appropriate dialog opens (a manual test) than to write a corresponding automated test. Nevertheless, even in this scenario the combined coverage for automated and manual tests would be high.

So, IMO code coverage alone is not a good indicator of the quality of your tests because it only works in one direction:

  1. a low code-coverage score can correctly point out code that is not tested, and may be buggy or even dead code;
  2. a high code-coverage score can hide poor testing and can give you too much confidence in the quality of your code.

NOTE

Thanks to gnat for pointing me at code coverage for manual tests.

As a reductio ad absurdum: the following test covers 60% of the lines of the function:

def abs(x):
    if x < 0:
        return -x
    else:
        return x

assert abs(-10) == 10

whereas in this example, we have 100% coverage:

def abs(x):
    if x < 0:
        return -x

assert abs(-10) == 10

Of course, only the latter has a bug.

Code coverage can help, but on its own is not a good indicator.
Where it can help is that it forces people to consciously work with the code in order to write the tests that provide that coverage, and that's likely to cause them to see potential problems and fix them.

But if the people doing this aren't actually interested in the code, they can just mechanically write test code that covers everything without thinking about what the code actually does or whether it is correct.

As a result, it can lead to a false sense of security. But if the team is properly motivated and interested in delivering quality, it is a good way to help the team find areas of the code that are suspicious and need to be looked at for potential problems.

And just counting covered lines isn't enough for that; you also need branch coverage, for example, which checks the different paths through conditional statements and all their possible outcomes.
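
As a hypothetical sketch of the difference: the single test below executes every line of an invented apply_discount function, so line coverage is 100%, but branch coverage would reveal that the non-member path through the conditional is never taken.

def apply_discount(price, is_member):
    total = price
    if is_member:
        total = total - price // 10   # 10% member discount
    return total

def test_member_discount():
    assert apply_discount(100, True) == 90

test_member_discount()

# Line coverage: 100% -- every line above was executed at least once.
# Branch coverage: incomplete -- the "is_member is False" path was never taken,
# so a defect on that path would go unnoticed.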

Was the 70% code coverage requirement put in place because the programmers were not writing unit tests? If so, I expect the result has more to do with the attitude of the programmers on the project than with the 70% code coverage rule.

Code coverage is a good tool to help with targeting unit tests, but only when it is used by people who believe in the benefit of the unit tests and are skillful in writing unit tests.

No. Code Coverage doesn't improve code quality.

Put simply, code coverage tells you how much of your code was executed by your test methods. It does not tell you whether the results of your production code were actually asserted.
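
A hypothetical example of that gap: both tests below contribute the same coverage for an invented full_name function, but only the second one can ever fail.

def full_name(first, last):
    return f"{first} {last}"

def test_full_name_no_assertion():
    # Executes the production code, so it counts towards coverage,
    # but nothing is asserted -- this test cannot fail, whatever full_name returns.
    full_name("Ada", "Lovelace")

def test_full_name_with_assertion():
    # Same coverage, but the result is actually verified.
    assert full_name("Ada", "Lovelace") == "Ada Lovelace"

test_full_name_no_assertion()
test_full_name_with_assertion()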

So coverage alone cannot give you information about the quality of your production code.

If you write code in TDD style, then you don't need code coverage at all: you only write code that is already covered by tests written beforehand.
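
For illustration, a hypothetical sketch of the cycle that claim refers to (the fizzbuzz example is invented):

# Step 1 (red): the test is written first and initially fails,
# because fizzbuzz() does not exist yet.
def test_fizzbuzz_of_three():
    assert fizzbuzz(3) == "Fizz"

# Step 2 (green): just enough production code is written to make the test pass.
def fizzbuzz(n):
    return "Fizz" if n % 3 == 0 else str(n)

# Step 3 (refactor): improve the code while the test stays green.
# Because no production line is written without a test demanding it,
# high coverage falls out as a by-product rather than being a target.
test_fizzbuzz_of_three()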

Licensed under: CC-BY-SA with attribution