Question

We are starting a push for code coverage here at my work, and it has got me thinking: how much code coverage is enough?

When do you get to the point of diminishing returns on code coverage? What is the sweet spot between good coverage and not enough? Does it vary by the type of project you are making (i.e. WPF, WCF, Mobile, ASP.NET)? (These are C# classes we are writing.)

Solution

We aim for at least 70%. On things that are more easily testable (functional data structures, for example), we aim for 90% and most individuals aim for as near to 100% as possible. On WPF-related things and other frameworks that are very difficult to test, we get much lower coverage (barely 70%).

Other tips

I'm of the opinion that code coverage alone is a poor metric. It's easy to produce tons of useless tests that cover the code but don't adequately check the output, or don't test edge cases. Covering code just means it doesn't throw an exception, not that it's right. You need quality tests; the quantity isn't that important.
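
To make that concrete, here is a minimal C# sketch (NUnit-style; the PriceCalculator class is invented for this example). The first test executes the happy path and inflates the coverage number while asserting nothing; the other two actually check the result and probe an edge case.

    using System;
    using NUnit.Framework;

    public static class PriceCalculator
    {
        public static decimal ApplyDiscount(decimal price, decimal percent)
        {
            if (percent < 0m || percent > 100m)
                throw new ArgumentOutOfRangeException(nameof(percent));
            return price - (price * percent / 100m);
        }
    }

    [TestFixture]
    public class PriceCalculatorTests
    {
        // Raises line coverage, but only proves no exception is thrown.
        [Test]
        public void CoversTheCodeButChecksNothing()
        {
            PriceCalculator.ApplyDiscount(100m, 10m);
        }

        // Quality tests: check the output and probe an edge case.
        [Test]
        public void TenPercentOffOneHundredIsNinety()
        {
            Assert.AreEqual(90m, PriceCalculator.ApplyDiscount(100m, 10m));
        }

        [Test]
        public void NegativeDiscountIsRejected()
        {
            Assert.Throws<ArgumentOutOfRangeException>(
                () => PriceCalculator.ApplyDiscount(100m, -1m));
        }
    }

Both styles report similar coverage of ApplyDiscount; only the latter would catch a broken discount formula.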

"Enough" is when you can make changes to your code with confidence that you're not breaking anything. On some projects, that might be 10%, on others, it might be 95%.

It's almost never as high as 100%. However, sometimes trying to get to 100% code coverage can be a great way to remove cruft from the code base. Don't forget that there are two ways to increase code coverage: write more tests, or take out code. If code isn't covered because it's hard to test, there's a good chance you can simplify or refactor to make it easier to test. If it's too obscure to bother testing, there's usually a good chance that nothing else in the code is using it.
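
As a small, hedged illustration of the "refactor to make it easier to test" half of that advice (the Greeter classes are invented for this sketch): code that reads the system clock directly is awkward to cover, while the same logic with the time passed in as a parameter is trivial to test.

    using System;

    // Before: the branch taken depends on the real system clock,
    // so a unit test cannot reliably reach both branches.
    public static class GreeterBefore
    {
        public static string Greeting() =>
            DateTime.Now.Hour < 12 ? "Good morning" : "Good afternoon";
    }

    // After: the time is a parameter, so a test can reach both branches
    // with plain values, e.g. new DateTime(2024, 1, 1, 9, 0, 0).
    public static class GreeterAfter
    {
        public static string Greeting(DateTime now) =>
            now.Hour < 12 ? "Good morning" : "Good afternoon";
    }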

Code coverage approaches 100% asymptotically. Consequently, that last 5% is probably more effort than it is worth, as you begin to achieve vanishingly small returns for the effort expended.

Coverage is a metric to keep an eye on, but it shouldn't be the ultimate goal. I've seen (and admittedly written!) plenty of high-coverage code: 100% coverage (TDD, of course), yet:

  • bugs still come up
  • design can still be poor
  • you can really kill yourself shooting for some arbitrary coverage target; pick your battles :p

There's a "The Way of Testivus" entry that I think is appropriate to reference here :)

In most code, only about 20% of it runs 80% of the time. Code coverage analysis is not very useful unless it is paired with a call graph to determine what needs to be tested the most. That also tells you where your edge cases are likely to be. You may come up with 100 tests just for those edge cases, which constitute less than 5% of the actual code.

So, make sure to cover 100% of the 20% that defines critical paths, and at least 50% of the rest (according to the call graph). This should get you roughly 70-75% total coverage, but it varies.

Don't burn time trying to get past 70% total coverage while leaving critical edge cases unchecked.
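
To illustrate how much testing effort a tiny, edge-case-heavy slice of code can deserve, here is a sketch (NUnit-style; QuantityParser is a hypothetical example): eight cases for roughly five lines of code.

    using System;
    using NUnit.Framework;

    public static class QuantityParser
    {
        // A few lines of code, but a disproportionate share of the edge cases.
        public static int Parse(string text)
        {
            if (!int.TryParse(text, out var value) || value < 0)
                throw new FormatException($"Not a valid quantity: '{text}'");
            return value;
        }
    }

    [TestFixture]
    public class QuantityParserTests
    {
        [TestCase("0", 0)]
        [TestCase("1", 1)]
        [TestCase("2147483647", int.MaxValue)]  // upper boundary
        public void ValidInputParses(string text, int expected) =>
            Assert.AreEqual(expected, QuantityParser.Parse(text));

        [TestCase("")]           // empty
        [TestCase("   ")]        // whitespace only
        [TestCase("-1")]         // negative
        [TestCase("1.5")]        // not an integer
        [TestCase("2147483648")] // overflows int
        public void InvalidInputIsRejected(string text) =>
            Assert.Throws<FormatException>(() => QuantityParser.Parse(text));
    }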

Use coverage as a guide to indicate areas not tested. Rather than mandating a coverage figure, it is wiser to understand the reasons for code not being covered. Recording a reason for the shortfall is good discipline that allows the risks to be balanced.

Sometimes the reason is less than desirable (e.g. "ran out of time") but might be OK for an early release. It is better to flag such areas to return to later for a boost in coverage.

I work on critical flight software, where 100% statement coverage is considered suitable only for non-critical systems. For the more critical systems we check branch/decision coverage and use a technique called MC/DC (modified condition/decision coverage), which itself is sometimes not stringent enough.
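
For readers who haven't met MC/DC, here is a rough sketch of what it demands; the flag names are invented and this is not the answerer's flight code. Branch coverage only needs the decision as a whole to evaluate both true and false; MC/DC additionally requires showing that each individual condition can flip the outcome on its own.

    // One decision with three conditions: A && (B || C).
    public static bool ShouldArmSpoilers(bool onGround, bool speedHigh, bool manualOverride) =>
        onGround && (speedHigh || manualOverride);

    // Two tests (one true outcome, one false) already satisfy branch coverage.
    // MC/DC needs, for each condition, a pair of tests differing only in that
    // condition where the outcome flips. Four vectors suffice here:
    //
    //   onGround=T speedHigh=T manualOverride=F -> true
    //   onGround=F speedHigh=T manualOverride=F -> false  (vs. row 1: onGround matters)
    //   onGround=T speedHigh=F manualOverride=F -> false  (vs. row 1: speedHigh matters)
    //   onGround=T speedHigh=F manualOverride=T -> true   (vs. row 3: manualOverride matters)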

We also have to ensure that the object code is covered.

It is a balance between risk (in our case very high) and value/cost. An informed choice is needed, based upon the risk of missing a bug.

When you start considering changes that would hurt run-time performance, security, flexibility, or maintainability just to allow more code coverage, it is time to end the quest for more code coverage.

I have projects where that point is 0% because coverage is impossible to calculate without harming the design and other projects where that is as high as 92%.

Code coverage metrics are only useful in pointing out where you might have missed some tests. They tell you nothing about the quality of your tests.

Space-critical software requires 100% statement coverage.

At first that makes no sense. Everybody knows that full test coverage doesn't mean the code is fully tested, and that it is not that difficult to get 100% coverage without actually testing the application.

Nevertheless, 100% coverage is a lower limit: although 100% coverage is no proof of bug-free software, it is certain that with any less, some code is never exercised at all, and that is simply unacceptable for space-critical software.

I really like @RevBingo's answer because he suggests that the struggle toward 100% can cause you to clean up or delete unused code. What I haven't seen in the other answers is a sense of when you need high coverage and when you don't. I took a stab at starting this. I think adding detail to a chart like this would be a more useful pursuit than finding one test coverage number that was right for all code.

100%

For a public API, like the java.util Collections, that's not coupled to a database and doesn't return HTML, I think 100% coverage is a noble starting goal, even if you settle for 90-95% due to time or other constraints. Increasing test coverage after you are feature-complete forces a more detailed level of scrutiny than other kinds of code review. If your API is at all popular, people will use it, subclass it, deserialize it, etc., in ways you can't expect. You don't want their first experience to be finding a bug or a design oversight!

90%

For business infrastructure code that takes in data structures and returns data structures, 100% is still probably a good starting goal, but if this code isn't public enough to invite a lot of misuse, maybe 85% is still acceptable?

75%

For code that takes in and returns Strings, I think unit testing is much more brittle, but can still be useful in many situations.

50% or less

I hate writing tests for functions that return HTML because they're so brittle. What if someone changes the CSS or the JavaScript, or the whole blob of HTML and English you return no longer makes sense to human end users? If you can find a function that uses a lot of business logic to produce a little HTML, it may well be worth testing. But the reverse situation may not be worth testing at all.
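
A hedged sketch of that last point (the loyalty-discount rules are invented): keep the logic in a pure function with thorough tests, and let the thin, brittle HTML shell go untested or be checked some other way.

    public static class LoyaltyBanner
    {
        // Pure business logic: easy to unit test exhaustively.
        public static int DiscountPercent(int yearsAsCustomer, decimal yearlySpend)
        {
            if (yearsAsCustomer >= 5 && yearlySpend >= 1000m) return 20;
            if (yearsAsCustomer >= 2) return 10;
            return 0;
        }

        // Thin presentation: an assertion against this string breaks the
        // moment anyone reworks the markup or the wording.
        public static string Render(int yearsAsCustomer, decimal yearlySpend) =>
            $"<div class=\"banner\">You get {DiscountPercent(yearsAsCustomer, yearlySpend)}% off!</div>";
    }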

Near 0%

For some code, the definition of "correct" is "makes sense to the end user." There are non-traditional tests you can perform against this code, such as automated grammar-checking or HTML-validating the output. I've even set up grep statements for little inconsistencies we commonly fall prey to at work, like saying "Login" when the rest of the system calls it "Sign In". This may not strictly be a unit test, but it is a helpful way to catch issues without expecting specific output.
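
A rough sketch of that kind of check, written here as a tiny C# console tool rather than grep; the Views folder, the *.cshtml pattern, and the wording rule are all assumptions made up for this example.

    using System;
    using System.IO;
    using System.Linq;

    // Fails the build when a template says "Login" where the rest of the
    // system says "Sign In".
    public static class WordingCheck
    {
        public static int Main()
        {
            var offenders = Directory
                .EnumerateFiles("Views", "*.cshtml", SearchOption.AllDirectories)
                .Where(path => File.ReadAllText(path).Contains("Login"))
                .ToList();

            foreach (var path in offenders)
                Console.Error.WriteLine($"Use \"Sign In\", not \"Login\": {path}");

            return offenders.Count == 0 ? 0 : 1; // non-zero exit fails the build
        }
    }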

Ultimately though, only a human can judge what is sensible to humans. Unit testing can't help you there. Sometimes it takes several humans to judge that accurately.

Absolute 0%

This is a sad category, and I feel like less of a person for writing it. But in any sufficiently large project there are rabbit holes that can suck up person-weeks of time without providing any business benefit.

I bought a book because it claimed to show how to automatically mock data for testing Hibernate. But it only tested Hibernate HQL and SQL queries, and if you have to do a lot of HQL and SQL, you really aren't getting the advantage of Hibernate. There is a way to run Hibernate against an in-memory database, but I haven't invested the time to figure out how to use it effectively in tests. If I had that running, I'd want high (50-100%) test coverage for any business logic that calculates things by navigating an object graph, causing Hibernate to run queries. My ability to test this code is near 0% right now, and that's a problem. So I improve test coverage in other areas of the project and try to prefer pure functions over ones that access the database, largely because it's easier to write tests for those functions. Still, some things cannot, or should not, be tested.

I think it depends on the part of the application you are testing. E.g. for business logic or any component involving complex data transformations, I would aim for 90% coverage (as high as possible). I have often found small but dangerous bugs just by testing as much of the code as possible. I would rather find such bugs during testing than let them occur at a customer's site a year later. Also, a benefit of high code coverage is that it prevents people from changing working code too easily, since the tests have to be adapted correspondingly.

On the other hand, I think there are components for which code coverage is less suited. For example, when testing a GUI it is very time-consuming to write a test that covers all the code executed when clicking on a button, in order to dispatch the event to the right components. I think in this case it is much more effective to use the traditional approach of performing a manual test in which you just click on the button and observe the behaviour of the program (does the right dialog window open up? does the right tool get selected?).

I don't have that high an opinion of using code coverage as a measure of when your test suite has enough tests.

The main reason why is that if you have a process where you first write some code, then some tests, and then look at the code coverage to discover where you have missed a test, then it is your process that needs improving. If you do true TDD, then you have 100% code coverage out of the box (admittedly, there are some trivialities I don't test for). But if you look at the code coverage to find out what to test, then you will likely write the wrong tests.

So the only thing you can conclude from the code coverage is that if it is too low, you don't have enough tests. But if it is high, there is no guarantee that you have all the right tests.

Licensed under: CC-BY-SA with attribution