Question

I am in the process of setting up a Bamboo server for some new projects for a push to TDD in a CI/CD workflow. Sure, unit testing is great, but only as long as it is there.

Now this might be better off in a Git pre-receive hook on certain branches (i.e. develop and main release branches), but how should code coverage be enforced, if at all? I am happy to trust committers to ensure code is covered, but how is that upheld without any slippage, apart from diligence and consistency?

In short, I would like to see how others enforce test coverage as an automatic process during either commit or build stages.


Solution

You shouldn't enforce code coverage automatically.

This is like enforcing a maximum number of lines of code per method: agreed, most methods should be shorter than, say, 20 LOC, but there are valid cases where methods will be longer than that.

In the same way, targeting a given percentage of code coverage per class may lead to unwanted consequences. For instance:

  • Boilerplate code classes or classes created by code generators may not be tested at all. Forcing developers to test them won't have any benefit and will have a substantial cost in terms of time spent doing it.

  • Simple code handling unimportant parts of the application doesn't necessarily have to be tested.

  • In some languages, some code cannot be tested. I had this case in C# with anonymous methods on a library where I really wanted to have 100% code coverage. Those cases may be demoralizing for developers.

More importantly, code coverage should be proportional to two aspects of the code: how critical and how complicated it is:

  • A piece of code with complicated logic that is part of a major feature of the application had better be tested thoroughly, because failures or regressions can have serious consequences.

  • A simple piece of code which handles a feature nobody uses may have basic tests which cover only basic cases.

Of course, you can still use code coverage as a measurement, especially to compare how different teams achieve code coverage: there may be teams which are less disciplined and more reluctant when it comes to testing. In those cases, you may want to combine this metric with others, such as the number of bugs, the time spent resolving bugs or the number of remarks during code reviews.

You may also want to enforce at least some code coverage, say 60%¹, on individual projects where it makes sense (be careful to exclude prototypes, generated code, CRUD, etc.). Making it possible for developers to mark specific classes as excluded from code coverage is also nice². In this case, the enforcement can take the form of a check which fails the build if code coverage is below the required minimum (a minimal sketch of such a check is given after the footnotes below). I would do it at the build stage, not the commit stage, since you are not expected to run unit tests during commit.


¹ I would consider 60% as a reasonable minimum based on my code base: nearly every project or class which has less than 60% code coverage is really untested. This may vary a lot from one language to another and from one company to another (in some companies, 0% is the standard). Make sure to discuss with your team what is normal and what is too high for them. Maybe they are constantly hitting 95% and can easily target 99%, or maybe they struggle to increase their code coverage from 70% to 75%.

² Given that any abuse will be detected during code reviews, you shouldn't be afraid to give this possibility to developers. It is similar to being able to exclude some parts of the code from checks by linters or style checkers. JSLint, StyleCop and Code Analysis are three examples where exclusion is supported and is actually useful without encouraging abuse.
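To make that build-stage check concrete, here is a minimal sketch of such a gate, assuming the test run produces a Cobertura-style XML report; the file name coverage.xml, the 60% limit and the report format are placeholders to adapt to your own toolchain.

#!/usr/bin/env python3
# Minimal coverage gate (sketch): fail the build when overall line coverage
# drops below an agreed minimum.  Assumes a Cobertura-style XML report whose
# root element carries a "line-rate" attribute (e.g. the output of `coverage xml`).
import sys
import xml.etree.ElementTree as ET

REPORT = "coverage.xml"   # placeholder: path to the coverage report
MINIMUM = 0.60            # agreed minimum, e.g. 60%

line_rate = float(ET.parse(REPORT).getroot().get("line-rate", "0"))
print(f"Line coverage: {line_rate:.1%} (minimum {MINIMUM:.0%})")
if line_rate < MINIMUM:
    sys.exit(1)           # a non-zero exit code makes the CI stage fail

In Bamboo this could run as a script task right after the tests, so the build is marked as failed whenever the gate exits with a non-zero status.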

Other tips

Consider the following code:

rsq = a*a + b*b;                    /* mathematically, rsq can never be negative */
if (rsq >= 0.0) {
    r = sqrt(rsq);
}
else {
    handle_this_impossible_event(); /* "impossible" branch: defensive guard */
}

There is no way to create a test case that will reach that else branch. Yet if this were safety-critical flight software, people would be all over the author's case if that protection against sending a negative value to sqrt were not present. Typically the computation of rsq = a*a + b*b and the extraction of the square root are separated by multiple lines of code. In the interim, a cosmic ray can flip the sign bit on rsq.

In fact, flight software has invoked the equivalent of handle_this_impossible_event() multiple times. Usually this involves switching control over to a redundant computer, gracefully shutting down the suspect computer, restarting the suspect computer, and finally having the suspect computer take on the role of the backup. That's much better than the primary flight computer going whacko.

Even in flight software, it's impossible to achieve 100% code coverage. The people who claim they have achieved this either have trivial code or they don't have enough tests against these impossible events.

Test coverage is a useful metric for the overall health of your project. High test coverage lets you make an informed decision about whether the software will work as expected when deployed; low test coverage means you're merely guessing. Tools exist to measure coverage automatically; they usually work by running the program in a debugging context or by injecting bookkeeping operations into the executed code.
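As a toy illustration of that bookkeeping idea (this is not a real coverage tool, only the bare mechanism that tools like coverage.py build on far more efficiently), a tracing hook can record which lines actually execute:

# Toy line-coverage tracer: record which lines run while a function executes.
import sys

executed = set()

def tracer(frame, event, arg):
    if event == "line":
        executed.add((frame.f_code.co_filename, frame.f_lineno))
    return tracer            # keep tracing inside the current frame

def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"

sys.settrace(tracer)
classify(5)                  # only the "non-negative" path runs
sys.settrace(None)

print(f"{len(executed)} distinct lines were executed")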

There are different kinds of tests, and different kinds of coverage metrics. Common coverage metrics include function coverage, statement coverage, branch coverage, and condition coverage, although there are more.

  • Unit tests check whether the implementation of one conceptual unit (module, class, method, …) conforms to its specification (in TDD, the test is the specification). Units without their own unit tests are a red flag, though they might be covered by integration-style tests.

    Unit tests should imply near-total function coverage – since the unit test exercises the whole public interface of that unit, there should be no functionality that isn't touched by these tests. If you're introducing unit testing to an existing code base, function coverage is a rough progress indicator.

    A unit test should strive for good (75%–100%) statement coverage. Statement coverage is a quality metric for a unit test. Total coverage is not always possible, and you probably have better uses of your time than to improve coverage beyond 95%.

    Branch and condition coverage are trickier. The more complicated or important a piece of code is, the higher these metrics should be. But for unspectacular code, high statement coverage tends to be enough (and already implies a branch coverage of at least 50%). Looking at the condition coverage report of a unit can help you construct better test cases; the short example after this list illustrates the difference between statement and branch coverage.

  • Integration tests check whether multiple units can correctly work with each other. Integration tests can be very useful without scoring high in any coverage metric. While integration tests would usually exercise a large part of their unit's interfaces (i.e. have high function coverage), the internals of these units have already been covered by the unit tests.
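As a small, made-up example of the difference between statement and branch coverage mentioned in the list above (the function and the numbers are invented for illustration):

# One test can execute every statement yet leave a branch outcome untested.
def shipping_cost(weight_kg):
    cost = 5              # base rate
    if weight_kg > 10:
        cost += 2         # heavy-parcel surcharge
    return cost

# This single case runs every statement (100% statement coverage), but the
# "weight_kg <= 10" outcome of the branch is never taken, so branch coverage
# for the if is only 50%.
assert shipping_cost(12) == 7

# A second case brings branch coverage to 100%.
assert shipping_cost(3) == 5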

Running tests before code gets merged into a main branch is a good idea. However, calculating test coverage metrics for the whole program tends to take a lot of time – this is a nice job for a nightly build. If you can figure out how to do this, a good compromise would be to only run changed tests, or unit tests on changed units, in a Git hook. Test failures are not acceptable for anything other than “work in progress” commits. If selected coverage metrics drop below some threshold (e.g. statement coverage below 80%, or the introduction of new methods without any corresponding tests), then these problems should be treated as a warning, with an opportunity for the developer to fix these potential issues. However, sometimes there are good reasons to ignore these warnings, and developers should be able to do so.
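A rough sketch of the "only run tests for changed units" idea, assuming a conventional layout where src/foo.py is covered by tests/test_foo.py; the layout, the origin/master base and the pytest invocation are all assumptions to adapt to your project:

#!/usr/bin/env python3
# Git hook sketch: run only the test files matching modules changed on this branch.
import subprocess
import sys
from pathlib import Path

changed = subprocess.run(
    ["git", "diff", "--name-only", "origin/master..."],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

tests = [
    f"tests/test_{Path(f).stem}.py"
    for f in changed
    if f.startswith("src/") and f.endswith(".py")
]
tests = [t for t in tests if Path(t).exists()]

if tests:
    # A failing test run exits non-zero, which aborts the push.
    sys.exit(subprocess.run(["pytest", *tests]).returncode)
print("No matching tests for the changed files; nothing to run.")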

Testing is good, but too much of it can get annoying. Quick, relevant feedback can help to foster attention to quality, but you don't want it to get in the way of producing value. I personally prefer running tests manually, since that gives me faster feedback on the part I'm working on. Before release, I'll do a quality pass where I use static analysis, profilers, and code coverage tools to find problem zones (with some of these steps being part of a pre-release test suite).

No one has mentioned mutation tests yet. The idea behind them is quite practical and intuitive.

They work by modifying the source code randomly (e.g. switching ">" into "<") - hence mutation - and checking whether these haphazard changes break any test.

If they don't, then either a) the code in question might be unnecessary, or b) (more likely) this piece of code isn't covered by a test, since breaking it goes undetected.
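A toy sketch of that mechanism (real mutation-testing tools such as PIT for Java or mutmut for Python automate this across a whole code base; the function and the tests below are invented for the example):

# Minimal mutation-testing sketch: flip one operator and re-run the tests.
original = """
def is_adult(age):
    return age >= 18
"""

def run_tests(source):
    namespace = {}
    exec(source, namespace)              # load the (possibly mutated) code
    try:
        assert namespace["is_adult"](18) is True
        assert namespace["is_adult"](17) is False
        return True                      # tests pass
    except AssertionError:
        return False                     # tests fail -> the mutant is "killed"

mutant = original.replace(">=", "<")     # the mutation: flip the comparison

print("original passes the tests:", run_tests(original))   # True
print("mutant passes the tests:  ", run_tests(mutant))     # False: the change was caught

If a mutant had passed the tests, that would point at code that is either unnecessary or not really covered, exactly as described above.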

Code coverage data can of course be obtained automatically, but no automatic decisions should be made based on them, for reasons that others have already explained. (Too fuzzy, too much room for error.)

However, the next best thing is to have an established process whereby the current state of the project in terms of code coverage gets regularly checked by human beings, possibly with daily reports arriving in the project manager's inbox.

In enterprise environments this is achieved with Continuous Integration tools such as Hudson, Jenkins, etc. These tools are configured to regularly check out the entire project from the source code repository, build it, run the tests, and generate reports. Of course, they can be configured to run the tests in code coverage mode and include the results in these reports.

JetBrains also makes TeamCity, which seems to me a bit more lightweight and suitable for smaller software shops.

So, the project manager receives regular code coverage reports, uses his own good judgement, and acts as the enforcer if needed.

Code coverage can be checked automatically, despite popular opinion. Rational's Purify suite of tools included a code coverage feature. It relied on instrumenting all functions (it worked on the binaries, updating each function or call with a bit of extra code) so it could write out data that was then displayed to the user. Pretty cool technology, especially at the time.

However, even when we tried really, really hard to get 100% coverage, we only managed 70% or so! So it's a bit of a pointless exercise.

When it comes to writing unit tests, however, I think 100% unit test coverage is even more pointless. Unit test those methods that require unit testing, not every getter or setter! Unit testing should be about verifying the tricky functions (or classes, to be honest) and not about ticking boxes in some process or tool that shows nice green ticks.

I have built a tool for this:

https://github.com/exussum12/coverageChecker

Running

bin/diffFilter --phpunit diff.txt clover.xml 70

will fail if less than 70% of the diff is covered by unit tests.

Get the diff with:

git diff origin/master... > diff.txt

This assumes you branched from master and will merge back into master.

Ignore the phpunit flag; it really is just a Clover check, so anything which can output Clover XML can use it.

As other answers have suggested, setting this to 100% is not a good idea.

Licensed under: CC-BY-SA with attribution