How is code coverage measured?

https://softwareengineering.stackexchange.com/questions/366090

29-01-2021

Question

For many languages there are various tools which measure code coverage. But how exactly does this work?

I have some ideas about how this could work:

Do coverage tools just run the code in the debugger and step through it? Is some kind of static analysis used? Or do some tools just inject some markers into the source code, which are invoked when the code is exercised? I also imagine that in some cases the platform provides tools to measure coverage on a lower level.

Solution

Code coverage tools come in two flavours:

either the code is instrumented to record coverage statistics, or
the program is run under a debugger, profiler, or tracing mechanism.

Coverage measurement is a dynamic quality assurance technique: it records which code is actually executed, so static analysis alone is not sufficient.

If a debugger is available, it is easy to interrupt normal execution after each statement and record that statement as covered. However, all these interruptions come at a noticeable runtime cost, which slows down your tests. Debugger-based coverage tools are also often limited by the debugging interface in the kinds of coverage they can collect. For example, you may not be able to collect branch coverage within expressions like bar() && baz().
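To make the "interrupt after each statement" idea concrete, here is a minimal sketch using Python's sys.settrace hook. It is only an illustration of the technique, not how any particular coverage tool is implemented. The names collect_line_coverage and classify are made up for this example, and lines are reported as offsets from the start of each function.

```python
import sys


def collect_line_coverage(func, *args):
    """Run func(*args); return (result, set of (function name, line offset))."""
    executed = set()
    target_file = func.__code__.co_filename

    def tracer(frame, event, arg):
        code = frame.f_code
        if code.co_filename != target_file:
            return None  # do not trace frames from other files
        if event == "line":
            # Offset 0 is the "def" line of the function being traced.
            executed.add((code.co_name, frame.f_lineno - code.co_firstlineno))
        return tracer  # keep receiving events for this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # restore whatever hook was installed before
    return result, executed


# Hypothetical function under test.
def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"
```

Calling collect_line_coverage(classify, 5) should report offsets 1 and 3 for classify but not offset 2, since the "negative" branch never runs. Because the hook fires on every line, the slowdown mentioned above is easy to observe on larger programs.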

Instrumentation is performed by the compiler or a post-compilation step to inject code into the executable that records coverage. The source code is not modified. This has less runtime overhead than a debugger-based solution, but requires you to compile the program in a special coverage collection mode.
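As a rough analogy in Python, the same idea can be sketched by rewriting the parsed syntax tree before compiling it, so that the source text itself stays untouched. This is an assumption-laden toy, not what GCC, Clang, or any real tool does: CoverageInstrumenter, run_instrumented, and the _hit marker are hypothetical names, and edge cases such as docstrings and `from __future__` imports are ignored.

```python
import ast

# Hypothetical program to instrument (line 1 of SOURCE is the empty first line).
SOURCE = '''
def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"
'''


class CoverageInstrumenter(ast.NodeTransformer):
    """Insert a `_hit(<lineno>)` call before every statement."""

    def generic_visit(self, node):
        super().generic_visit(node)  # instrument nested statements first
        for field in ("body", "orelse", "finalbody"):
            stmts = getattr(node, field, None)
            if isinstance(stmts, list) and stmts and isinstance(stmts[0], ast.stmt):
                instrumented = []
                for stmt in stmts:
                    instrumented.append(self._marker(stmt))
                    instrumented.append(stmt)
                setattr(node, field, instrumented)
        return node

    @staticmethod
    def _marker(stmt):
        call = ast.Expr(value=ast.Call(
            func=ast.Name(id="_hit", ctx=ast.Load()),
            args=[ast.Constant(value=stmt.lineno)],
            keywords=[],
        ))
        return ast.copy_location(call, stmt)


def run_instrumented(source, entry_point, *args):
    """Compile an instrumented copy of source and call one of its functions."""
    hits = set()
    tree = CoverageInstrumenter().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    namespace = {"_hit": hits.add}  # the injected marker records line numbers
    exec(compile(tree, "<instrumented>", "exec"), namespace)
    result = namespace[entry_point](*args)
    return result, hits
```

With this sketch, run_instrumented(SOURCE, "classify", 5) records lines 2, 3 and 5 of SOURCE, and line 4 (the "negative" branch) stays unrecorded. The markers are ordinary code in the compiled program, which is why no debugger or tracing hook is needed at run time.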

As an example, the Python coverage.py tool uses Python's built-in tracing hooks. In contrast, GCC and Clang support instrumentation-based coverage collection when compiling with the -fprofile-arcs -ftest-coverage flags (you should also disable optimizations and use a debugging build: -g -O0). The advantage when collecting branch coverage: the compiler knows all branches that are present in the machine code, not just the branches easily visible in the source code. When the program is executed, it will record coverage in a file that can be massaged into reports with tools like gcov, lcov, gcovr, and many others. (Disclosure: I maintain gcovr.)

In general, coverage measurement needs the same kind of data as a profiler. Often, these tools use exactly the same infrastructure. However, a profiler can afford to be less exact since the hot spots will be executed often. Unlike coverage tools, profilers can use sampling to measure how often which code is executed. A sampling profiler regularly interrupts the process and collects a stack trace which points to the current location. This happens less often than at each statement, often only every few milliseconds. So sampling profilers have less performance impact, but their data is less exact.
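The difference between sampling and exact measurement can be sketched with a background thread that periodically looks at where the main thread currently is. This is a simplified illustration that assumes CPython (it relies on sys._current_frames, which is intended for specialized use). The functions busy_work, rarely_run, and profile are invented for the example.

```python
import collections
import sys
import threading
import time


def sample_lines(thread_id, interval, stop_event, samples):
    """Periodically record where the observed thread is currently executing."""
    while not stop_event.is_set():
        frame = sys._current_frames().get(thread_id)
        if frame is not None:
            samples[(frame.f_code.co_name, frame.f_lineno)] += 1
        time.sleep(interval)


def busy_work(duration):
    """Hot spot: spins for roughly `duration` seconds."""
    deadline = time.monotonic() + duration
    total = 0
    while time.monotonic() < deadline:
        total += 1
    return total


def rarely_run():
    """Executes once and finishes almost instantly."""
    return "executed once"


def profile(duration=0.2, interval=0.005):
    """Sample the calling thread every `interval` seconds while it works."""
    samples = collections.Counter()
    stop_event = threading.Event()
    sampler = threading.Thread(
        target=sample_lines,
        args=(threading.get_ident(), interval, stop_event, samples),
    )
    sampler.start()
    try:
        rarely_run()
        busy_work(duration)
    finally:
        stop_event.set()
        sampler.join()
    return samples
```

The samples returned by profile() will typically be dominated by lines inside busy_work, while rarely_run may not show up at all even though it did execute. That is acceptable for finding hot spots, but it is exactly why sampling is not suitable for coverage.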

Other suggestions

It depends.

For compiled languages, you would typically use compiler support, i.e. the compiler adds extra instructions that increment a counter per line. (Or rather, per branchless block.)
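To show why counters per branchless block are more informative than counters per line, here is a hand-written Python illustration of what such counters could conceptually look like for a short-circuit expression. It is not the output of any compiler. The names block_hits and check and the block labels are hypothetical, and bar and baz are stand-in predicates.

```python
from collections import Counter

# One counter per branchless block, keyed by a made-up label.
block_hits = Counter()


def bar(x):
    return x > 0


def baz(x):
    return x % 2 == 0


def check(x):
    # Uninstrumented original, a single source line:
    #     return bar(x) and baz(x)
    block_hits["entry"] += 1
    if not bar(x):
        block_hits["bar_false"] += 1      # short-circuit: baz is never called
        return False
    block_hits["baz_evaluated"] += 1      # only reached when bar(x) is true
    return baz(x)
```

After calling only check(-1), the single source line would count as covered under line coverage, yet block_hits["baz_evaluated"] is still 0, which reveals that the second operand was never exercised.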

For just-in-time-compiled languages, the same thing applies, though it would mostly be the JIT compiler that inserts the extra instructions.

In interpreted environments, the interpreter might provide an interface for profilers to get at the information.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange