Developers
- Overview
- Barriers to Cross-Platform Support
- Building other projects
- Source Analysis
- Instrumenting code
- Process Tracing
- Generating Reports
This section aims to lay out some of the structure of Tarpaulin and provides a technical reference for the technologies underpinning it. In each specific section check the end for a list of files the section covers.
When Tarpaulin runs, the basic sequence of events is as follows (assuming the user is collecting coverage):
- Build the tests for the project to cover with any user-supplied options
- Take the list of test binaries and for each one:
  - Run source analysis on the project identifying lines that can't be covered or will be absent from debug info but can be covered
  - Load the object file and parse to find lines that can be instrumented
  - Return a list of these lines and their addresses to populate with coverage stats
  - Fork the process, have the child run the test and the parent instrument it
  - The parent now steps through the code like a debugger logging coverage stats
- Merge all the coverage statistics to get unified stats for the project
- Generate any reports and save or send them
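The merge step near the end of this sequence can be pictured with a small sketch. It assumes each test binary yields a map from (file, line) to a hit count. The `LineKey` and `Coverage` types and both functions are hypothetical names chosen for illustration, not Tarpaulin's own data structures.

```rust
use std::collections::BTreeMap;

/// Hypothetical key for one instrumented source line: (source file, line number).
type LineKey = (String, u64);

/// Hypothetical per-binary result: hit count for each instrumented line.
type Coverage = BTreeMap<LineKey, u64>;

/// Merge the results of several test binaries by summing hit counts per line.
fn merge(runs: &[Coverage]) -> Coverage {
    let mut merged = Coverage::new();
    for run in runs {
        for (key, hits) in run {
            *merged.entry(key.clone()).or_insert(0) += hits;
        }
    }
    merged
}

/// Percentage of instrumented lines that were hit at least once.
fn percent_covered(cov: &Coverage) -> f64 {
    if cov.is_empty() {
        return 0.0;
    }
    let covered = cov.values().filter(|&&h| h > 0).count();
    100.0 * covered as f64 / cov.len() as f64
}

fn main() {
    let mut unit = Coverage::new();
    unit.insert(("src/lib.rs".to_string(), 10), 3);
    unit.insert(("src/lib.rs".to_string(), 11), 0);

    let mut integration = Coverage::new();
    integration.insert(("src/lib.rs".to_string(), 11), 1);
    integration.insert(("src/lib.rs".to_string(), 20), 0);

    let merged = merge(&[unit, integration]);
    println!("{:.1}% of {} lines covered", percent_covered(&merged), merged.len());
}
```

A line that one binary misses but another hits counts as covered after the merge, which is why the statistics are only meaningful once every binary has run.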
As a result, we can split Tarpaulin into a few functional areas:
- Interactions with the Rust build system (compiler, linker, Cargo)
- Source code analysis: what lines are uncoverable (`#[derive(..)]`), what lines will be missed (unused templates)
- Finding lines to instrument - understanding what's in the DWARF tables or x86_64 assembly
- Tracing the process - ptrace in Linux, other tools for other operating systems
- Coverage reports - codecov.io, coveralls.io, Cobertura.xml, HTML, other useful formats
- General usability/code quality
As well as these areas involving the implementation of Tarpaulin, work also has to be done in testing and documentation to ensure correctness and make Tarpaulin easy to use.
With only Linux support currently available, there's obviously a demand for support for other operating systems. The easiest to add is probably the BSDs, but thought should also be given to Windows. To add support for another OS, the following issues would have to be addressed:
- Loading the test binaries and getting the debug information
- Launching and tracing the test binaries (ptrace and the windows equivalent)
- Disabling any OS level security features like ASLR that could prevent tracing
Tarpaulin also currently only supports x86_64 systems, i.e. AMD and Intel 64-bit processors. This is because adding breakpoints requires processor-specific opcodes and reading the program counter requires processor-specific registers. In theory, supporting another architecture wouldn't be a large amount of work, but ptrace support can vary wildly between different operating systems and architectures.
Tarpaulin uses Cargo as a library to manage the build systems of other projects and allow users to configure what packages or features are included when they run coverage.
Most coverage tools are designed to work with C, which lacks a lot of the abstractions that higher-level languages have. This can cause language constructs which aren't actually executable code to be mistakenly included as misses, and can result in other code being omitted from the results. Also, multiple addresses may map to the same expression, and where an expression is split over multiple lines you don't want it appearing more than once in the statistics. Below are some examples which Tarpaulin filters out; for more comprehensive examples look at the tests in the `source_analysis` module.
Derive macros: the generated code is partially mapped to the derive statement, although the executable lines exist outside the project source, causing the derive statement to be flagged as a missed line.
#[derive(Debug)]
struct SomeStruct;
Any unused meta-programming code won't generate any assembly and therefore isn't included in the debug tables. This means unused traits and generic functions need to be included in the statistics via source analysis. Also, unused inline functions don't generate assembly or debug information.
fn foo<T>(t: T) {
// Some code
}
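To make the derive case concrete, here is a deliberately naive sketch that flags lines consisting only of a `#[derive(...)]` attribute so they can be excluded from the statistics. The `derive_lines` function is hypothetical and works on raw text. A real implementation would more plausibly work on the parsed syntax tree, and this sketch does not handle attributes that span several lines.

```rust
/// Return the 1-based numbers of lines that consist of a `#[derive(...)]` attribute.
/// Text-based and naive by design: multi-line attributes are not detected.
fn derive_lines(source: &str) -> Vec<usize> {
    source
        .lines()
        .enumerate()
        .filter(|(_, line)| {
            let trimmed = line.trim();
            trimmed.starts_with("#[derive(") && trimmed.ends_with(")]")
        })
        .map(|(idx, _)| idx + 1)
        .collect()
}

fn main() {
    let source = "#[derive(Debug)]\nstruct SomeStruct;\n\nfn foo<T>(t: T) {\n    // Some code\n}\n";
    println!("lines to ignore: {:?}", derive_lines(source));
}
```

The unused generic function in the same input is the opposite problem: it has no debug information at all, so source analysis has to add its lines as coverable rather than remove them.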
Relevant files:
- src/source_analysis.rs
Tarpaulin currently only works on Linux where the tests are ELF files and debug information is kept in the DWARF (Debugging With Attributed Record Formats) format. Parsing the DWARF tables is done via Gimli with Object used to load the ELF file.
Object is cross-platform with Linux, Mac and Windows support. Gimli would have to be replaced with an alternative for Windows; however, it should work on Apple operating systems.
Relevant files:
- src/test_loader.rs
Process tracing is done via the ptrace API on Linux. Unfortunately, the API is often not well documented and rather esoteric, so it can be a constant source of frustration, and more than just the man page is often needed. The ptrace readme for strace (link) is a good starting point, as is anything you can glean from the GDB source.
Ptrace support also differs wildly between Linux and the BSDs, with each having its own interpretation and level of support, which complicates cross-platform support. An alternative to ptrace will have to be found for Windows.
Once the test binary has been built and the instrumentation points identified, we need to launch and trace the test. At this point Tarpaulin forks; the child sends a ptrace `PTRACE_TRACEME` request and launches the test with `execve`. `execve` is used because it means the test keeps the same PID as the child process used to launch it, and the child is stopped with a `SIGTRAP` after `execve` is successful.
At this point, we initialise the test by placing all the breakpoints. The breakpoint system relies on the `INT3` instruction, the software breakpoint interrupt on x64. This instruction is written at the address of each instrumented line and the original instruction byte is stored. These writes are aligned, which may account for some false negatives in coverage results. When a breakpoint is hit a `SIGTRAP` is issued, which `waitpid` will pick up. We then write the original byte back and send a ptrace step command. This will trigger another `SIGTRAP` (although maybe not straight away, as step continues the other threads as well). When this signal comes in we can re-add the breakpoint, if that's desired, and then continue execution.
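The byte swapping described above can be sketched without touching a real process. In this sketch `FakeMemory` stands in for the traced process and only supports aligned 8-byte reads and writes, which is an assumption made to mirror word-sized peek and poke requests. `Breakpoint`, `place` and `remove` are hypothetical names, and the byte layout assumes a little-endian target such as x86_64.

```rust
/// x86_64 `INT3` opcode: a one-byte software breakpoint.
const INT3: u64 = 0xCC;

/// Stand-in for the traced process's memory, accessed in aligned 8-byte words.
struct FakeMemory {
    words: Vec<u64>,
}

impl FakeMemory {
    fn peek(&self, aligned_addr: u64) -> u64 {
        self.words[(aligned_addr / 8) as usize]
    }
    fn poke(&mut self, aligned_addr: u64, value: u64) {
        self.words[(aligned_addr / 8) as usize] = value;
    }
}

/// Split an address into its aligned word address and the bit shift of the
/// target byte inside that word (little-endian layout).
fn split(addr: u64) -> (u64, u64) {
    (addr & !0x7, 8 * (addr & 0x7))
}

/// Hypothetical breakpoint record: where it is and which byte it replaced.
struct Breakpoint {
    addr: u64,
    original: u8,
}

impl Breakpoint {
    /// Save the original byte at `addr` and overwrite it with INT3.
    fn place(mem: &mut FakeMemory, addr: u64) -> Breakpoint {
        let (aligned, shift) = split(addr);
        let word = mem.peek(aligned);
        let original = ((word >> shift) & 0xFF) as u8;
        mem.poke(aligned, (word & !(0xFF_u64 << shift)) | (INT3 << shift));
        Breakpoint { addr, original }
    }

    /// Write the original byte back so the real instruction can execute.
    fn remove(&self, mem: &mut FakeMemory) {
        let (aligned, shift) = split(self.addr);
        let word = mem.peek(aligned);
        let restored = (word & !(0xFF_u64 << shift)) | ((self.original as u64) << shift);
        mem.poke(aligned, restored);
    }
}

fn main() {
    // Two words of NOPs (0x90) standing in for test code.
    let mut mem = FakeMemory { words: vec![0x9090_9090_9090_9090; 2] };
    let bp = Breakpoint::place(&mut mem, 0x0B);
    println!("armed:    {:#018x}", mem.peek(8));
    bp.remove(&mut mem);
    println!("restored: {:#018x}", mem.peek(8));
}
```

Because each write replaces a whole word, the parent always has to read the surrounding bytes first and write them back unchanged, which is one reason concurrent modification of the same word needs care.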
With the `SIGTRAP` captured and the breakpoints placed, the parent is now tracing the program. When the test hits one of these points it issues a `SIGTRAP`, Tarpaulin responds and then continues the test. It is here we reach Tarpaulin's update loop in its most basic form:
                +------------------+
                |                  |
          +-----+  Wait for signal <--------+
          |     |                  |        |
          |     +------------------+        |
          |                                 |
+---------v----------+            +---------+--------+
|                    |            |                  |
|  Signalled by PID  |            |   Continue PID   |
|                    |            |                  |
+---------+----------+            +---------^--------+
          |                                 |
          |   +----------------------+      |
          |   |                      |      |
          +--->  Log coverage stats  +------+
              |                      |
              +----------------------+
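The loop in the diagram can be sketched against a scripted event source rather than a real traced process. Everything named here (`Event`, `Tracer`, `Scripted`, `update_loop`) is hypothetical: a real tracer would decode `waitpid` statuses and issue ptrace requests where this sketch pops events off a queue.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical events a tracer could report.
#[derive(Debug)]
enum Event {
    /// `pid` stopped with SIGTRAP at the breakpoint located at `addr`.
    Trap { pid: i32, addr: u64 },
    /// The test process exited.
    Exited,
}

/// Hypothetical abstraction over the OS tracing API.
trait Tracer {
    fn wait(&mut self) -> Event;
    fn continue_pid(&mut self, pid: i32);
}

/// Wait for a signal, log the hit, continue the signalling PID, repeat.
/// Returns the number of hits recorded per breakpoint address.
fn update_loop<T: Tracer>(tracer: &mut T) -> HashMap<u64, u64> {
    let mut hits = HashMap::new();
    loop {
        match tracer.wait() {
            Event::Trap { pid, addr } => {
                *hits.entry(addr).or_insert(0) += 1;
                // Continue the PID that signalled before handling any other PID.
                tracer.continue_pid(pid);
            }
            Event::Exited => return hits,
        }
    }
}

/// Scripted tracer used only for demonstration.
struct Scripted {
    events: VecDeque<Event>,
    continued: Vec<i32>,
}

impl Tracer for Scripted {
    fn wait(&mut self) -> Event {
        self.events.pop_front().unwrap_or(Event::Exited)
    }
    fn continue_pid(&mut self, pid: i32) {
        self.continued.push(pid);
    }
}

fn main() {
    let mut tracer = Scripted {
        events: VecDeque::from(vec![
            Event::Trap { pid: 100, addr: 0x4010 },
            Event::Trap { pid: 101, addr: 0x4020 },
            Event::Trap { pid: 100, addr: 0x4010 },
            Event::Exited,
        ]),
        continued: Vec::new(),
    };
    let hits = update_loop(&mut tracer);
    println!("hits: {:?}, continued: {:?}", hits, tracer.continued);
}
```

Hiding the OS calls behind a trait is a design choice of this sketch; it lets the loop logic be exercised with scripted events.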
Important note: when the parent is signalled by a PID, it can read data from the PID that signalled it as much as it likes and then continue or step the PID. It cannot interact with another PID until the one it got has been continued. Otherwise, a segfault may happen or the child process can crash or behave weirdly. Also, when one of the threads issues a stopped signal all the threads in that process are stopped. Similarly, when the thread is continued with ptrace all the threads are continued.
This complicates multi-threading: as we add and remove instrumentation points, and continue and step, we modify the code being run. We do this by adding and removing the software interrupts so the original code can run. So we have our parent running and modifying the opcodes while our test binary runs any number of threads which are executing those opcodes. If two threads hit the same instruction at the same time, we'll handle whichever signal is raised first by removing the breakpoint and stepping forward one point in the code, at which point both threads will step.
Also, when you issue a step, the next signal may not be for the PID you stepped; it may be for another one that was already pending when you received the signal you're processing.
- When a PID signals you, ptrace must continue that PID before interacting with another
- When one thread stops they all stop
- When a thread is continued they all continue
- The test binary is mutable global data our parent is reading and writing and tests are reading
- Just because you issue a single step doesn't mean that thread will be the next one handled
Struggles with keeping the above view consistent resulted in the `--no-count` option being made the default, and they complicate implementing accurate condition coverage.
This section will likely vary for every operating system. However, as currently only Linux is supported it's documented here to help people trying to figure out how Tarpaulin works.
While a test is running, the state of the test is represented by a state machine (in `src/statemachine.rs`). This has a core state machine, which has been designed to be platform agnostic, and a handle to OS-specific data and handlers which implements the `StateData` trait. As the `StateData` trait determines most of the state transitions, it doesn't make sense to view the state machine without including the OS-specific actions, so the diagram below shows what's done for Linux. Some explanation of this, as well as of the parts of the state machine which are platform agnostic, is given below. Labels have been left off the transition edges for brevity.
                  +
                  |  +---------+
                  |  |         |
            +-----v--v--+      |
+-----------+   START   +------+
|           +----+------+
|                |
|                |
|           +----v------+
|           |   INIT    +----------------+
|           +----+------+                |
|                |                       |
|                |    +-----------+      |
|                |    |           |      |
|          +-+---v----+-+         |      |
|   +------>    WAIT    <---+     |      |
|   |      +-----+--+---+   |     |      |
|   |            |  |       |     |      |
|   |            |  +-------+     |      |
|   |            |                |      |
|   |      +-----v------+         |      |
|   +------|    STOP    +         |      |
|          +-----+------+         |      |
|                |                |      |
|                |                |      |
|         +------v------+         |      |
+--------->     END     <---------+------+
          +-------------+
Initially, while the test is starting up, it's in the `START` state, waiting for the test to become available so the breakpoints can be initialised in `INIT`. For calls to `waitpid` the `WNOHANG` flag is used to prevent Tarpaulin freezing and having to be manually killed, and the elapsed time is tracked to check for timeouts. If a timeout occurs the test exits.
Assuming nothing goes wrong in the run, once the test is initialised the basic update loop shown before is executed, represented by the `WAIT` and `STOP` states. A timeout can also occur during `WAIT` if the test freezes for whatever reason, e.g. infinite loops with `--no-count`. Once the test has finished executing, the `END` state is entered, any resources are freed, and Tarpaulin will go on to the next test to run or report the results.
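A simplified sketch of how such a state machine might be driven is given here. The `State` enum, the `Handlers` trait and the `Scripted` handlers are all hypothetical and much smaller than the real `StateData` trait. The sketch assumes that a non-blocking poll reports "nothing yet" as `None`, and that a run of empty polls lasting longer than the timeout ends the test.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Start,
    Init,
    Wait,
    Stop,
    End,
}

/// Hypothetical, simplified stand-in for the OS-specific handlers.
/// `None` means a non-blocking poll found nothing to report yet.
trait Handlers {
    fn start(&mut self) -> Option<State>;
    fn init(&mut self) -> State;
    fn wait(&mut self) -> Option<State>;
    fn stop(&mut self) -> State;
}

/// Advance one step. If polling stays empty for longer than `timeout`, end the test.
fn step<H: Handlers>(state: State, h: &mut H, since: &mut Instant, timeout: Duration) -> State {
    let next = match state {
        State::Start => h.start(),
        State::Init => Some(h.init()),
        State::Wait => h.wait(),
        State::Stop => Some(h.stop()),
        State::End => Some(State::End),
    };
    match next {
        Some(s) => {
            *since = Instant::now(); // progress was made, restart the timeout clock
            s
        }
        None if since.elapsed() > timeout => State::End,
        None => state,
    }
}

/// Drive the machine to `End`, recording each state change.
fn run<H: Handlers>(h: &mut H, timeout: Duration) -> Vec<State> {
    let mut state = State::Start;
    let mut visited = vec![state];
    let mut since = Instant::now();
    while state != State::End {
        let next = step(state, h, &mut since, timeout);
        if next != state {
            visited.push(next);
        }
        state = next;
    }
    visited
}

/// Scripted handlers: the test becomes ready after a few polls, then traps a few times.
struct Scripted {
    polls_until_ready: u32,
    traps_left: u32,
}

impl Handlers for Scripted {
    fn start(&mut self) -> Option<State> {
        if self.polls_until_ready == 0 {
            Some(State::Init)
        } else {
            self.polls_until_ready -= 1;
            None
        }
    }
    fn init(&mut self) -> State {
        State::Wait
    }
    fn wait(&mut self) -> Option<State> {
        if self.traps_left == 0 {
            Some(State::End)
        } else {
            self.traps_left -= 1;
            Some(State::Stop)
        }
    }
    fn stop(&mut self) -> State {
        State::Wait
    }
}

fn main() {
    let mut h = Scripted { polls_until_ready: 2, traps_left: 1 };
    println!("{:?}", run(&mut h, Duration::from_secs(5)));
}
```

Polling without blocking is what makes the timeout check possible: a blocking wait on a frozen test would never return control to the loop.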
Below is the state machine happy path assuming no errors and also no time waiting for signals to pop up.
             +
             |
             |
        +----v------+
        |   START   |
        +----+------+
             |
             |
        +----v------+
        |   INIT    |
        +----+------+
             |
             |
       +-----v------+
+------>    WAIT    |
|      +-----+------+
|            |
|            |
|            |
|      +-----v------+
+------+    STOP    |
       +-----+------+
             |
             |
      +------v------+
      |     END     |
      +-------------+
There are 4 conditions which lead directly to the test being stopped, detailed below:
- `END` - always called, this is the final cleanup
- `TIMEOUT` - a timeout occurred, meaning a test ended up hanging or the timeout arg to Tarpaulin is too short
- `UNRECOVERABLE` - something went wrong during execution of this test specifically
- `ABORT` - something fundamentally wrong occurred which means Tarpaulin won't work for any other tests it has to run.
Currently, the only example of `ABORT` being used is if the test binaries are Position Independent Executables. This means code addresses are randomised, or some other security feature is in place, preventing Tarpaulin from placing breakpoints; this will affect all the test binaries.
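Whether a binary was linked as position independent can be read from its ELF header, which gives a rough way to anticipate this situation. The sketch below is an illustration only and is not taken from Tarpaulin: `ElfKind` and `elf_kind` are hypothetical names, and `ET_DYN` also marks ordinary shared libraries, so a real check would need more than the `e_type` field.

```rust
/// ELF `e_type` values from the ELF specification.
const ET_EXEC: u16 = 2; // executable linked at a fixed address
const ET_DYN: u16 = 3; // shared object; PIE binaries are marked this way too

#[derive(Debug, PartialEq)]
enum ElfKind {
    FixedAddress,
    PositionIndependent,
    Other(u16),
}

/// Classify an ELF image from its first 18 bytes: the 16-byte `e_ident`
/// followed by the 2-byte `e_type`. Returns `None` if this is not an ELF file.
fn elf_kind(header: &[u8]) -> Option<ElfKind> {
    if header.len() < 18 || !header.starts_with(b"\x7fELF") {
        return None;
    }
    let raw = [header[16], header[17]];
    // e_ident[5] (EI_DATA): 1 = little-endian, 2 = big-endian.
    let e_type = match header[5] {
        1 => u16::from_le_bytes(raw),
        2 => u16::from_be_bytes(raw),
        _ => return None,
    };
    Some(match e_type {
        ET_EXEC => ElfKind::FixedAddress,
        ET_DYN => ElfKind::PositionIndependent,
        other => ElfKind::Other(other),
    })
}

fn main() -> std::io::Result<()> {
    // Inspect the currently running executable as a convenient sample file.
    let path = std::env::current_exe()?;
    let bytes = std::fs::read(&path)?;
    println!("{}: {:?}", path.display(), elf_kind(&bytes));
    Ok(())
}
```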
Relevant files:
- src/breakpoint.rs
- src/ptrace_control.rs
- src/statemachine.rs
- src/traces.rs
- src/personality.rs
TODO