Developers
- Overview
- Barriers to Cross-Platform Support
- Building other projects
- Source Analysis
- Instrumenting code
- Process Tracing
- Generating Reports
- Debugging
This section aims to lay out some of the structure of Tarpaulin and provides a technical reference for the technologies underpinning it. Each specific section ends with a list of the files it covers. Note that this page is out of date and largely describes only the state of ptrace coverage. LLVM coverage instrumentation is now also supported as a coverage backend!
When Tarpaulin runs, the basic sequence of events is as follows (assuming the user is collecting coverage):
- Parse options and look for any project-specific Tarpaulin configuration
- Build the tests for the project to cover with any user-supplied options
- Take the list of test binaries and for each one:
- Run source analysis on the project identifying lines that can't be covered or will be absent from debug info but can be covered
- Load the object file and parse to find lines that can be instrumented
- Return a list of these lines and their addresses to populate with coverage stats
- Fork the process, have the child run the test and the parent instrument it
- The parent now steps through the code like a debugger logging coverage stats
- Merge all the coverage statistics to get unified stats for the project
- Generate any reports and save or send them
As a result, we can split Tarpaulin into a few functional areas:
- Handling configuration
- Interactions with the Rust build system (compiler, linker, Cargo)
- Source code analysis: what lines are uncoverable (#[derive(..)]), what lines will be missed (unused templates)
- Finding lines to instrument - understanding what's in the DWARF tables or x86_64 assembly
- Tracing the process - ptrace in Linux, other tools for other operating systems
- Coverage reports - codecov.io, coveralls.io, Cobertura.xml, HTML, other useful formats
- General usability/code quality
As well as these areas involving the implementation of Tarpaulin, work also has to be done in testing and documentation to ensure correctness and make Tarpaulin easy to use.
With only Linux support currently available, there's obviously demand for support for other operating systems. Staying with ptrace, the lowest-effort operating systems to add support for are probably the numerous BSDs (excluding Apple).
For support via another operating system's process tracing API, the following challenges would need to be tackled:
- Loading the test binaries and getting the debug information
- Launching and tracing the test binaries (ptrace and the windows equivalent)
- Disabling any OS level security features like ASLR that could prevent tracing
Tarpaulin also currently only supports x64 systems, i.e. AMD and Intel 64 bit processors. This is because placing breakpoints needs processor-specific opcodes and processor-specific registers such as the program counter. In theory, supporting another architecture wouldn't be a large amount of work, but ptrace support can vary wildly between different operating systems and architectures. Given that 64 bit support seems ubiquitous in the world of Intel/AMD, there's been no demand for 32 bit support.
Apple note: ptrace has been gutted of its most useful parts on Apple platforms to restrict reverse engineering efforts. Instead Apple offers a debug port API which works with signed binaries. I don't own any Apple devices and didn't make much headway even signing my binary when I borrowed one for a month to try and progress Mac support.
Additionally, there's an opportunity to use probe-rs to collect coverage for tests deployed to embedded devices. Both of these are long term goals and have tracking issues #549 and #675 respectively.
Tarpaulin calls Cargo as an external process to build the user's tests, and as such a lot of the available arguments simply expose Cargo functionality. In addition, we add some extra linker flags to ensure more accurate results and use the JSON output of Cargo to find things like doctests.
Rust allows examples in doc comments to be run as doctests to ensure your documentation works. Currently, there is an unstable feature to persist these generated doctest binaries and this is used for doctest coverage.
The doctest binary name is a combination of the file name with path separators
and dots replaced with an underscore (_). Then at the end of the name there are
two fields, the line number and an index, both delimited by underscores. This is
probably because, by replacing the path separators and dots with _, two
different files with doctests on the same line can otherwise end up with
identical names.
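As a rough illustration of the naming scheme described above, here is a minimal sketch. The function name `doctest_binary_name` is hypothetical and not part of Tarpaulin, and it assumes only `/`, `\` and `.` are replaced; the real mangling done by rustdoc may differ.

```rust
/// Builds a doctest-style binary name from a source path, the line the
/// doctest starts on and an index.
/// Assumption: only path separators and dots are replaced with `_`.
fn doctest_binary_name(path: &str, line: usize, index: usize) -> String {
    let mangled: String = path
        .chars()
        .map(|c| if matches!(c, '/' | '\\' | '.') { '_' } else { c })
        .collect();
    // The line number and index are appended as two underscore-delimited fields.
    format!("{}_{}_{}", mangled, line, index)
}
```

Note that `src/a/b.rs` and `src/a_b.rs` both mangle to `src_a_b_rs`, which is the kind of collision the trailing index would help to disambiguate.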
However, this poses a mild problem. If a doctest is marked as should_panic
then the binary should panic and return a non-zero exit code. So Tarpaulin
works out whether a binary should panic, marking it as passing if the return
code is non-zero and failing if it's zero. Otherwise it maintains the same
behaviour as other tests and propagates up the return code. No users have (yet)
reported incorrect failures/passes with this but it's something to be aware of.
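The exit code handling above can be sketched as a small pure function. This is only an illustration: the name `effective_exit_code` is hypothetical, and the choice of `1` as the failure code for a doctest that did not panic is an assumption rather than something taken from Tarpaulin.

```rust
/// Maps the raw exit code of a doctest binary to the code that is reported.
fn effective_exit_code(should_panic: bool, raw_exit_code: i32) -> i32 {
    if should_panic {
        // A should_panic doctest is expected to exit non-zero, so the
        // meaning of the exit code is inverted.
        if raw_exit_code != 0 {
            0
        } else {
            1 // assumed failure code, chosen for illustration only
        }
    } else {
        // Every other test simply propagates its exit code.
        raw_exit_code
    }
}
```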
Most coverage tools are designed to work with C which lacks a lot of
abstractions that higher level languages have. This can cause language
constructs which aren't actually executable code to be mistakenly included as
misses and result in other code being omitted from results. Also, multiple
addresses may map to the same expression, and with some code where the
expression is split over multiple lines, you don't want that expression
appearing more than once in the statistics. Below are some examples which
Tarpaulin filters out. For more comprehensive examples, look to the tests in
the source_analysis module.
Derive macros: the generated code is partially mapped to the derive statement, although the executable lines exist outside the project source, which causes the derive line to be flagged as a missed line.
#[derive(Debug)]
struct SomeStruct;
Any unused meta-programming code won't generate any assembly and therefore isn't included in the debug tables. This means unused traits and templated (generic) functions need to be included in the statistics via source analysis. Also, unused inline functions don't generate assembly or debug information.
fn foo<T>(t: T) {
// Some code
}
Relevant files:
- src/source_analysis.rs
Tarpaulin currently only works on Linux where the tests are ELF files and debug information is kept in the DWARF (Debugging With Attributed Record Formats) format. Parsing the DWARF tables is done via Gimli with Object used to load the ELF file.
Object is cross-platform with Linux, Mac and Windows support. Gimli would have to be replaced with an alternative for Windows; however, it should work on Apple operating systems.
Relevant files:
- src/test_loader.rs
Process tracing is done via the ptrace API on Linux. Unfortunately, the API is often not well documented and is rather esoteric, so it can be a constant source of frustration. Because of this, more than just the man page is often needed. The ptrace readme for strace is a good starting point, as is anything you can glean from the GDB source.
Ptrace support also differs wildly between Linux and the BSDs with each having their own interpretation and levels of support complicating cross-platform support. An alternative to Ptrace will have to be found for Windows.
Once the test binary has been built and the instrumentation points identified we
need to launch and trace the test. Tarpaulin at this point forks and the child
issues a ptrace TRACEME request (PTRACE_TRACEME) and launches the test with
execve. execve is used because it means the test keeps the same PID as the child
process used to launch it, and the child is stopped with a SIGTRAP after execve
is successful.
At this point we initialise the test by placing all the breakpoints. The
breakpoint system relies on INT3, the software interrupt instruction in x64.
This instruction is written at the address of each line to instrument and the
byte it overwrites is stored. These writes are aligned, which may account for
some false negatives in coverage results. When a breakpoint is hit a SIGTRAP is
issued, which waitpid will pick up. We then write the original byte back and
send a ptrace step command. This will trigger another SIGTRAP (although maybe
not straight away, as step continues the other threads as well). When this wait
comes in we can re-add the breakpoint if that's desired and then continue
execution.
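The byte swapping described above can be sketched against an in-memory stand-in for the tracee. Everything here is illustrative: `MockMemory` and `Breakpoint` are hypothetical types rather than Tarpaulin's own, and a real tracer would do the word-sized reads and writes with ptrace peek/poke requests instead of touching a `Vec<u8>`.

```rust
const INT3: u8 = 0xCC; // x86_64 software breakpoint opcode
const WORD: usize = 8; // reads and writes happen one aligned machine word at a time

/// Stand-in for the traced process's memory.
struct MockMemory(Vec<u8>);

impl MockMemory {
    fn peek_word(&self, aligned: usize) -> u64 {
        let mut buf = [0u8; WORD];
        buf.copy_from_slice(&self.0[aligned..aligned + WORD]);
        u64::from_le_bytes(buf)
    }

    fn poke_word(&mut self, aligned: usize, word: u64) {
        self.0[aligned..aligned + WORD].copy_from_slice(&word.to_le_bytes());
    }
}

struct Breakpoint {
    addr: usize,
    original: u8, // the byte that INT3 replaced
}

impl Breakpoint {
    /// Writes INT3 at `addr` and remembers the byte it overwrote.
    fn enable(mem: &mut MockMemory, addr: usize) -> Breakpoint {
        let aligned = addr & !(WORD - 1);
        let shift = 8 * (addr - aligned) as u32;
        let word = mem.peek_word(aligned);
        let original = ((word >> shift) & 0xFF) as u8;
        let patched = (word & !(0xFFu64 << shift)) | ((INT3 as u64) << shift);
        mem.poke_word(aligned, patched);
        Breakpoint { addr, original }
    }

    /// Restores the original byte so the real instruction can execute.
    fn disable(&self, mem: &mut MockMemory) {
        let aligned = self.addr & !(WORD - 1);
        let shift = 8 * (self.addr - aligned) as u32;
        let word = mem.peek_word(aligned);
        let restored = (word & !(0xFFu64 << shift)) | ((self.original as u64) << shift);
        mem.poke_word(aligned, restored);
    }
}
```

In a real run the tracer would call something like `disable`, single step the thread, and then optionally call `enable` again before continuing.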
With the SIGTRAP captured and the breakpoints placed, the parent is now tracing
the program. When the test hits one of these points it issues a SIGTRAP,
Tarpaulin responds and then continues the test running. It is here we reach
Tarpaulin's update loop in its most basic form:
+------------------+
| |
+-----+ Wait for signal <--------+
| | | |
| +------------------+ |
| |
+---------v----------+ +---------+--------+
| | | |
| Signalled by PID | | Continue PID |
| | | |
+---------+----------+ +---------^--------+
| |
| +----------------------+ |
| | | |
+---> Log coverage stats +------+
| |
+----------------------+
Important note: when the parent is signalled by a PID, it can read data from the PID that signalled it as much as it likes and then continue or step the PID. It cannot interact with another PID until the one it got has been continued. Otherwise, a segfault may happen or the child process can crash or behave weirdly. Also, when one of the threads issues a stopped signal all the threads in that process are stopped. Similarly, when the thread is continued with ptrace all the threads are continued.
This complicates multi-threading (more on that later). As we add and remove instrumentation points and then continue or step, we modify the code being run. We do this by adding and removing the software interrupts so the original code can run. So we have our parent running and modifying the opcodes, and our test binary running any number of threads which are executing those opcodes. If two threads hit the same instruction at the same time, we'll handle whichever signal is raised first by removing the breakpoint and stepping forward one point in the code. At that point both threads will step.
Also, when you issue a step, the next signal may not be for the PID you stepped; it may be for another one that was pending when you received the signal you're processing.
- When a PID signals you ptrace must continue that PID before interacting with another
- When one thread stops they all stop
- When a thread is continued they all continue
- The test binary is mutable global data our parent is reading and writing and tests are reading
- Just because you issue a single step doesn't mean that thread will be the next one handled
Struggles with keeping the above view consistent resulted in the --no-count
option being made the default, and they complicate implementing accurate
condition coverage.
Because multiple threads can hit a breakpoint at the same time, there may be
more events in the queue at the point at which waitpid returns an event. Any
thread that has hit a breakpoint needs it disabled and needs to be stepped back
to the start of the instruction. Otherwise you'll get a SIGILL, as the ptrace
continue/step commands will continue that thread with the program counter in an
invalid position.
To handle this, Tarpaulin calls waitpid until there are no more wait events and
then goes over this list of pending stops, picking an action for each
process/thread id. This design change fixed the majority of bugs in code with a
lot of threading and removed the need to set --test-threads=1 when running test
executables.
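The "drain first, act afterwards" approach can be sketched as below. This is a simplified model, not Tarpaulin's code: the `Stop` and `Action` enums and `collect_actions` are hypothetical, and the `poll` closure stands in for a non-blocking waitpid call that returns `None` once nothing is pending.

```rust
/// A stop reported by the (mock) wait call.
#[derive(Debug, PartialEq)]
enum Stop {
    Breakpoint { pid: i32, addr: u64 },
    Exited { pid: i32 },
}

/// What the tracer decides to do for each pending stop.
#[derive(Debug, PartialEq)]
enum Action {
    StepOverBreakpoint { pid: i32, addr: u64 },
    Forget { pid: i32 },
}

/// Collects every pending stop before choosing an action for any of them.
fn collect_actions(mut poll: impl FnMut() -> Option<Stop>) -> Vec<Action> {
    // Phase 1: keep polling until the queue of wait events is empty.
    let mut pending = Vec::new();
    while let Some(stop) = poll() {
        pending.push(stop);
    }
    // Phase 2: pick one action per stopped process/thread id.
    pending
        .into_iter()
        .map(|stop| match stop {
            Stop::Breakpoint { pid, addr } => Action::StepOverBreakpoint { pid, addr },
            Stop::Exited { pid } => Action::Forget { pid },
        })
        .collect()
}
```

Separating the two phases means two threads stopped at the same address are both seen before the breakpoint byte is restored, so each can be rewound and stepped consistently.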
This section will likely vary for every operating system. However, as currently only Linux is supported it's documented here to help people trying to figure out how Tarpaulin works.
While a test is running, the state of the test is represented by a state machine
(in src/statemachine.rs). This has a core state machine, which has been designed
to be platform agnostic, and a handle to OS-specific data and handlers which
implements the StateData trait. As the StateData trait determines most of the
state transitions, it doesn't make sense to view the state machine without
including the OS-specific actions, so the diagram below is what's done for
Linux. Some explanation of this, as well as of the parts of the state machine
which are platform agnostic, is given below. Labels have been left off the
transition edges for brevity.
+
| +---------+
| | |
+-----v--v--+ |
+-----------+ START +------+
| +----+------+
| |
| |
| +----v------+
| | INIT +----------------+
| +----+------+ |
| | |
| | +-----------+ |
| | | | |
| +-+---v----+-+ | |
| +------> WAIT <---+ | |
| | +-----+--+---+ | | |
| | | | | | |
| | | +-------+ | |
| | | | |
| | +-----v------+ | |
| +------| STOP + | |
| +-----+------+ | |
| | | |
| | | |
| +------v------+ | |
+---------> END <---------+------+
+-------------+
Initially, while the test is starting up, it's in the START state waiting for
the test to become available so the breakpoints can be initialised in INIT. For
calls to waitpid the WNOHANG flag is used to prevent Tarpaulin freezing and
having to be manually killed, and the elapsed time is tracked to check for
timeouts. If a timeout occurs the test exits.
Assuming nothing goes wrong in the run, once the test is initialised the basic
update loop shown before is executed, represented by the WAIT and STOP states.
A timeout can also occur during WAIT if the test freezes for whatever reason,
e.g. infinite loops with --no-count. Once the test has finished executing, the
END state is entered, any resources are freed and Tarpaulin goes on to the next
test to run or reports the results.
Below is the state machine happy path assuming no errors and also no time waiting for signals to pop up.
+
|
|
+----v------+
| START |
+----+------+
|
|
+----v------+
| INIT |
+----+------+
|
|
+-----v------+
+------> WAIT |
| +-----+------+
| |
| |
| |
| +-----v------+
+------+ STOP |
+-----+------+
|
|
+------v------+
| END |
+-------------+
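The transitions in the diagrams can be approximated with a small transition function. This is a sketch under assumptions: the `Outcome` enum and `next_state` are hypothetical, whereas in Tarpaulin the transitions are driven by the OS-specific `StateData` implementation.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum TestState {
    Start,
    Init,
    Wait,
    Stop,
    End,
}

/// Hypothetical summary of what the OS-specific handler observed.
#[derive(Debug, Clone, Copy)]
enum Outcome {
    Ready,     // the current step completed
    Pending,   // nothing to do yet, keep polling
    Signalled, // a traced process or thread stopped
    Finished,  // the test exited
    TimedOut,  // the timeout elapsed
}

fn next_state(state: TestState, outcome: Outcome) -> TestState {
    use Outcome::*;
    use TestState::*;
    match (state, outcome) {
        (End, _) => End,
        // Any state can fall through to END on exit or timeout.
        (_, TimedOut) | (_, Finished) => End,
        (Start, Ready) => Init,
        (Start, _) => Start, // still waiting for the test to become available
        (Init, _) => Wait,   // breakpoints placed, enter the update loop
        (Wait, Signalled) => Stop,
        (Wait, _) => Wait,
        (Stop, _) => Wait, // stats logged, continue and wait again
    }
}
```

Feeding it `Ready, Ready, Signalled, Ready, Signalled, Finished` from `Start` walks the happy path through two breakpoint hits: START, INIT, WAIT, STOP, WAIT, STOP, END.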
There are four conditions, detailed below, which lead directly to the test being stopped:
- END - always called, this is the final cleanup
- TIMEOUT - a timeout occurred, meaning a test ended up hanging or the timeout arg to Tarpaulin is too short
- UNRECOVERABLE - something went wrong during execution of this test specifically
- ABORT - something fundamentally wrong occurred which means Tarpaulin won't work for any other tests it has to run.
Previously, the only use of ABORT was when the test binaries were Position
Independent Executables. This means code addresses have a random offset, which
used to prevent Tarpaulin from placing breakpoints and would affect all the
test binaries. Now, however, Tarpaulin finds the offset by using procfs to read
it from /proc/$PID.
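To give a feel for what finding the offset involves, here is a std-only sketch that parses text in the format of a Linux memory map listing. It is not how Tarpaulin does it (the text above says it goes through procfs): the function names are hypothetical, and it assumes the offset is the lowest start address among mappings backed by the test binary and that the path contains no spaces.

```rust
/// Parses the start address from one memory map line, e.g.
/// "55d3c4a00000-55d3c4a21000 r--p 00000000 08:01 123 /path/to/test".
fn parse_map_start(line: &str) -> Option<u64> {
    let range = line.split_whitespace().next()?;
    let (start, _end) = range.split_once('-')?;
    u64::from_str_radix(start, 16).ok()
}

/// Returns the lowest start address of any mapping backed by `binary_path`.
/// Assumption: this address is the load offset of the binary.
fn load_offset(maps: &str, binary_path: &str) -> Option<u64> {
    maps.lines()
        .filter(|line| line.split_whitespace().last() == Some(binary_path))
        .filter_map(parse_map_start)
        .min()
}
```

The resulting offset would then be added to each address taken from the debug information before a breakpoint is written.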
Relevant files:
- src/breakpoint.rs
- src/ptrace_control.rs
- src/statemachine/mod.rs
- src/statemachine/linux.rs
- src/traces.rs
- src/processing_handling/linux.rs
The previous section is a minor simplification of how Tarpaulin handles running a test binary. Tarpaulin (in a version not currently released) can now follow exec events and trace launched binaries. This is useful for things like CLI tests, where a test may launch one of the binary outputs of the project with different args and check the output.
This adds an extra layer of complexity in handling the exec events, but it can be summed up as follows:
- A ptrace exec event occurs where the event data provides the new PID
- Using procfs find the path to the binary
- Call back into the test_loader module to get the TraceMap for the binary
- Store the process PID mapping to the TraceMap
So when we get the address we've stopped at, we need to know which binary the
address is in. Because of this we now maintain a map from pid/tids to the
parent process and use them to look up the TraceMap so the correct statistics
are being updated.
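The two-level lookup can be sketched with a pair of hash maps. The `Tracer` type and its methods are hypothetical, and the `TraceMap` here is a much simplified stand-in (hit counts keyed by address) for Tarpaulin's real type.

```rust
use std::collections::HashMap;

type Pid = i32;

/// Simplified stand-in for a TraceMap: hit counts keyed by address.
#[derive(Default)]
struct TraceMap {
    hits: HashMap<u64, u64>,
}

#[derive(Default)]
struct Tracer {
    /// pid/tid -> the process that owns it
    parents: HashMap<Pid, Pid>,
    /// owning process -> coverage data for the binary it is running
    trace_maps: HashMap<Pid, TraceMap>,
}

impl Tracer {
    /// Called for the initial test and for each followed exec event.
    fn register_process(&mut self, pid: Pid, map: TraceMap) {
        self.parents.insert(pid, pid);
        self.trace_maps.insert(pid, map);
    }

    fn register_thread(&mut self, tid: Pid, parent: Pid) {
        self.parents.insert(tid, parent);
    }

    /// Records a hit against the binary that `tid` is executing.
    /// Returns false if the id is unknown.
    fn record_hit(&mut self, tid: Pid, addr: u64) -> bool {
        let parent = match self.parents.get(&tid) {
            Some(parent) => *parent,
            None => return false,
        };
        match self.trace_maps.get_mut(&parent) {
            Some(map) => {
                *map.hits.entry(addr).or_insert(0) += 1;
                true
            }
            None => false,
        }
    }
}
```

Keeping a separate map per process means the same address in two different binaries updates two different sets of statistics.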
Currently, following executables is an opt-in feature enabled with the
--follow-exec flag, as I've not seen enough projects to be happy that it will
work for everyone.
Relevant files:
- src/statemachine/linux.rs
There are numerous report formats supported by Tarpaulin. Apart from the stdout
and HTML report outputs, all of these are existing formats either requested by
users or implemented for interop with other services. After running all the
Tarpaulin configurations, every TraceMap is merged, the resulting coverage for
the application is passed on together with the Config, and the selected reports
are generated.
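Merging can be pictured as summing hit counts per source location. The sketch below is an assumption-laden simplification: `TraceMap` is reduced to a map from (file, line) to hits, and `merge` and `coverage_percent` are hypothetical helpers rather than Tarpaulin functions.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a TraceMap: (file, line) -> hit count.
type TraceMap = BTreeMap<(String, u64), u64>;

/// Merges per-binary results into one map by summing the hits for each line.
fn merge(maps: impl IntoIterator<Item = TraceMap>) -> TraceMap {
    let mut merged = TraceMap::new();
    for map in maps {
        for (location, hits) in map {
            *merged.entry(location).or_insert(0) += hits;
        }
    }
    merged
}

/// Percentage of coverable lines that were hit at least once.
fn coverage_percent(map: &TraceMap) -> f64 {
    if map.is_empty() {
        return 0.0;
    }
    let covered = map.values().filter(|&&hits| hits > 0).count();
    100.0 * covered as f64 / map.len() as f64
}
```

A line missed by one test binary but hit by another ends up covered in the merged result, which is why merging happens before any report is generated.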
The Tarpaulin output is also always written to a file in the target directory. This allows the stdout reports to report changes in coverage (and, in future, the HTML reports). This is mainly for local usage and not CI, where a user can use a free service like coveralls.io or codecov.io.
Cobertura is an XML based format. Documentation for its format is scarce online and it features a lot of redundant information, so I host a DTD I found for it on a public gist found here
Due to its use of ptrace you can't attach a debugger like GDB to Tarpaulin, which complicates fault finding. To tackle this there is a general reliance on logging. There are two main forms of debug logging:
- stdout logging to the terminal
- the event logger
For large projects the stdout logging quickly becomes unusable and so the event
log output becomes invaluable. For this, the EventLog struct exists globally
and events are pushed into it as they happen. Then at the end of the run they
are serialized to JSON. Previously, this was then rendered to a static SVG
giving each PID/TID its own vertical lane, with edges connecting them so you
could visualise all interactions along a timeline.
This ended up being too much to render for large projects, crashing browser-based SVG renderers and making ones like Inkscape sluggish. As a result I've implemented my own renderer using Qt which can be found here. It's not really intended for general use, so the interface may undergo sudden changes and it's not documented, but it should be relatively intuitive to load traces and navigate via mouse or keyboard.
Other times when changing some behaviour it would be helpful to figure out if this truly fixes an issue or causes other issues. For this I've started work on tater which is a crater style tool for Tarpaulin. I'm hoping this will get more use and prove helpful when dropping big releases.