This document is aimed at those who use LLVM’s code coverage mapping to provide
code coverage analysis for their own programs, and at those who would like
to know how it works under the hood. Prior knowledge of how Clang’s
profile-guided optimization works is useful, but not required.

We start by showing how to use LLVM and Clang for code coverage analysis,
then we briefly describe LLVM’s code coverage mapping format and the
way that Clang and LLVM’s code coverage tool work with this format. Once
the basics are covered, we discuss more advanced features of the coverage
mapping format, such as the data structures, the LLVM IR representation, and
the binary encoding.

LLVM’s code coverage mapping format is designed to be a self-contained
data format that can be embedded into the LLVM IR and object files.
It’s described in this document as a mapping format because its goal is
to store the data that is required for a code coverage tool to map between
the specific source ranges in a file and the execution counts obtained
after running the instrumented version of the program.

The mapping data is used in two places in the code coverage process:

When Clang compiles a source file with -fcoverage-mapping, it
generates the mapping information that describes the mapping between the
source ranges and the profiling instrumentation counters.
This information gets embedded into the LLVM IR and conveniently
ends up in the final executable file when the program is linked.

It is also used by llvm-cov: the mapping information is extracted from an
object file and is used to associate the execution counts (the values of the
profile instrumentation counters) with the source ranges in a file.
After that, the tool is able to generate various code coverage reports
for the program.

The coverage mapping format aims to be a “universal format” that is
suitable for use by any frontend, not just Clang. It also aims to
let the frontend generate minimal coverage mapping data in order to
reduce the size of the IR and object files. For example,
instead of emitting mapping information for each statement in a function, the
frontend is allowed to group the statements with the same execution count into
regions of code, and emit the mapping information only for those regions.

The remainder of this guide is meant to give you insight into the way the
coverage mapping format works.

The coverage mapping format operates on a per-function level, as the
profile instrumentation counters are associated with a specific function.
For each function that requires code coverage, the frontend has to create
coverage mapping data that can map between the source code ranges and
the profile instrumentation counters for that function.

The function’s coverage mapping data contains an array of mapping regions.
A mapping region stores the source code range that is covered by this region,
the file id, the coverage mapping counter and
the region’s kind.
There are several kinds of mapping regions:

Code regions associate portions of source code and coverage mapping
counters. They make up the majority of the mapping regions. They are used
by the code coverage tool to compute the execution counts for lines,
highlight the regions of code that were never executed, and to obtain
the various code coverage statistics for a function.
For example:

Skipped regions are used to represent source ranges that were skipped
by Clang’s preprocessor. They are not associated with
coverage mapping counters, as the frontend knows that they are never
executed. They are used by the code coverage tool to mark the skipped lines
inside a function as non-code lines that don’t have execution counts.
For example:

Expansion regions are used to represent Clang’s macro expansions. They
have an additional property - expanded file id. This property can be
used by the code coverage tool to find the mapping regions that are created
as a result of this macro expansion, by checking if their file id matches the
expanded file id. They are not associated with coverage mapping counters,
as the code coverage tool can determine the execution count for this region
by looking up the execution count of the first region with a corresponding
file id.
For example:

The file id is an integer value that tells us
in which source file or macro expansion this region is located.
It enables Clang to produce mapping information for the code
defined inside macros, as this example demonstrates:
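As a rough model of how file ids tie regions together, the following Python sketch looks up the execution count of an expansion region by finding the first region whose file id matches the expanded file id. The `MappingRegion` class, its field names, and the sample data are hypothetical, and each code region stores its count directly instead of a coverage mapping counter to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class MappingRegion:
    """Hypothetical in-memory model of one mapping region."""
    kind: str                       # "code", "skipped", or "expansion"
    file_id: int                    # source file or macro expansion holding the region
    start: Tuple[int, int]          # (line, column)
    end: Tuple[int, int]            # (line, column)
    count: Optional[int] = None     # simplification: resolved count, code regions only
    expanded_file_id: Optional[int] = None  # only meaningful for expansion regions


def expansion_count(expansion, regions):
    """Return the count of the first region located inside the expansion."""
    for region in regions:
        if region.file_id == expansion.expanded_file_id:
            return region.count
    return None


# Made-up data: file id 0 is the source file, file id 1 is a macro body.
regions = [
    MappingRegion("code", 0, (3, 12), (6, 2), count=5),
    MappingRegion("expansion", 0, (4, 3), (4, 8), expanded_file_id=1),
    MappingRegion("code", 1, (1, 17), (1, 30), count=5),
]

print(expansion_count(regions[1], regions))
```

The expansion region itself has no count; the value comes from the code region recorded under file id 1.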

A coverage mapping counter can represent a reference to the profile
instrumentation counter. The execution count for a region with such a counter
is determined by looking up the value of the corresponding profile
instrumentation counter.

It can also represent a binary arithmetical expression that operates on
coverage mapping counters or other expressions.
The execution count for a region with an expression counter is determined by
evaluating the expression’s arguments and then adding them together or
subtracting them from one another.
In the example below, a subtraction expression is used to compute the execution
count for the compound statement that follows the else keyword:
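A minimal Python sketch of that evaluation, under the assumption that counters are modeled as nested tuples (a representation chosen for illustration only, not LLVM's): a reference looks up a profile instrumentation counter, and an expression combines two operands by addition or subtraction. The count for the else branch is the count for the enclosing function body minus the count for the if branch.

```python
def evaluate(counter, profile_counts):
    """Evaluate a counter against the list of profile counter values."""
    kind = counter[0]
    if kind == "ref":
        # ("ref", index): value of the profile instrumentation counter
        return profile_counts[counter[1]]
    if kind == "add":
        # ("add", lhs, rhs): operands may be references or other expressions
        return evaluate(counter[1], profile_counts) + evaluate(counter[2], profile_counts)
    if kind == "sub":
        return evaluate(counter[1], profile_counts) - evaluate(counter[2], profile_counts)
    raise ValueError("unknown counter kind: %r" % (kind,))


# Made-up profile data: counter 0 counts the function body,
# counter 1 counts the body of the 'if' statement.
profile_counts = [10, 3]

# The 'else' block runs whenever the function ran but the 'if' body did not.
else_counter = ("sub", ("ref", 0), ("ref", 1))
print(evaluate(else_counter, profile_counts))
```

Because the else count is derived, the frontend does not need to emit a separate instrumentation counter for it.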

Finally, a coverage mapping counter can also represent an execution count
of zero. The zero counter is used to provide coverage mapping for
unreachable statements and expressions, like in the example below:

The zero counters allow the code coverage tool to display proper line execution
counts for the unreachable lines and highlight the unreachable code.
Without them, the tool would think that those lines and regions were still
executed, as it doesn’t possess the frontend’s knowledge.

The string contains values that are encoded in the LEB128 format, which is
used throughout for storing integers. It also contains a string value.
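For readers unfamiliar with LEB128, here is a small Python sketch of a decoder, assuming the unsigned variant. Each byte contributes its low seven bits, and a set high bit means that another byte follows. The function name and return shape are illustrative.

```python
def decode_uleb128(data, offset=0):
    """Decode one unsigned LEB128 integer; return (value, next_offset)."""
    value = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift   # low 7 bits carry the payload
        if byte & 0x80 == 0:              # high bit clear: last byte
            return value, offset
        shift += 7


print(decode_uleb128(b"\x0c"))            # a value below 128 fits in one byte
print(decode_uleb128(b"\xe5\x8e\x26"))    # a three-byte value
```

Every integer in the examples in this document is below 128, so each one occupies exactly one byte.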

The length of the substring that contains the encoded translation unit
filenames is the value of the second field in the __llvm_coverage_mapping
structure, which is 20. Thus, the filenames are encoded in this string:

c"\01\12/Users/alex/test.c"

This string contains the following data:

Its first byte has a value of 0x01. It stores the number of filenames
contained in this string.

Its second byte stores the length of the first filename in this string.

The remaining 18 bytes are used to store the first filename.
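The layout just described can be checked with a short Python sketch. It assumes that the count and each length fit in a single byte (true for values below 128, which LEB128 stores in one byte); the helper name is hypothetical.

```python
def decode_filenames(data):
    """Decode a count-prefixed list of length-prefixed filenames."""
    count = data[0]                 # number of filenames (single-byte value assumed)
    offset = 1
    names = []
    for _ in range(count):
        length = data[offset]       # length of the next filename in bytes
        offset += 1
        names.append(data[offset:offset + length].decode("utf-8"))
        offset += length
    return names, offset            # offset is the number of bytes consumed


encoded = b"\x01\x12/Users/alex/test.c"
names, consumed = decode_filenames(encoded)
print(names, consumed)
```

The number of bytes consumed matches the length of 20 recorded in the structure: one byte for the count, one for the length, and 18 for the filename.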

The length of the substring that contains the encoded coverage mapping data
for the first function is the value of the third field in the first
structure in the array of function records stored in the
third field of the __llvm_coverage_mapping structure, which is 9.
Therefore, the coverage mapping for the first function record is encoded
in this string:

c"\01\00\00\01\01\01\0C\02\02"

This string consists of the following bytes:

0x01

The number of file ids used by this function. There is only one file id used by the mapping data in this function.

0x00

An index into the filenames array which corresponds to the file “/Users/alex/test.c”.

0x00

The number of counter expressions used by this function. This function doesn’t use any expressions.

0x01

The number of mapping regions that are stored in an array for the function’s file id #0.

0x01

The coverage mapping counter for the first region in this function. The value of 1 tells us that it’s a coverage
mapping counter that is a reference to the profile instrumentation counter with an index of 0.

0x01

The starting line of the first mapping region in this function.

0x0C

The starting column of the first mapping region in this function.

0x02

The number of lines between the starting line and the ending line of the first mapping region in this function. The region therefore ends on line 3.

0x02

The ending column of the first mapping region in this function.
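The byte-by-byte walk above can be reproduced with a short Python sketch. It relies on two assumptions that are not spelled out in this section: every value is below 128 and therefore occupies a single LEB128 byte, and the counter header keeps its tag in the low two bits with the counter index in the remaining bits. The line-count byte is interpreted as numLines, that is, the difference between the ending and starting lines.

```python
encoded = bytes([0x01, 0x00, 0x00, 0x01, 0x01, 0x01, 0x0C, 0x02, 0x02])

fields = iter(encoded)              # one byte per value (all values < 128)

num_file_ids = next(fields)
filename_indices = [next(fields) for _ in range(num_file_ids)]
num_expressions = next(fields)      # zero here, so no expression records follow
num_regions = next(fields)          # regions recorded for file id #0

header = next(fields)
tag = header & 0x3                  # assumption: tag lives in the low two bits
counter_index = header >> 2         # assumption: remaining bits hold the index

line_start = next(fields)
column_start = next(fields)
num_lines = next(fields)            # ending line minus starting line
column_end = next(fields)

region = {
    "file": filename_indices[0],
    "counter_index": counter_index,
    "start": (line_start, column_start),
    "end": (line_start + num_lines, column_end),
}
print(num_expressions, num_regions, tag, region)
```

A header value of 1 splits into a non-zero tag and an index of 0, which is why it refers to the first profile instrumentation counter.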

The length of the substring that contains the encoded coverage mapping data
for the second function record is also 9. It’s structured like the mapping data
for the first function record.

The two trailing bytes are zeroes and are used to pad the coverage mapping
data to give it 8-byte alignment.
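The amount of padding follows from the lengths given earlier: 20 bytes of filenames plus two 9-byte function mappings. A quick Python check of the arithmetic (the helper name is illustrative):

```python
def padding_for(size, alignment=8):
    """Number of zero bytes needed to round size up to a multiple of alignment."""
    return (-size) % alignment


filenames_length = 20
mapping_lengths = [9, 9]

total = filenames_length + sum(mapping_lengths)
print(total, padding_for(total))
```

The 38 bytes of data need 2 bytes of padding to reach 40, the next multiple of 8.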

The value of the counter’s tag distinguishes between the counters and
pseudo-counters: if the tag is zero, then this header contains a
pseudo-counter; otherwise, this header contains an ordinary counter.

deltaLineStart: The difference between the starting line of the
current mapping region and the starting line of the previous mapping region.

If the current mapping region is the first region in the current
sub-array, then it stores the starting line of that region.

columnStart: The starting column of the mapping region.

numLines: The difference between the ending line and the starting line
of the current mapping region.

columnEnd: The ending column of the mapping region. If the high bit is set,
the current mapping region is a gap area. A count for a gap area is only used
as the line execution count if there are no other regions on a line.
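The four fields can be turned back into absolute source ranges as in the Python sketch below. It assumes that the "high bit" of columnEnd is bit 31 of a 32-bit value, and the sample tuples are made up for illustration.

```python
GAP_BIT = 1 << 31   # assumption: columnEnd is treated as a 32-bit value


def resolve_regions(encoded_regions):
    """Convert (deltaLineStart, columnStart, numLines, columnEnd) tuples
    from one sub-array into absolute source ranges."""
    resolved = []
    previous_line_start = 0          # the first delta is the starting line itself
    for delta_line_start, column_start, num_lines, column_end in encoded_regions:
        line_start = previous_line_start + delta_line_start
        resolved.append({
            "start": (line_start, column_start),
            "end": (line_start + num_lines, column_end & ~GAP_BIT),
            "gap": bool(column_end & GAP_BIT),
        })
        previous_line_start = line_start
    return resolved


sample = [
    (1, 12, 2, 2),                   # starts on line 1 and spans two more lines
    (3, 5, 0, GAP_BIT | 9),          # starts 3 lines later, single line, gap area
]
for region in resolve_regions(sample):
    print(region)
```

Storing deltas instead of absolute line numbers keeps most values small, so they usually fit in a single LEB128 byte.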