DWARF Extensions for Unwinding Across Merged Functions

Cary CoutantUpdated July 23, 2009

Background

In large applications, and particularly in large C++ applications, it is common for several similar functions to produce identical object code, even though the source code for each function is slightly different. These identical functions could, in principle, be merged into a single copy, with each symbol set to the address of that one copy. For one particular large C++ application studied, the potential reduction in size from this optimization has been estimated at 6-7%.

One problem with an optimization such as this is that a PC value can no longer be unambiguously mapped to a particular function. This can be expected to cause mysterious behavior when profiling and debugging an application that has been built with this optimization.

The proposal presented here adds a new table to the DWARF debugging information to allow the debugger and other tools to disambiguate such PC values by examining the call chain.

Previous Work

Microsoft's Visual Studio product offers a compiler option, /Gy, that directs the compiler to place the object code for each function in a separate COMDAT section. This can be used with a linker option, /OPT:ICF ("Identical COMDAT Folding"), that directs the linker to detect duplicate instances of identical COMDAT sections and remove the redundant copies. All the original function symbols are then set to the address of the one remaining copy.

Microsoft advises its users not to use this option when profiling or debugging an application, as there is no support for disambiguating a PC value that may belong to any of several different functions that were folded into one copy.

Overview

When the PC is inside a merged function, we will attempt to disambiguate the function by examining the function's return pointer (i.e., the point of call that invoked the merged function). There are four cases:

Direct call: The point of call is a direct call to the merged function. In this case, we can look up the address of the call site in a direct-call table that provides, for each direct call site in the program, a pointer to the debug information entry for the called function.

Virtual call: The point of call is a C++ virtual function call. In this case, we can look up the address of the call site in a virtual-call table that provides, for each virtual call site in the program, the index of the vtable slot used for the call. Given the vtable address, which can be obtained from the this pointer in the callee, the debug information entry for the corresponding class can be identified, and the function being called can be determined by matching the slot index to the virtual function members of that class. If the callee's this pointer cannot be determined (e.g., due to optimization of the routine), the vtable slot index might still be used to eliminate some candidate functions.

Other indirect call: The point of call is an indirect call, but not a virtual function call. This case is less important than the first two, and we will defer support for it. One possible approach to mitigate this would be to simply flag any function whose address has been taken and prevent such functions from being merged at link time.

Tail call: The point of call may have led to an intermediate procedure that made a tail call to reach the current callee. In this case we will still not be able to disambiguate the PC.

This requires the construction of two additional debug information tables in the final executable: a direct-call table and a virtual-call table. Each of these will be generated by the compiler and placed in a new debug section in the relocatable object files. The linker will combine these sections as it normally does to produce a single combined section for each table.

The number of functions that can be merged is expected to be a small fraction of the total number of functions in any typical application. For functions that will not be merged together, entries in the direct call table are not needed, and would represent a significant waste of space. At compile time, unfortunately, all functions are candidates for merging and the duplicate functions have not yet been identified, so the compiler must generate call table entries for all direct and virtual calls.

The actual merging of identical functions takes place at link time. In order to reduce the total size of the direct call table, therefore, the linker can discard any call table entry that refers to a function that has not been merged with another.

Call Table Format

Typical DWARF sections consist of a header per compilation unit followed by data about that compilation unit. The header provides, at the minimum, an indication of the size of the data for the compilation unit so that, once many compilation units have been combined by the linker, a consumer can identify the separate compilation units. When the data may contain address-sized objects (as the call tables will), the header typically also provides the size of an address for the target machine. When the data may contain offsets to other DWARF information (as the call tables will), the header also provides an indication of whether these offsets are expressed as 32-bit or 64-bit offsets (see Section 7.4 of the DWARF spec). DWARF distinguishes between addresses and offsets because a 64-bit program, which would use 64-bit addresses, is rarely larger than the 4 GB limit that would require it to also use 64-bit offsets for debug information.

For each compilation unit, the compiler will place the direct call table in the section .debug_dcall, and the virtual call table in the section .debug_vcall. Each of these sections will begin with a compilation unit header with the following format:

unit_length: A 4-byte or 12-byte unsigned integer representing the length of the contribution for this compilation unit, not including the length field itself. In the 32-bit DWARF format, this is a 4-byte value (which must be less than 0xfffffff0); in the 64-bit DWARF format, this consists of the 4-byte value 0xffffffff followed by an 8-byte unsigned integer that gives the actual length.

version: A 1-byte unsigned integer indicating the version number of the format of the call table information. The version number for the format described here is 4.

debug_info (Direct call table only): A 4-byte or 8-byte unsigned integer representing the base of the portion of the .debug_info section for this compilation unit. In the 32-bit DWARF format, this value is 4 bytes; in the 64-bit DWARF format, this value is 8 bytes.

address_size: A 1-byte unsigned integer representing the size in bytes of an address on the target architecture.

The call table entries immediately follow.

Direct Call Table

Entries in the direct call table have the following format:

call_site: An address-sized value containing a pointer to the call site. Because this table will be consulted during a stack unwind using a return pointer, the value recorded here as the call site will actually be the address following the call instruction, to which the callee would return.

callee_die: An unsigned LEB128 value representing the location of a debug information entry in the .debug_info section, relative to the base of the .debug_info section. This entry represents a declaration (not necessarily a defining declaration) of the called subroutine, and should be either a DW_TAG_subprogram or DW_TAG_entry_point debug entry.

Virtual Call Table

Entries in the virtual call table have the following format:

call_site: An address-sized value containing a pointer to the call site. Because this table will be consulted during a stack unwind using a return pointer, the value recorded here as the call site will actually be the address following the call instruction, to which the callee would return.

vtable_slot: An unsigned LEB128 value containing the index of the vtable slot used for the virtual function call.

Linker Changes

The linker will combine all .debug_dcall contributions into a single section in the output file, and will do likewise with .debug_vcall contributions.

In order to reduce the final size of the direct call table, it may optionally process the contents of each contribution to the .debug_dcall section, removing all entries whose callees are known not to have been merged with another. This step is optional, because the analysis and understanding of a specific section is not a typical linker function, and may tend to hurt the linker's performance.

The linker may also optionally scan for functions whose address has been taken, during its normal scan of relocations, and mark those functions as ineligible for merging with other identical functions.

Debugger/Profiler Changes

A debugger, profiler, or other application that needs to disambiguate a PC value within a merged function will need to perform a (partial) stack trace to obtain the return pointer of the function containing the current PC. It can then scan the direct and virtual call tables to find an entry that corresponds to the return pointer.

If it finds an entry in the direct call table, it can use the callee_die field to locate the debug information entry for the function that was actually called.

If it finds an entry in the virtual call table, it can use the this pointer to identify the virtual table for the object whose member function was called. Since the DWARF information does not record any information directly about virtual tables, the application will need to find the symbol corresponding to the vtable address, demangle the symbol name to identify the actual type to which the vtable corresponds, then find the debug information for that type. It can then match the location expression from the vtable_elem_loc field against the DW_AT_vtable_elem_location attributes of the type's member function entries to identify the function that was actually called.

In some cases, the point of call may itself be in a merged function. For example, consider a function A that calls function B, and a function C that calls function D. Suppose functions B and D are identical and are thus merged, and that as a result, functions A and C are also identical and are merged. In this case, a PC that could be in either B or D will have a return pointer that could be in either A or C. We would need to recursively determine whether we are in A or C before we can decide whether we are in B or D.

Improved Identification of Vtables

Finding the vtable from the virtual call table as described above requires the demangling of C++ symbols, introducing some external dependencies on both the C++ language and its specific implementation for the target platform. To remove these dependencies, we could implement an additional DWARF extension to record virtual tables in the DWARF information. Each virtual table generated by the compiler would be represented by a debugging information entry with the tag DW_TAG_vtable (or DW_TAG_GNU_vtable if implemented as a GNU extension). The vtable entry would have a DW_AT_location attribute giving its location, and a DW_AT_type attribute referring to the class that it belongs to.