DYLD Detailed

Jonathan Levin, http://newosxbook.com/ - 8/12/13

1. About

While maintaining and adding more functionality to JTool, I found myself deeply bogged down in implementing support for Mach-O's LINKEDIT sections, LC_SYMTAB, and other arcane and relatively undocumented corners of DYLD. Add to that, DYLD was only skimmed in my book*, and barely covered in that of my predecessor. Scouring the Internet with Google finds only one decent reference [1], though it's woefully incomplete and basically just rehashes stuff from the book. Needless to say, Apple makes no effort to provide documentation outside its "Mach-O Programming Topics" [2] document, which is by now very dated. What better way, then, to right a wrong and shed some light on it, than an article?

Why should you care? (Target Audience)

I said so in the book, and I'll state it again - there is no knowledge that is not power, and in the case of linking - we're talking about a lot of power. Virtually every binary run in OS X or iOS is dynamically linked, and being able to intervene in the linking process bestows significant capabilities - function interception, auditing and hooking being the most important ones. Reverse engineers, security-oriented developers (e.g. anti-malware) and hackers will hopefully find this information very useful.

It should be noted that dyld allows for hooking and interception via environment variables - most notably DYLD_INSERT_LIBRARIES (akin to ld's LD_PRELOAD) and DYLD_LIBRARY_PATH (like ld's LD_LIBRARY_PATH) - and via its function interposing mechanism. These are covered in the book (somewhere in Chapter 4, with a demo on this website [3]), and are therefore not discussed in this document.

Prerequisite: About Linking

Nearly all binaries, in UN*X and Windows systems alike, are dynamically linked. The benefits of dynamic linking are many, and include:

Code reuse: commonly used code can be extracted to a library, which is then shared by many clients

Easy updating: code residing in a library can easily be updated, and the library replaced, so long as the symbols are by and large the same. A classic example of this can be seen in Windows' "CreateWindow", which creates totally different-looking windows for the same application throughout Windows versions (think Win95 vs. XP vs. 7-8). The developer merely says "CreateWindow", not knowing how the window gets created. The OS does the rest, and different versions of the OS may do so differently.

Reducing disk usage: as commonly used code now has only one copy, as opposed to having to include the code in every single binary which uses it.

Reducing RAM usage: by far the most important advantage. A single copy of the library may be mmap(2)-ed into all processes, so the library's RAM cost is paid only once. The library code is usually marked r-x (read only, executable), so the same physical copy is implicitly shared by many consumers. This is crucial and saves immense amounts of memory, especially in RAM-challenged systems like Android.

UN*X, whose de-facto standard format is ELF, uses ld(1) as the program linker-loader, and the ".so" (shared object) files for libraries. OS X, thinking differently, uses ".dylib" (dynamic library) files. The standard nm(1) command is still supported, as are the dl* APIs (dlopen(3), dlsym(3), etc) - but the implementations are radically different (as is the nomenclature - what ld(1) calls "sections", DYLD calls "segments", and further divides into sections). DYLD's source code is open, but makes for a terrible read. DYLD offers many of the classic ld(1) functions, and then some.

Nomenclature

Throughout this article, the following terms are used:

dylib: A dynamic library. Akin to a UN*X shared object. A Mach-O object of type MH_DYLIB (0x6), loaded into other executables by the LC_LOAD_DYLIB (0xc) Mach-O command or the dlopen(3) API. For the record, it's worth noting that OS X also supports the concept of a fixed library (a Mach-O object of type MH_FVMLIB (0x3), loaded into other executables by the LC_LOADFVMLIB (0x6) command). Fixed libraries, however, are virtually extinct.
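
To make the LC_LOAD_DYLIB command a little more concrete, here is a minimal sketch of pulling the library path out of one such command. The structure mirrors the layout of struct dylib_command from <mach-o/loader.h> but is redeclared locally so the snippet builds anywhere. The function name dylib_path and the flattened field names are made up for this example, and it assumes the command bytes are in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LC_LOAD_DYLIB 0xcu

/* Local mirror of struct dylib_command (24 bytes); the path string follows it. */
struct dylib_command {
    uint32_t cmd, cmdsize;
    uint32_t name_offset;   /* offset of the path, from the start of the command */
    uint32_t timestamp, current_version, compatibility_version;
};

/* Returns the NUL-terminated path stored in an LC_LOAD_DYLIB command,
 * or NULL if the command is of another type or is malformed.
 * 'avail' is the number of readable bytes starting at 'lc'. */
static const char *dylib_path(const uint8_t *lc, size_t avail)
{
    struct dylib_command dc;

    assert(lc != NULL);
    if (avail < sizeof dc)
        return NULL;
    memcpy(&dc, lc, sizeof dc);               /* avoid unaligned access */
    if (dc.cmd != LC_LOAD_DYLIB || dc.cmdsize > avail)
        return NULL;
    if (dc.name_offset < sizeof dc || dc.name_offset >= dc.cmdsize)
        return NULL;
    /* The path must be terminated inside the command itself. */
    if (memchr(lc + dc.name_offset, '\0', dc.cmdsize - dc.name_offset) == NULL)
        return NULL;
    return (const char *)(lc + dc.name_offset);
}
```

The caller is expected to walk the load commands (each starts with cmd and cmdsize) and hand every candidate to this function.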

symbol: A variable or function in a Mach-O file which may or may not be visible outside that file.

binding: Connecting a symbol reference to its address in memory. Binding may be load-time, lazy (deferred) or weak (missing/overridable). These can be controlled at compile time: ld's -bind_at_load specifies load-time binding, and __attribute__((weak_import)) marks weak symbols. There is also an option to prebind libraries to fixed addresses (the -prebind switch of ld).

Tools:

Apple provides otool(1), dyldinfo(1) and pagestuff(1) - if you have Xcode. If you don't, or - if you want to analyze Mach-O binaries on Linux - you are welcome to use JTool instead (http://www.newosxbook.com/files/jtool.tar). This is an all-in-one replacement for the above tools, with far more capable features, including an experimental disassembler. The tar file contains an OS X and iOS version bundled into one universal binary, as well as an ELF version (for Linux 64-bit). It's free to download and use, and will remain so.

In the outputs shown, I've color coded: white is what you should type, yellow is for my own annotations, and everything else is the verbatim output of the commands.

Calling external functions

If you disassemble any Mach-O dynamically linked binary, you will no doubt see, sooner or later, a call to an external function, supplied by some library (commonly, libSystem.B.dylib). These calls are implemented as calls to the Mach-O's symbol stub section. Consider the following example, from OS X's /bin/ls:

The book goes on (till page 121) to explain how DYLD manages the stubs, and populates them with the actual addresses of the functions, using dyld_stub_binder. It does not, however, explain HOW that's done. This is what we'll discuss here. But before we do, a bit about LINKEDIT:

DYLD_INFO and LINKEDIT

Starting with OS X 10.5 or 10.6, Apple decided to implement a special segment in Mach-O files for DYLD's usage. This segment, traditionally called __LINKEDIT, consists of information used by DYLD in the process of linking and binding symbols. This segment is (for the most part) meaningful only to DYLD - the kernel is completely oblivious to its presence. DYLD relies on a special load command, LC_DYLD_INFO, to serve as a "table of contents" for the segment. This can be seen with otool(1) or jtool:

Using jtool -v -l on a binary to display load commands, with a focus on the __LINKEDIT segment
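
For readers who prefer code to tool output, the "table of contents" role of LC_DYLD_INFO can be sketched as follows. This is only an illustration under a few assumptions: a 64-bit, host-endian, thin (non-fat) Mach-O already read into memory, and locally redeclared structures that mirror the layouts in <mach-o/loader.h>. The function name find_dyld_info is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MH_MAGIC_64       0xfeedfacfu
#define LC_REQ_DYLD       0x80000000u
#define LC_DYLD_INFO      0x22u
#define LC_DYLD_INFO_ONLY (0x22u | LC_REQ_DYLD)

/* Local mirrors of the <mach-o/loader.h> layouts (all fields are 32 bits wide). */
struct mach_header_64 {
    uint32_t magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags, reserved;
};
struct dyld_info_command {
    uint32_t cmd, cmdsize;
    uint32_t rebase_off, rebase_size;       /* file offset and size of each blob */
    uint32_t bind_off, bind_size;
    uint32_t weak_bind_off, weak_bind_size;
    uint32_t lazy_bind_off, lazy_bind_size;
    uint32_t export_off, export_size;
};

/* Walks the load commands of an in-memory Mach-O file and copies the first
 * LC_DYLD_INFO(_ONLY) command into *out. Returns 0 on success, -1 otherwise. */
static int find_dyld_info(const uint8_t *file, size_t len,
                          struct dyld_info_command *out)
{
    struct mach_header_64 mh;
    size_t off = sizeof mh;

    assert(file != NULL && out != NULL);
    if (len < sizeof mh)
        return -1;
    memcpy(&mh, file, sizeof mh);
    if (mh.magic != MH_MAGIC_64)
        return -1;

    for (uint32_t i = 0; i < mh.ncmds; i++) {
        uint32_t cmd, cmdsize;
        if (len - off < 8)                   /* need cmd + cmdsize */
            return -1;
        memcpy(&cmd, file + off, 4);
        memcpy(&cmdsize, file + off + 4, 4);
        if (cmdsize < 8 || cmdsize > len - off)
            return -1;
        if (cmd == LC_DYLD_INFO || cmd == LC_DYLD_INFO_ONLY) {
            if (cmdsize < sizeof *out)
                return -1;
            memcpy(out, file + off, sizeof *out);
            return 0;
        }
        off += cmdsize;
    }
    return -1;
}
```

Each off/size pair in the result is a file offset and byte count, so a blob such as the bind info is simply the bytes at file + bind_off, bind_size long.
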
Jtool contains a useful option, --pages, which presents a mapping of the Mach-O regions (segments, sections, and load command data), somewhat similar to (but more detailed than) pagestuff(1). This can be used, among other things, to dump the contents of __LINKEDIT:

Using jtool --pages on a sample binary
As can be seen from the above output, the general layout of the __LINKEDIT is as follows:

Located via | Region | Contents
Indexed by LC_DYLD_INFO | Rebase Info | Image rebase info - contains rebasing opcodes
Indexed by LC_DYLD_INFO | Bind Info | Image symbol binding info for required import symbols
Indexed by LC_DYLD_INFO | Lazy Bind Info | Image symbol binding info for lazy import symbols. This will be 0 for binaries compiled with ld's -bind_at_load
Indexed by LC_DYLD_INFO | Weak Bind Info | Image symbol binding info for weak import symbols
Indexed by LC_DYLD_INFO | Export Info | Image symbol binding info for symbols exported by this image
Pointed to by LC_SEGMENT_SPLIT_INFO | Segment Split, if any | Segment split information
Pointed to by LC_FUNCTION_STARTS | Function start information | Function start point information (ULEB128)
Pointed to by LC_DATA_IN_CODE | Data regions in code | Data region information (ULEB128)
Pointed to by LC_DYLIB_CODE_SIGN_DRS | Code Signing DRs | Code signing DRs of dependent dylibs
Pointed to by LC_SYMTAB | Symbol Table | Table of symbols, in nlist format
Pointed to by LC_DYSYMTAB | Indirect Symbol Table | Table of indirect symbols
Pointed to by LC_SYMTAB | String Table | Array of symbol names
Pointed to by LC_CODE_SIGNATURE | Code Signature | Code Signing blob (discussed in a future article)

Layout of __LINKEDIT segment

DYLD makes extensive use of the ULEB128 encoding, which is (in the author's humble opinion) a crude and stingy encoding method. Low level implementors would be wise to familiarize themselves with the encoding, which is also used in DWARF and other binary-related formats.
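
Since ULEB128 shows up everywhere in what follows, here is a minimal decoder sketch. ULEB128 stores an unsigned integer 7 bits at a time, least significant group first, with the top bit of each byte set when more bytes follow. The function name read_uleb128 and its error convention are this example's own choices, not something taken from DYLD's source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes one ULEB128 value starting at *p, never reading at or past 'end'.
 * On success stores the value in *out, advances *p past it and returns 0.
 * Returns -1 if the buffer ends mid-value or the value needs more than 64 bits
 * worth of groups. */
static int read_uleb128(const uint8_t **p, const uint8_t *end, uint64_t *out)
{
    assert(p != NULL && *p != NULL && out != NULL);

    const uint8_t *cur = *p;
    uint64_t result = 0;
    unsigned shift = 0;

    while (cur < end) {
        uint8_t byte = *cur++;
        if (shift >= 64)
            return -1;                         /* too many groups */
        result |= (uint64_t)(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) {              /* top bit clear: last byte */
            *p = cur;
            *out = result;
            return 0;
        }
        shift += 7;
    }
    return -1;                                 /* ran out of input */
}
```

For instance, the three bytes E5 8E 26 decode to 0x65 + (0x0E << 7) + (0x26 << 14) = 624485. SLEB128, the signed variant, works the same way but sign-extends from bit 6 of the final byte.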

DYLD OpCodes

DYLD uses a special encoding - consisting of various "opcodes" - to store and load symbol binding information. These opcodes are used to populate the rebase information and binding tables pointed to by the LC_DYLD_INFO command. There are two types of opcodes: Rebasing opcodes and Binding opcodes.

Binding opcodes

Binding opcodes (used for both lazy and non-lazy symbols) are defined in <mach-o/loader.h> as BIND_xxx constants:

Opcode | Value | Meaning
DONE | 0x00 | End of opcode list
SET_DYLIB_ORDINAL_IMM | 0x10 | Set dylib ordinal to immediate (lower 4-bits). Used for ordinal numbers from 0-15
SET_DYLIB_ORDINAL_ULEB | 0x20 | Set dylib ordinal to following ULEB128 encoding. Used for ordinal numbers from 16+
SET_DYLIB_SPECIAL_IMM | 0x30 | Set dylib ordinal, with 0 or negative number as immediate. The value is sign extended. Currently known values are: BIND_SPECIAL_DYLIB_SELF (0), BIND_SPECIAL_DYLIB_MAIN_EXECUTABLE (-1), BIND_SPECIAL_DYLIB_FLAT_LOOKUP (-2)
SET_SYMBOL_TRAILING_FLAGS_IMM | 0x40 | Set the symbol to the string which follows (NULL-terminated char[]). The flags (in the immediate value) can be BIND_SYMBOL_FLAGS_WEAK_IMPORT (1) or BIND_SYMBOL_FLAGS_NON_WEAK_DEFINITION (8)
SET_TYPE_IMM | 0x50 | Set the type to immediate (lower 4-bits). Known types are: TYPE_POINTER (1, most common), TYPE_TEXT_ABSOLUTE32 (2), TYPE_TEXT_PCREL32 (3)
SET_ADDEND_SLEB | 0x60 | Set the addend field to the following SLEB128 encoding
SET_SEGMENT_AND_OFFSET_ULEB | 0x70 | Set segment to immediate value, and address (offset in segment) to the following ULEB128 encoding
ADD_ADDR_ULEB | 0x80 | Add the following ULEB128 encoding to the address field
DO_BIND | 0x90 | Perform binding of current table row, then advance the address by one pointer
DO_BIND_ADD_ADDR_ULEB | 0xA0 | Perform binding, also add following ULEB128 to the address
DO_BIND_ADD_ADDR_IMM_SCALED | 0xB0 | Perform binding, also add immediate (lower 4-bits), scaled by the pointer size, to the address
DO_BIND_ULEB_TIMES_SKIPPING_ULEB | 0xC0 | Perform binding several times (count as following ULEB128), skipping several bytes (as the ULEB128 which follows next) after each. Rare.

dyld 625 (Darwin 18 = macOS 10.14/iOS 12) now binds everything non lazily (preload) on ARM64e. This adds opcode 0xd0, which appears in the beginning of the stream and defines the size of the table (as a ULEB). When 0xd0 is next encountered it changes its meaning to SUBCODE_THREADED_APPLY, after SET_SEGMENT_AND_OFFSET_ULEB. Note that "threaded" does not mean multithreaded (although I would have done it differently). This is explained in Vol I/7 (v1.1 and later, Sep 2018+), and jtool2 knows how to parse such opcode streams.

Each opcode is specified in the topmost 4-bits (i.e. BIND_OPCODE_MASK (0xF0) in <mach-o/loader.h>). Arguments to opcodes are either the "immediate" values in the lower 4-bits (for those with _IMM), or follow the opcode byte in ULEB128 notation for integers, or as a character array (SET_SYMBOL_TRAILING_FLAGS_IMM).
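
In code, splitting an opcode byte is a pair of masks. The sketch below also shows the sign extension used for SET_DYLIB_SPECIAL_IMM, where a non-zero 4-bit immediate is read as a small negative number. The two mask values are those of <mach-o/loader.h>; the helper function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define BIND_OPCODE_MASK                  0xF0
#define BIND_IMMEDIATE_MASK               0x0F
#define BIND_OPCODE_SET_DYLIB_SPECIAL_IMM 0x30

/* Upper nibble: which operation to perform. */
static uint8_t bind_opcode(uint8_t byte)
{
    return byte & BIND_OPCODE_MASK;
}

/* Lower nibble: the immediate operand, if the opcode takes one. */
static uint8_t bind_immediate(uint8_t byte)
{
    return byte & BIND_IMMEDIATE_MASK;
}

/* For SET_DYLIB_SPECIAL_IMM the immediate is 0 or a negative number stored
 * in 4 bits, so 0xF means -1 and 0xE means -2. */
static int bind_special_ordinal(uint8_t byte)
{
    assert(bind_opcode(byte) == BIND_OPCODE_SET_DYLIB_SPECIAL_IMM);
    uint8_t imm = bind_immediate(byte);
    return imm == 0 ? 0 : (int)imm - 16;
}
```

So the byte 0x72 is SET_SEGMENT_AND_OFFSET_ULEB (0x70) with segment 2 as its immediate, and 0x3E is SET_DYLIB_SPECIAL_IMM with a value of -2 (flat lookup).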

The opcodes populate the individual columns of row entries in the binding tables, with each row terminated by a DO_BIND. Each row carries by default the values of the previous row, and so an opcode is specified only if the column value is changed in between two symbols. This allows for table compression. The tables are a little bit different between the binding symbols (bind info) and the lazy binding symbols (lazy_bind info):
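
The "row with inherited columns" model can be sketched as a tiny interpreter. This is a simplified illustration and not DYLD's implementation: it handles only the common opcodes, treats DONE as the end of the stream (which suits the non-lazy bind info), silently tolerates truncated ULEB128 values, and all names (struct bind_row, decode_binds, uleb) are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the binding table; field names are illustrative. */
struct bind_row {
    int         ordinal;   /* which dependent dylib supplies the symbol */
    const char *symbol;    /* points into the opcode stream itself */
    uint8_t     type;
    uint8_t     segment;
    uint64_t    offset;    /* offset within the segment */
};

/* Lenient ULEB128 reader: stops at the end of the buffer. */
static uint64_t uleb(const uint8_t **p, const uint8_t *end)
{
    uint64_t v = 0;
    unsigned shift = 0;
    while (*p < end) {
        uint8_t b = *(*p)++;
        if (shift < 64)
            v |= (uint64_t)(b & 0x7f) << shift;
        shift += 7;
        if ((b & 0x80) == 0)
            break;
    }
    return v;
}

/* Decodes a bind opcode stream into rows. Returns the number of rows written,
 * or -1 on an unsupported opcode, an unterminated name or a full table. */
static int decode_binds(const uint8_t *p, size_t len, size_t ptr_size,
                        struct bind_row *rows, int max_rows)
{
    const uint8_t *end = p + len;
    struct bind_row cur = { 0, NULL, 0, 0, 0 };   /* columns persist across rows */
    int n = 0;

    assert(p != NULL && rows != NULL);
    while (p < end) {
        uint8_t byte = *p++;
        uint8_t op = byte & 0xF0, imm = byte & 0x0F;
        switch (op) {
        case 0x00: return n;                                 /* DONE */
        case 0x10: cur.ordinal = imm; break;                 /* SET_DYLIB_ORDINAL_IMM */
        case 0x20: cur.ordinal = (int)uleb(&p, end); break;  /* SET_DYLIB_ORDINAL_ULEB */
        case 0x40:                                           /* SET_SYMBOL_TRAILING_FLAGS_IMM */
            cur.symbol = (const char *)p;
            while (p < end && *p != 0)
                p++;
            if (p == end)
                return -1;
            p++;                                             /* skip the NUL */
            break;
        case 0x50: cur.type = imm; break;                    /* SET_TYPE_IMM */
        case 0x70: cur.segment = imm; cur.offset = uleb(&p, end); break;
        case 0x80: cur.offset += uleb(&p, end); break;       /* ADD_ADDR_ULEB */
        case 0x90: case 0xA0: case 0xB0:                     /* DO_BIND variants */
            if (n == max_rows)
                return -1;
            rows[n++] = cur;                                 /* emit the row */
            cur.offset += ptr_size;                          /* move to next pointer */
            if (op == 0xA0) cur.offset += uleb(&p, end);
            if (op == 0xB0) cur.offset += (uint64_t)imm * ptr_size;
            break;
        default:
            return -1;                                       /* not handled in this sketch */
        }
    }
    return n;
}
```

Note how a second symbol needs only a new SET_SYMBOL_TRAILING_FLAGS_IMM and a DO_BIND: the ordinal, type and segment carry over from the previous row, and the offset has already advanced by one pointer.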

The symbol table itself is an array of nsyms entries, each a struct nlist or struct nlist_64 - depending on the file type (MH_MAGIC or MH_MAGIC_64, respectively). The nlist structures follow the BSD format, with some minor modifications. The String Table is nothing more than an array of NULL-terminated strings, which follow one another.
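
The relationship between the two tables is simple: each nlist entry holds, in n_strx, a byte offset into the String Table. A minimal lookup sketch for the 64-bit case follows. The structure mirrors struct nlist_64 from <mach-o/nlist.h> (with the n_un union flattened to n_strx); the function name symbol_name is this example's own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct nlist_64 (16 bytes). */
struct nlist_64 {
    uint32_t n_strx;    /* offset of the name in the string table */
    uint8_t  n_type;
    uint8_t  n_sect;
    uint16_t n_desc;
    uint64_t n_value;
};

/* Returns the name of symbol 'index', or NULL if the index or the string
 * offset is out of range, or the name is not terminated inside the table. */
static const char *symbol_name(const struct nlist_64 *syms, uint32_t nsyms,
                               const char *strtab, uint32_t strsize,
                               uint32_t index)
{
    assert(syms != NULL && strtab != NULL);
    if (index >= nsyms)
        return NULL;

    uint32_t strx = syms[index].n_strx;
    if (strx >= strsize)
        return NULL;
    if (memchr(strtab + strx, '\0', strsize - strx) == NULL)
        return NULL;
    return strtab + strx;
}
```

The syms, nsyms, strtab and strsize arguments correspond to the symoff, nsyms, stroff and strsize fields of the LC_SYMTAB command, once the offsets have been turned into pointers.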

The Indirect Symbol Table (LC_DYSYMTAB)

The Indirect Symbol Table in a Mach-O file is described in an LC_DYSYMTAB command. This command details (among other things) the offset of this table, and the number of symbols it contains. This can be seen with otool (or jtool) -l, as follows:

The indirect symbol table is, in fact, nothing more than an array of indices into the main symbol table (the one pointed to by LC_SYMTAB). Dumping the indirect symbol table is straightforward with jtool, by specifying an offset (or address) inside the table:
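
Independently of any tool, the "array of indices" idea can be sketched in a few lines of C. Each entry is a 32-bit index into the symbol table, except for two special marker values defined in <mach-o/loader.h> (INDIRECT_SYMBOL_LOCAL and INDIRECT_SYMBOL_ABS), which mean there is no symbol to look up. The function name and the -1 convention are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INDIRECT_SYMBOL_LOCAL 0x80000000u
#define INDIRECT_SYMBOL_ABS   0x40000000u

/* Maps entry 'i' of the indirect symbol table to an index into the main
 * symbol table. Returns -1 if 'i' is out of range, if the entry is one of
 * the special markers, or if it points past the end of the symbol table. */
static int64_t indirect_to_symbol_index(const uint32_t *indirect,
                                        uint32_t nindirect,
                                        uint32_t nsyms, uint32_t i)
{
    assert(indirect != NULL);
    if (i >= nindirect)
        return -1;

    uint32_t v = indirect[i];
    if (v & (INDIRECT_SYMBOL_LOCAL | INDIRECT_SYMBOL_ABS))
        return -1;                 /* local or absolute: nothing to look up */
    if (v >= nsyms)
        return -1;
    return (int64_t)v;
}
```

Here indirect and nindirect correspond to the indirectsymoff and nindirectsyms fields of LC_DYSYMTAB, and nsyms comes from LC_SYMTAB.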

The indirect symbol table is used with two specific Mach-O sections - the __DATA.__nl_symbol_ptr, and __DATA.__lazy_symbol. We discuss these next.

__DATA.__nl_symbol_ptr and __DATA.__lazy_symbol

The __DATA.__nl_symbol_ptr section contains the "non-lazy" symbol pointers. Recall, that binding of symbols can be performed either at load time, or on first use. The "non lazy" pointers are those which must be bound at load time (that is, if binding is unsuccessful, the binary will fail to load). The name of the section is somewhat of a convention, but it is the section type (0x06 - S_NON_LAZY_SYMBOL_POINTERS) which defines its contents. As for the section contents, they are detailed in <mach-o/loader.h> as follows:

/*
 * For the two types of symbol pointers sections and the symbol stubs section
 * they have indirect symbol table entries. For each of the entries in the
 * section the indirect symbol table entries, in corresponding order in the
 * indirect symbol table, start at the index stored in the reserved1 field
 * of the section structure. Since the indirect symbol table entries
 * correspond to the entries in the section the number of indirect symbol table
 * entries is inferred from the size of the section divided by the size of the
 * entries in the section. For symbol pointers sections the size of the entries
 * in the section is 4 bytes and for symbol stubs sections the byte size of the
 * stubs is stored in the reserved2 field of the section structure.
 */
#define S_NON_LAZY_SYMBOL_POINTERS 0x6 /* section with only non-lazy
                                          symbol pointers */
#define S_LAZY_SYMBOL_POINTERS     0x7 /* section with only lazy symbol
                                          pointers */
#define S_SYMBOL_STUBS             0x8 /* section with only symbol
                                          stubs, byte size of stub in
                                          the reserved2 field */
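
The arithmetic described in that comment can be sketched as follows. The structure below is a hypothetical subset of struct section_64 holding only the fields needed, with the section type already extracted. The pointer size is passed in as a parameter: the header comment's "4 bytes" describes 32-bit files, and 64-bit files use 8-byte pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define S_NON_LAZY_SYMBOL_POINTERS 0x6
#define S_LAZY_SYMBOL_POINTERS     0x7
#define S_SYMBOL_STUBS             0x8

/* Hypothetical subset of the section structure, for illustration only. */
struct section_info {
    uint64_t size;        /* section size in bytes */
    uint32_t type;        /* one of the S_* section types above */
    uint32_t reserved1;   /* first index in the indirect symbol table */
    uint32_t reserved2;   /* stub size in bytes, for S_SYMBOL_STUBS */
};

/* Number of indirect symbol table entries belonging to the section:
 * section size divided by the size of one entry. Returns 0 for other
 * section types or a zero entry size. */
static uint32_t indirect_entry_count(const struct section_info *s,
                                     uint32_t ptr_size)
{
    uint64_t entry_size;

    assert(s != NULL && ptr_size != 0);
    switch (s->type) {
    case S_NON_LAZY_SYMBOL_POINTERS:
    case S_LAZY_SYMBOL_POINTERS:
        entry_size = ptr_size;
        break;
    case S_SYMBOL_STUBS:
        entry_size = s->reserved2;
        break;
    default:
        return 0;
    }
    if (entry_size == 0)
        return 0;
    return (uint32_t)(s->size / entry_size);
}

/* Indirect symbol table index of the k-th pointer or stub in the section. */
static uint32_t indirect_index(const struct section_info *s, uint32_t k)
{
    assert(s != NULL);
    return s->reserved1 + k;
}
```

For example, a stubs section of 0x30 bytes with 6-byte stubs has 8 entries, and if its reserved1 is 0 those are indirect symbol table entries 0 through 7.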

It is worth mentioning that __nl_symbol_ptr is not the only "non-lazy" section: The binary's Global Offset Table (GOT) is in its own section, __DATA.__got, similarly marked with S_NON_LAZY_SYMBOL_POINTERS. It's also noteworthy that only one of these values is held in the section's flags field (which erroneously implies these are bit-flags - they are not, but there are some higher bit flags which may be or'ed with these values).
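
A short sketch of how the flags field is taken apart: the low byte holds the section type (a plain value, not a bit set), and the upper 24 bits hold the attribute flags. The mask and attribute values are those of <mach-o/loader.h>; the two helper function names are invented here.

```c
#include <assert.h>
#include <stdint.h>

#define SECTION_TYPE             0x000000ffu  /* low byte: section type */
#define SECTION_ATTRIBUTES       0xffffff00u  /* upper 24 bits: attribute flags */
#define S_SYMBOL_STUBS           0x8u
#define S_ATTR_PURE_INSTRUCTIONS 0x80000000u
#define S_ATTR_SOME_INSTRUCTIONS 0x00000400u

/* The two masks partition the 32-bit flags field. */
static_assert((SECTION_TYPE & SECTION_ATTRIBUTES) == 0,
              "type and attribute bits must not overlap");

/* The type is compared for equality, never tested bit by bit. */
static uint32_t section_type(uint32_t flags)
{
    return flags & SECTION_TYPE;
}

/* The attributes, on the other hand, are genuine bit flags. */
static uint32_t section_attributes(uint32_t flags)
{
    return flags & SECTION_ATTRIBUTES;
}
```

A typical stubs section carries flags of 0x80000408: type S_SYMBOL_STUBS (0x8), or'ed with S_ATTR_PURE_INSTRUCTIONS and S_ATTR_SOME_INSTRUCTIONS.
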
The __DATA.__lazy_symbol section contains lazy symbols. These are symbols which will be bound on first use. The code to do so is in an additional section, referred to as the symbol stubs. The "stubs" consist of boilerplate code, which is naturally architecture dependent. Apple Developer's "OS X Assembler Reference" [4] details this well, but unfortunately only for the deprecated PowerPC architecture. JTool's disassembler is almost fully functional for ARM (but still very partial for x86_64). We therefore show the ARMv7 (iOS) case next.

dyld_stub_binder and _helper (in iOS)

Stub resolution in iOS and OS X is practically the same. The __TEXT.__stub_helper contains a single function, which sets up a call to the dyld_stub_binder according to the value pointed to by R12, a.k.a the Intra-Procedural register***. The other entries in stub_helper are trampolines to this function, each setting up R12 to hold the value of the indirect symbol table entry corresponding to the function to be bound. This is shown in the annotated jtool disassembly of ScreenShotr (the screen capture utility used by Xcode, from iOS's DeveloperDiskImage.dmg), below:

Jtool's disassembly is corroborated by DYLD's source, which surprisingly enough contains an #if __arm__ statement for iOS 5 which Apple has not removed. If you're following with x86_64 (e.g. with /bin/ls), the 0x100004040 from the lldb example is the trampoline to dyld_stub_binder. In other words, the code will look something like this when you break on 0x100004040:

Hopefully, this fills in the missing pieces, showing you not just what symbols are bound, but HOW they are bound. I hope to provide more information about LINKEDIT (specifically, the juicy parts of code signing). You are always welcome to go online at the Book Forum and comment, ask questions, etc.

Footnotes

* (something I heard several times already by now as a criticism is a "lack of detail" - considering that Wiley restricted the book originally to 500 pages, I'm very lucky to have been able to extend it to the 800 pages it is - but some things just had to be left out, folks.. which is why I'm providing lots of extra content on the website..)

** - While we're on the subject, there's a typo on page 116 (it should read "using Xcode's dyldinfo(1) or nm(1)"). One of the all too many omissions and editorial mistakes inserted, ironically, by the copy editor. Incidentally, nm(1) only shows the symbols, not where they are located. You might want to try jtool's -S feature (cloning nm(1)) with -v.

*** - This is a register which the ARM ABI allows for use in between functions/procedures.