In previous articles, I've discussed the layout of various runtime data structures in Swift and alluded to a memory dumper that I was using to extract that data. Today, I'm going to walk through the implementation of that dumper.

Code

As is traditional, the full code for the memory dumper can be found on GitHub:

You can take a look at it there to more easily follow along, run it yourself, or just ignore it.

Note that this code should not be considered a good example of style, implementation, or much of anything. Swift is still new to all of us and it certainly shows in my code. It is useful to see how certain things can be done, at least.

Pointers

We're going to be doing a lot of work with pointers. Swift supports raw pointers fairly well, but not quite to the extent that's needed here. This code really wants to treat pointers like plain integers that happen to represent an address. To make that easier, the Pointer struct contains an address as a UInt, and some utility methods for working with them:

struct Pointer: Hashable, Printable {

It's Hashable so it can be used in a Dictionary, and Printable is convenient for debugging. It contains one variable, the pointer address:

    let address: UInt

The implementation of Hashable just returns the address converted to an Int. It doesn't care about preserving values or detecting overflows, and just wants to sling the bits across. The builtin function unsafeBitCast does exactly that:

    var hashValue: Int {
        return unsafeBitCast(address, Int.self)
    }

For Printable, NSString's format: initializer makes it easy to create a human-readable representation of the address:
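As a rough sketch of how the struct looks with a description added, here is a version in current Swift syntax rather than the Swift 1.x syntax used in this article. It is an approximation under stated assumptions: CustomStringConvertible stands in for Printable, hash(into:) stands in for hashValue, and the hex formatting uses String(_:radix:) instead of the NSString format: initializer.

```swift
// Sketch of the Pointer wrapper in current Swift. CustomStringConvertible
// is the modern name for the Printable protocol.
struct Pointer: Hashable, CustomStringConvertible {
    let address: UInt

    // Reinterpret the bits rather than converting the value, so addresses
    // above Int.max cannot trap on overflow.
    func hash(into hasher: inout Hasher) {
        hasher.combine(Int(bitPattern: address))
    }

    // Zero-padded hexadecimal, two digits per byte of a pointer.
    var description: String {
        let digits = String(address, radix: 16)
        let width = MemoryLayout<UInt>.size * 2
        let padding = String(repeating: "0", count: max(0, width - digits.count))
        return "0x" + padding + digits
    }
}
```

Int(bitPattern:) does the same job as unsafeBitCast here: it moves the bits across without checking the value.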

The dladdr function takes a pointer and returns information about the corresponding symbol. Specifically, it returns the path name of the binary that contains it, the base address of that binary, the name of the symbol, and the starting address of that symbol. This information will come in handy for other functions, but dladdr is a bit of a pain to call, so a wrapper will prove handy. It returns an optional value since the call can fail:

    func symbolInfo() -> Dl_info? {

It starts by creating a Dl_info struct. In C, we'd just declare it and let it sit uninitialized, but Swift requires an initial value, so this code just creates an empty one:

dladdr returns zero for failure and any other value for success. This determines whether the Dl_info struct is returned, or nil:

        return (result == 0 ? nil : info)
    }
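For reference, the whole wrapper can be sketched in current Swift syntax roughly as follows. This is an illustration, not the article's exact code: it is written as a free function taking a raw address (the name symbolInfo(at:) is hypothetical) rather than as a method on Pointer, and it assumes a platform where Foundation exposes dladdr and Dl_info, as it does on macOS.

```swift
import Foundation

// Hypothetical free-function form of symbolInfo; the article's version is a
// method on its Pointer struct.
func symbolInfo(at address: UInt) -> Dl_info? {
    // Swift requires an initial value, so start from a zeroed struct.
    var info = Dl_info()
    // dladdr returns zero on failure and a nonzero value on success.
    let result = dladdr(UnsafeRawPointer(bitPattern: address), &info)
    return result == 0 ? nil : info
}
```

A caller can then look at fields such as dli_fname, dli_sname, and dli_saddr on the returned struct, each of which may itself be nil.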

The symbol name is really useful information to display as part of a memory dump. The Dl_info struct contains the symbol name, but there are two problems with using it directly. First, it's a C string, so it has to be converted to a nicer form in order to use it. Second, dladdr looks up the nearest symbol that comes before the specified address, while we only want the symbol name for an address if it exactly matches the symbol address, not if it's offset. This symbolName function takes care of these:

    func symbolName() -> String? {

It's possible for symbolInfo to fail, so the call needs to be checked:

        if let info = symbolInfo() {

The returned symbol address is an UnsafePointer, but we want to compare it with address and only return the symbol name if they're equal. Another unsafeBitCast call solves the problem:

            let symbolAddress: UInt = unsafeBitCast(info.dli_saddr, UInt.self)

If the symbol address equals the pointer address, return the symbol name:

            if symbolAddress == address {
                return String.fromCString(info.dli_sname)
            }

If they don't match, or if dladdr failed altogether, return nil:

        }
        return nil
    }
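The exact-match rule is easy to get subtly wrong, so here is a self-contained sketch of it in current Swift syntax. The free function symbolName(at:) is a hypothetical stand-in for the method above, and it assumes a platform where Foundation exposes dladdr.

```swift
import Foundation

// Hypothetical free-function form of symbolName.
func symbolName(at address: UInt) -> String? {
    var info = Dl_info()
    guard dladdr(UnsafeRawPointer(bitPattern: address), &info) != 0,
          let symbolAddress = info.dli_saddr,
          let cName = info.dli_sname else {
        return nil
    }
    // dladdr reports the nearest preceding symbol, so accept exact matches only.
    guard UInt(bitPattern: symbolAddress) == address else { return nil }
    return String(cString: cName)
}
```

Asking for the name one byte past the start of a function should give nil with this rule, even though dladdr itself still reports the enclosing symbol.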

Another useful function returns the pointer to the next symbol following the current pointer. Symbols don't encode lengths, just locations, but looking up the location of the following symbol gives a reasonable ending point for guessing a length. The memory dumping code will use this information to figure out how much memory to read. We don't want to run off into hyperspace if something goes wrong, so this function also takes a limit for how far to search for the next symbol. It returns an optional Pointer, with nil indicating that no following symbol was found:

    func nextSymbol(limit: Int) -> Pointer? {

As before, the call to symbolInfo can fail and must be checked:

        if let myInfo = symbolInfo() {

The search strategy is to iterate byte by byte, calling symbolInfo each time. If the returned symbol base address changes, it's a new symbol. If it hits the limit without finding a new symbol, return nil. To start, it loops from 1 to the limit:

            for i in 1..<limit {

Generate a candidate pointer by adding i to self and get its symbol info:

                let candidate = self + i
                let candidateInfo = candidate.symbolInfo()

If symbolInfo fails, the search has failed, so return nil:

                if candidateInfo == nil {
                    return nil
                }

If the returned address is different from the current symbol, the search has succeeded, so return the candidate:

                if myInfo.dli_saddr != candidateInfo!.dli_saddr {
                    return candidate
                }

If the loop terminates or the original symbolInfo call fails, return nil:

            }
        }
        return nil
    }
}
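Put together, the byte-by-byte search might look like the following sketch in current Swift syntax. The helper symbolStart(at:) and the free-function form of nextSymbol are hypothetical names of mine, and the code assumes a platform where Foundation exposes dladdr. One deliberate difference: it guards against a limit below 2, because a range like 1..<limit traps at runtime when limit is less than 1.

```swift
import Foundation

// Hypothetical helper: start address of the symbol that dladdr associates
// with `address`, or nil if dladdr fails.
func symbolStart(at address: UInt) -> UInt? {
    var info = Dl_info()
    guard dladdr(UnsafeRawPointer(bitPattern: address), &info) != 0 else {
        return nil
    }
    return UInt(bitPattern: info.dli_saddr)
}

// Walks forward one byte at a time until the reported symbol start changes.
func nextSymbol(after address: UInt, limit: Int) -> UInt? {
    guard limit > 1, let start = symbolStart(at: address) else { return nil }
    for offset in 1..<limit {
        let candidate = address &+ UInt(offset)
        guard let candidateStart = symbolStart(at: candidate) else { return nil }
        if candidateStart != start {
            return candidate
        }
    }
    return nil
}
```

Calling dladdr once per byte is slow, but the limit bounds the cost, and this only runs for pointers that land exactly on a symbol.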

Hashable includes Equatable which means that Pointer needs an implementation of the == operator:

func ==(a: Pointer, b: Pointer) -> Bool {
    return a.address == b.address
}

For convenience, Pointer also gets implementations of the + and - operators:
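A plausible shape for these operators, sketched in current Swift syntax, is shown here. The exact set of overloads is my assumption. I've used wrapping arithmetic (&+ and &-) so that a garbage address near the top of the address space wraps around instead of trapping.

```swift
// Minimal stand-in for the article's Pointer struct.
struct Pointer: Hashable {
    let address: UInt
}

// Offset a pointer by a signed byte count, wrapping rather than trapping.
func + (p: Pointer, offset: Int) -> Pointer {
    return Pointer(address: p.address &+ UInt(bitPattern: offset))
}

func - (p: Pointer, offset: Int) -> Pointer {
    return Pointer(address: p.address &- UInt(bitPattern: offset))
}

// Signed distance in bytes between two pointers.
func - (a: Pointer, b: Pointer) -> Int {
    return Int(bitPattern: a.address &- b.address)
}
```

With these in place, an expression like self + i yields another Pointer, and subtracting two pointers yields a byte distance.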

Memory

We're also going to be doing a lot of work with memory contents. Fundamentally, a chunk of memory is just an array of bytes, but we want to store a bit of info about what kind of memory it is, and we want some functions that help with reading and scanning memory. The Memory struct stores an array of bytes as well as two flags that specify whether the memory was allocated with malloc and whether it corresponds to a symbol:

struct Memory {
    let buffer: [UInt8]
    let isMalloc: Bool
    let isSymbol: Bool

These two flags can't really both be true simultaneously, so one could argue that this ought to be a three-case enum instead. I thought the two flags were a bit more natural to work with, though.

How do you get a chunk of memory? The fundamental operation is to take a Pointer and read the memory it points to into an array:

    static func readIntoArray(ptr: Pointer, var _ buffer: [UInt8]) -> Bool {

The natural way to implement this in Objective-C would be to cast the pointer to a void * and then call memcpy. In fact, you can do pretty much the same thing in Swift. The withUnsafeBufferPointer method on Array lets you get a pointer to the target buffer's storage, and memcpy is callable from Swift. The problem with this approach, in either language, is that it will crash if the pointer is bad or if the amount being read is too long.

The solution is to read the memory with the mach virtual memory calls. These calls ask the kernel to read the memory on your behalf, and it has all the information it needs to perform the read safely and fail gracefully. Specifically, the mach_vm_read_overwrite call will read memory from a pointer into a buffer, and return an error code if the memory isn't readable. This is the approach we use in PLCrashReporter to read data when walking data structures which may have been corrupted in a crash. It works great here.

In order to read into buffer, we need to get a pointer to its storage. The withUnsafeBufferPointer method takes care of that:

withUnsafeBufferPointer doesn't return the pointer. Instead, it calls a function and passes the pointer as a parameter. It returns whatever value the function returns. We'll return the result code from mach_vm_read_overwrite, thus the kern_return_t return type.

mach_vm_read_overwrite takes the pointer to read as a 64-bit unsigned integer, so we have to convert the address of ptr:

            let ptr64 = UInt64(ptr.address)

We also need the target pointer as a 64-bit unsigned integer. The unsafeBitCast function takes care of getting it into an integer, and then that can be converted to a UInt64:

The function also returns the amount of data read using an out parameter. This value isn't useful to us (as far as I can tell, it's always the amount requested if the call succeeds) but we still have to pass in a pointer for it to write to, so we need a local variable for it:

Outside of the closure, result now contains the result code returned by mach_vm_read_overwrite. If it returned KERN_SUCCESS, buffer is now filled with contents of the target memory. We'll boil down the result code to a simple true/false for the caller:

        return result == KERN_SUCCESS
    }
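To show the pieces in one place, here is a sketch of a safe read in current Swift syntax. It assumes macOS, where mach_vm_read_overwrite and the mach_task_self_ global are visible through the Darwin module, and it takes the buffer as an inout parameter with a raw address, which is my choice rather than the article's signature. The whole thing is wrapped in a canImport check because the Mach calls don't exist elsewhere.

```swift
#if canImport(Darwin)
import Darwin

// Fills `buffer` with buffer.count bytes read from `address`. Returns false
// instead of crashing when the source memory is not readable.
func readIntoArray(address: UInt, buffer: inout [UInt8]) -> Bool {
    let result = buffer.withUnsafeMutableBytes {
        (target: UnsafeMutableRawBufferPointer) -> kern_return_t in
        // The call reports how much it read through an out parameter.
        var bytesRead: mach_vm_size_t = 0
        return mach_vm_read_overwrite(
            mach_task_self_,
            mach_vm_address_t(address),
            mach_vm_size_t(target.count),
            mach_vm_address_t(UInt(bitPattern: target.baseAddress)),
            &bytesRead)
    }
    return result == KERN_SUCCESS
}
#endif
```

Reading from an address inside the unmapped zero page is a quick way to see the failure path: the function returns false and the process keeps running.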

Next up, we need a way to take a Pointer and turn it into a Memory instance by reading the contents of that pointer. readIntoArray forms the foundation of this process, but it requires a size, whereas we usually won't know the size of an arbitrary Pointer. The read function takes a Pointer and an optional known size and returns an optional Memory:

    static func read(ptr: Pointer, knownSize: Int? = nil) -> Memory? {

The first step is to try to guess the size of the pointed-to memory. Since we're chasing arbitrary pointers, we can't always figure this out reliably. We'll start by calling malloc_size. This requires converting the address of the Pointer to an UnsafePointer using our good friend unsafeBitCast:

malloc_size helpfully returns zero if the memory wasn't allocated with malloc. (This is not guaranteed in the documentation, so please don't write production code that relies on this fact.) Thus, we can populate isMalloc by checking length:

        let isMalloc = length > 0

We'll populate isSymbol by checking to see if there's a symbol name for the pointer:

        let isSymbol = ptr.symbolName() != nil

If it's a symbol, then we'll try to guess the length by looking at the distance from that symbol to the following symbol:

The memory dumper works recursively. It reads a pointer into a buffer, extracts pointers from that buffer, then reads those pointers into buffers and continues in that fashion. Extracting pointers from a buffer is a fundamental part of that process. It's difficult to know exactly what parts of a buffer are pointers. For the purposes of the dumper, it will assume that every naturally aligned pointer-sized quantity is a pointer. There's little harm in guessing wrong, since the memory reader tolerates bad pointers. The scanPointers function takes no parameters (since it operates on the internal buffer of a Memory instance) and returns an array of PointerAndOffset instances. This is a simple struct that contains one Pointer and one offset as an Int. The offset is useful elsewhere when printing the results, since it can show exactly where a pointer was found. Here's the function declaration:

    func scanPointers() -> [PointerAndOffset] {

Results are accumulated in an array:

        var pointers = [PointerAndOffset]()

The contents of the Memory instance are in buffer which is an array of UInt8. We need to read pointer-sized chunks of this. One way would be to read several elements at a time and do some bitshifting to construct a pointer. Or, since we're already slinging "unsafe" stuff around with reckless abandon, we could just convert it to a UInt pointer and read the data out directly:

With the array filled out, all that remains is to return it to the caller:

        return pointers
    }
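Here is one way the scan could be written in current Swift syntax, as a sketch rather than the article's code. It takes the buffer as a parameter, stores the pointer as a bare UInt instead of the Pointer struct, and copies each word's bytes into a UInt rather than casting the buffer to a UInt pointer. That sidesteps any alignment assumptions about the array's storage.

```swift
// Simplified stand-in: the article stores a Pointer here rather than a UInt.
struct PointerAndOffset: Equatable {
    let pointer: UInt
    let offset: Int
}

// Treats every pointer-sized, pointer-aligned chunk of `buffer` as a
// candidate pointer. Trailing bytes that don't fill a word are ignored.
func scanPointers(in buffer: [UInt8]) -> [PointerAndOffset] {
    let wordSize = MemoryLayout<UInt>.size
    var pointers = [PointerAndOffset]()
    var offset = 0
    while offset + wordSize <= buffer.count {
        var value: UInt = 0
        // Copy the bytes in native byte order into the integer's storage.
        withUnsafeMutableBytes(of: &value) { destination in
            destination.copyBytes(from: buffer[offset..<(offset + wordSize)])
        }
        pointers.append(PointerAndOffset(pointer: value, offset: offset))
        offset += wordSize
    }
    return pointers
}
```

Most of the values this produces won't be real pointers, which is fine as long as whatever reads them fails gracefully.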

A lot of memory chunks contain strings, and it's useful to scan for strings and print them out in a human-readable fashion. It's impossible to know for sure whether a chunk of memory actually contains a string or just contains binary data that happens to look like a string, but with some heuristics it's possible to do a decent job. I chose to treat any sequence of at least four consecutive bytes in the range of 32-126 inclusive as a string. This range is the range of ASCII characters excluding unprintable control characters. Similar to scanPointers, the scanStrings function takes no parameters and returns an array of String:

    func scanStrings() -> [String] {

First, make constants for the upper and lower bound:

        let lowerBound: UInt8 = 32
        let upperBound: UInt8 = 126

The current candidate sequence is stored in a local array, as are the strings accumulated so far:

        var current = [UInt8]()
        var strings = [String]()

Now, loop through the buffer. The program tacks a zero byte on the end of the buffer when looping through it to ensure that every candidate sequence ends with a byte that's outside the bounds. This avoids the need for a final check of current after the loop ends:

        for byte in buffer + [0] {

If the byte is within the bounds, tack it on to current:

            if byte >= lowerBound && byte <= upperBound {
                current.append(byte)

Otherwise, if current contains at least four bytes, turn it into a String and add it to strings:

There's probably a better way to create a String from an array of UInt8, but this works well enough. Finally, clear current for the next round:

                current.removeAll()
            }
        }

Once all is done, return the strings:

        return strings
    }
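The whole heuristic fits in a few lines, so here is a self-contained sketch in current Swift syntax. It takes the buffer as a parameter instead of reading it from a Memory instance, and it builds each String with String(decoding:as:), which is my substitution for whatever conversion the original code used.

```swift
// Collects every run of at least four printable ASCII bytes in `buffer`.
func scanStrings(in buffer: [UInt8]) -> [String] {
    let lowerBound: UInt8 = 32
    let upperBound: UInt8 = 126
    var current = [UInt8]()
    var strings = [String]()
    // The trailing zero guarantees that the final run gets flushed.
    for byte in buffer + [0] {
        if byte >= lowerBound && byte <= upperBound {
            current.append(byte)
        } else {
            if current.count >= 4 {
                // Every byte is printable ASCII, so UTF-8 decoding is lossless.
                strings.append(String(decoding: current, as: UTF8.self))
            }
            current.removeAll()
        }
    }
    return strings
}
```

For example, a buffer containing "hello", some binary bytes, "abc", and "swift" yields "hello" and "swift"; "abc" is dropped because it is shorter than four bytes.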

It's also nice to show a raw hexadecimal representation of the memory contents. The hex function handles this:

    func hex() -> String {

We want spaces every eight bytes:

        let spacesInterval = 8

The output is accumulated in an NSMutableString. The ability to use format strings when appending makes it easier to deal with hexadecimal:

        let str = NSMutableString(capacity: buffer.count * 2)

Iterate over the buffer. Use enumerate to get both the index and the byte value:

        for (index, byte) in enumerate(buffer) {

Every spacesInterval bytes, add a space:

            if index > 0 && (index % spacesInterval) == 0 {
                str.appendString(" ")
            }

Add the current byte as hexadecimal:

            str.appendFormat("%02x", byte)
        }

When it's all done, return the accumulated string:

        return str
    }
}
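The same formatting can be sketched in current Swift syntax without NSMutableString. This version is a free function with the buffer and interval as parameters, and it pads each byte by hand using String(_:radix:) instead of a format string; both are choices of mine for illustration.

```swift
// Renders `buffer` as lowercase hex with a space every `spacesInterval` bytes.
func hex(_ buffer: [UInt8], spacesInterval: Int = 8) -> String {
    var result = ""
    result.reserveCapacity(buffer.count * 2)
    for (index, byte) in buffer.enumerated() {
        if index > 0 && index % spacesInterval == 0 {
            result += " "
        }
        // Pad to two digits so that every byte takes the same width.
        let digits = String(byte, radix: 16)
        result += (digits.count < 2 ? "0" : "") + digits
    }
    return result
}
```

Nine bytes counting up from zero come out as "0001020304050607 08".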

For completeness, here's the PointerAndOffset struct used above:

struct PointerAndOffset {
    let pointer: Pointer
    let offset: Int
}

Printing

A lot of the rest of the code involves printing results. A memory dumper isn't very useful unless it shows you what it finds. To make it easier to print results in a useful way, I built a Printer protocol that the other code uses, along with a set of utility functions. The Printer protocol can be implemented to dump output in different forms. Here, I'll show the terminal printer implementation. I also created an implementation that outputs HTML, which you can see on GitHub.

Color is a useful way to show relationships between different printed items. An enum defines available colors for printing:

The Printer protocol defines the capabilities needed for a printer object. It's not extensive: it allows for printing a string with a color, printing a string with the default color, printing a newline, and terminating output (necessary for closing tags when writing HTML):

The full escape sequence for a color consists of the escape character (ASCII code 27), a [ character, the numeric color code, and then a m character. A printEscape utility function captures the process of outputting a PrintColor to the terminal as the appropriate escape sequence:

The single-argument version of the method just calls the two-argument version with .Default:

    func print(str: String) {
        print(.Default, str)
    }

println just calls through to the built-in function:

    func println() {
        Swift.println()
    }

Finally, the end() method is empty, since there's nothing that needs to be done to wrap up printing to the terminal:

    func end() {}
}
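To tie the printing pieces together, here is a compact sketch of a color enum, the protocol, and a terminal printer in current Swift syntax. The specific color cases and the names escapeSequence(for:) and TermPrinter are assumptions of mine for illustration; the raw values are the standard ANSI foreground color codes.

```swift
// Hypothetical color set; the raw values are ANSI foreground color codes.
enum PrintColor: Int {
    case `default` = 39
    case red = 31
    case green = 32
    case yellow = 33
    case blue = 34
}

protocol Printer {
    func print(_ color: PrintColor, _ str: String)
    func print(_ str: String)
    func println()
    func end()
}

// The escape character (ASCII 27), "[", the numeric code, then "m".
func escapeSequence(for color: PrintColor) -> String {
    return "\u{1B}[\(color.rawValue)m"
}

struct TermPrinter: Printer {
    func print(_ color: PrintColor, _ str: String) {
        // Switch to the color, write the text, then switch back.
        let text = escapeSequence(for: color) + str + escapeSequence(for: .default)
        Swift.print(text, terminator: "")
    }

    func print(_ str: String) {
        print(.default, str)
    }

    func println() {
        Swift.print()
    }

    // Nothing to close when writing to a terminal.
    func end() {}
}
```

Keeping the escape sequence in a function that returns a String makes it easy to check without a terminal attached.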

A couple of convenience functions help with making nicely-formatted output. This pad function pads a string to align it to the left or right if it's shorter than requested. It's not all that interesting, so I won't go into details:
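For the curious, a padding helper along these lines might look like the following sketch in current Swift syntax. The Alignment enum and the parameter names are hypothetical; only the behavior (pad with spaces on one side when the string is too short) comes from the description above.

```swift
// Hypothetical alignment options for the sketch.
enum Alignment {
    case left
    case right
}

// Pads `value` with spaces up to `width` characters. Strings that are
// already long enough are returned unchanged.
func pad(_ value: String, to width: Int, align: Alignment = .left) -> String {
    let missing = width - value.count
    guard missing > 0 else { return value }
    let spaces = String(repeating: " ", count: missing)
    return align == .left ? value + spaces : spaces + value
}
```

Left alignment puts the padding after the text, and right alignment puts it before.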

The struct just wraps a Pointer since all other data can be retrieved from the Objective-C runtime using that pointer:

    let address: Pointer

A computed property makes it convenient to retrieve address as an AnyClass, which is the type that the Objective-C runtime functions want to see. Our good friend unsafeBitCast makes yet another appearance:

There are a few bits of code that want to retrieve a class's name, and a computed property makes that easy:

    var name: String {
        return String.fromCString(class_getName(classPtr))!
    }

Finally, we want classes to be able to dump themselves to a Printer:

    func dump(p: Printer) {

When working with Objective-C runtime functions, there's a really common pattern where the function returns a pointer to an array that's terminated by NULL, and you're required to free the array when you're done using it. In Swift, the pointers are represented as UnsafeMutablePointer<COpaquePointer>, so one convenient function can wrap up the annoying work:

An ObjCClass is created from the Pointer and added to the result array:

        result.append(ObjCClass(address: address))
    }

The result array is then returned to the caller:

    return result
}

Scanning Data Structures

We're ready to start looking at the actual scanning machinery now. Each memory address to be scanned is wrapped up in a ScanEntry instance. This holds a parent entry that indicates where the pointer was found, an offset within the parent, the scanned address, and an index. The index is used to assign each entry a number to make it easier to cross-reference them in the output. This is a class rather than a struct because multiple data structures need to refer to the same instance, and potentially mutate it or see mutations. Here's the definition:

Actually performing a scan on a ScanEntry produces a ScanResult. A ScanResult points to an entry and a parent. It also contains a Memory that represents its contents, an array of child results, an indentation level, and a print color:

It's handy to get a name for a ScanResult, but it's not quite as easy as just looking it up:

    var name: String {

If this entry happens to refer to an Objective-C class, then we can ask that class for its name:

        if let c = ObjCClass.atAddress(entry.address) {
            return c.name
        }

If the entry refers to an Objective-C object, then the first pointer-sized chunk of the memory will be an isa that refers to the object's class, at least on architectures and OSes that don't use a non-pointer isa. Memory's scanPointers method makes it easy, albeit inefficient, to grab the first pointer. If the first pointer exists (i.e. the memory is at least long enough to contain one) and it points to an Objective-C class, we fake up a -description style name and return that:

If the entry has a parent, it prints the parent's address and this entry's offset within it, all in the parent's color to make it easier to visually cross-reference. Otherwise, it prints the fact that this is the root pointer:

To dump the entire tree, we track an array of pending entries. We remove an entry from the array and examine it. If it has children, we add those children to the array. We keep doing this until we run out of array:

        var chain = [self]
        while chain.count > 0 {

The result to scan is popped off the end of the array:

            let result = chain.removeLast()

Results with children get assigned a color:

            if result.children.count > 0 {
                result.color = nextColor()
            }

The result is indented and then dumped:

            for i in 0..<result.indent {
                p.print(" ")
            }
            result.dump(p)

Children are then added to the array. Their indentation is also set at this time:

The reverse() swaps the order in which children are printed, causing the first child to be printed first. The fact that entries are added to and removed from the end also changes how things are printed, making it a depth-first print rather than a breadth-first print. These can be changed around to change how the dump output is organized.
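The effect of the stack and the reversal is easier to see on a toy tree. This sketch, in current Swift syntax, uses a hypothetical Node class in place of ScanResult and collects the indented lines into an array instead of printing them.

```swift
// Hypothetical stand-in for ScanResult: a name, children, and an indent.
final class Node {
    let name: String
    var children: [Node]
    var indent = 0

    init(_ name: String, _ children: [Node] = []) {
        self.name = name
        self.children = children
    }
}

// Depth-first dump using an explicit stack of pending nodes.
func dumpOrder(from root: Node) -> [String] {
    var lines = [String]()
    var chain = [root]
    while let node = chain.popLast() {
        lines.append(String(repeating: " ", count: node.indent) + node.name)
        // Pushed in reverse, so the first child ends up on top of the stack.
        for child in node.children.reversed() {
            child.indent = node.indent + 1
            chain.append(child)
        }
    }
    return lines
}
```

For a root with children a and b, where a has a child a1, the lines come out as root, a, a1, b, each indented by its depth. Dropping the reversed() call would visit b before a.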

Scanning

We've finally reached the last piece of the puzzle. The scanmem function takes an arbitrary value and returns a ScanResult representing that value. It also takes a limit of how many entries to scan before returning. It can produce a lot of output otherwise as it ends up scanning the whole Objective-C class tree and everything it points to. Limiting it keeps it from jumping off into the weeds and helps to ensure that the output is relevant to what we want to view.

The function is written using generics to ensure it works on the exact type of value that's passed in by the caller and to avoid any boxing or wrapping as might happen with Any:

func scanmem<T>(var x: T, limit: Int) -> ScanResult {

The number of entries seen so far is kept in count:

    var count = 0

To avoid infinite loops, entries that have already been seen are tracked. A Dictionary mapping to Void makes for a handy set type:

    var seen = Dictionary<Pointer, Void>()

Entries pending to be scanned are held in an array:

    var toScan = Array<ScanEntry>()

Results are held in a Dictionary keyed on their Pointer so that children can be easily matched with their parents:

    var results = Dictionary<Pointer, ScanResult>()

In order to dump x, we need a pointer to it. The withUnsafePointer function takes a value and provides a pointer to it. We'll take that pointer and then do all the dirty work inside, finally returning the root ScanResult:

    return withUnsafePointer(&x) { (ptr: UnsafePointer<T>) -> ScanResult in

Our friend unsafeBitCast handles the conversion of ptr to a UInt that can be used to create a Pointer:

        let firstAddr: Pointer = Pointer(address: unsafeBitCast(ptr, UInt.self))
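The reason all the work happens inside the closure is that the pointer is only valid for the duration of the call. As a small illustration in current Swift syntax, this hypothetical rawBytes(of:) helper takes the address of a value as an integer and reads the value's bytes back through that address, all before the closure returns. Note that the modern spelling is withUnsafePointer(to:_:).

```swift
// Returns the in-memory bytes of `value`. The pointer, and the integer
// address derived from it, are only used inside the closure.
func rawBytes<T>(of value: T) -> [UInt8] {
    var copy = value
    return withUnsafePointer(to: &copy) { ptr -> [UInt8] in
        // Treat the pointer as a plain integer address.
        let address = UInt(bitPattern: ptr)
        let raw = UnsafeRawBufferPointer(
            start: UnsafeRawPointer(bitPattern: address),
            count: MemoryLayout<T>.size)
        return Array(raw)
    }
}
```

UInt(bitPattern:) plays the role that unsafeBitCast plays in the line above.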

The ScanEntry for this first address has no parent, no offset, and an index of zero:

The scan loop consists of repeatedly pulling an entry, scanning it, and adding child entries to the toScan array until either the scan limit is reached or it runs out of stuff to scan:

        while toScan.count > 0 && count < limit {

Pull the entry to scan off the end of the array:

            let entry = toScan.removeLast()

Set the index of the entry from count:

            entry.index = count

Read the underlying memory at the ScanEntry's address. In the special case where count is zero and we know that we're reading x, we can pass a known size in to the function by using sizeof to get the size of T. Otherwise, we'll pass nil and let Memory.read try to figure out the size on its own:

Insert the new entry at the beginning of toScan. This could also be added at the end, which would make this a depth-first scan rather than a breadth-first scan. I found breadth-first to be more useful for exploration:

                        toScan.insert(newEntry, atIndex: 0)
                    }
                }
            }
        }

And that's about it! All that remains is to return the root ScanResult. We grab that by looking it up in results:

        return results[firstAddr]!
    }
}
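The shape of the scan loop (a worklist, a seen set, and a limit) can be shown on a toy graph without any real memory involved. In this sketch, written in current Swift syntax, integers stand in for addresses and a dictionary stands in for "the pointers found at this address"; the function name and parameters are hypothetical.

```swift
// Visits nodes reachable from `root`, at most `limit` of them, and returns
// them in the order they were scanned.
func scanOrder(from root: Int, graph: [Int: [Int]], limit: Int) -> [Int] {
    var seen: Set<Int> = [root]
    var toScan = [root]
    var order = [Int]()
    while order.count < limit, let entry = toScan.popLast() {
        order.append(entry)
        for child in graph[entry] ?? [] where !seen.contains(child) {
            seen.insert(child)
            // Inserting at the front while popping from the back gives
            // breadth-first order; appending instead would be depth-first.
            toScan.insert(child, at: 0)
        }
    }
    return order
}
```

With the graph 1 → [2, 3], 2 → [4], 3 → [4, 1], the scan visits 1, 2, 3, 4. The cycle back to 1 and the second path to 4 are both skipped thanks to the seen set, and a limit of 2 stops after 1 and 2.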

Usage

To use this function, create a Printer, call scanmem with a value and a limit, then call recursiveDump on the result and end the Printer:

Conclusion

This is far from normal or sane Swift code, but it works and the results are really useful. It's also a great example of how Swift lets you interact with all sorts of low-level C calls without much more of a fuss than it takes to call them from C. Although you should probably avoid these shenanigans when you can, the fact that you can do stuff like unsafeBitCast and get pointers to the internal storage of arrays is really handy when you need it.
