Plists

We’re going to kick this off with a bit about Plist files. I was recently asked about the internal structure of Plist files and wasn’t happy with my answer, so I needed to know more. Below is what I found out. Ironically, the same person that asked me about Plist files is the one that told me to stop blogging, and in doing so created this blog-a-day challenge.

Plist files are found sprinkled throughout OS X and iOS and contain configuration settings and other information used by the OS and applications. They are one of the features inherited from NeXTSTEP when it became the new core of Mac OS (along with application bundles and the Mail app). Plists are key/value pairs stored either as text (XML) or in binary. The values can be one of the following data types:

Type       Used for
string     ASCII or Unicode strings
data       Binary data
date       A date, seconds since 2001-01-01T00:00:00Z
integer    A whole number
real       A number with a decimal point
boolean    True or False
array      An ordered list of values of any of these types
dict       A dictionary of key/value pairs; keys are strings, values can be any of these types

XML, however, is not a very efficient way to save data on disk (all those < and > characters add a lot of overhead). So, Apple introduced a binary format. When opened in something that doesn't understand it, it looks like this:

Apple includes a utility, plutil, that converts binary plists to XML. It is native to OS X 10.2 and later, and on Windows it is installed with iTunes (\Program Files (x86)\Common Files\Apple\Apple Application Support\plutil.exe). Keep in mind that "plutil -convert xml1 file" converts the file in place, meaning the original binary is overwritten by the new XML version. If you don't want this to happen (cause, eh, forensics), use the "-o path" option to specify a different output file name/location. The "-p" switch prints output that is "easy" for humans to read, and looks like this:
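If you would rather script the conversion, the sketch below does the same job with Python 3's standard-library plistlib module, which is not an Apple tool; binary_plist_to_xml is a hypothetical name. It only ever opens the source for reading and writes the XML to a separate path, much like plutil's "-o" option.

```python
import plistlib


def binary_plist_to_xml(src_path: str, dst_path: str) -> None:
    """Write an XML copy of the plist at src_path to dst_path.

    The source file is only opened for reading, so it is never modified.
    """
    with open(src_path, "rb") as src:
        value = plistlib.load(src)  # plistlib detects binary vs. XML on its own
    with open(dst_path, "wb") as dst:
        plistlib.dump(value, dst, fmt=plistlib.FMT_XML)


# Example (hypothetical paths):
# binary_plist_to_xml("evidence/com.example.app.plist", "work/com.example.app.xml")
```

plistlib may refuse some values when writing XML (for example the UID objects used by NSKeyedArchiver files), so treat this as a convenience rather than a replacement for plutil.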

In forensics, we like to know how our tools are getting the data they present us. So, how do WE read that binary file? Unfortunately, I had to go to the source code to find the answer. There is a comment in this source code file that explains the structure of these files.

First, there is an 8-byte header that provides a signature and version number. The signature is always "bplist". So far, the version is "00", but this could change in the future.

Next is a series of variable-sized objects. Each object starts with a 1-byte marker: the high nibble gives the object type, and the low nibble carries size information (usually a count of bytes, characters, or elements, depending on the type).

Last is an offset table, followed by a 32-byte trailer made up of four 8-byte groups of numbers that give us the tips we need for reading the plist.
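To make the trailer concrete, here is a Python sketch (Trailer and read_trailer are hypothetical names) that unpacks the last 32 bytes of a file using the field order walked through at the end of this post: six padding bytes, two 1-byte sizes, then three 8-byte big-endian numbers. The sample holds only a header and the example file's trailer, not a complete plist.

```python
import struct
from typing import NamedTuple


class Trailer(NamedTuple):
    offset_int_size: int      # bytes per entry in the offset table
    object_ref_size: int      # bytes per object reference in dicts/arrays
    num_objects: int          # number of offset-table entries (= objects)
    top_object: int           # offset-table entry of the top-level object
    offset_table_offset: int  # file offset where the offset table begins


def read_trailer(data: bytes) -> Trailer:
    """Decode the 32-byte trailer at the end of a binary plist."""
    if len(data) < 8 + 32:  # header plus trailer, at minimum
        raise ValueError("too short to be a binary plist")
    # '>' = big-endian, no alignment; 6x skips six padding bytes,
    # B = 1-byte unsigned, Q = 8-byte unsigned.
    return Trailer(*struct.unpack(">6xBBQQQ", data[-32:]))


sample = bytes.fromhex(
    "62706C6973743030"  # "bplist00"
    "0000000000000101"  # padding, offset int size, object ref size
    "000000000000000D"  # number of objects
    "0000000000000000"  # top object
    "0000000000000090"  # offset table offset
)
print(read_trailer(sample))
# Trailer(offset_int_size=1, object_ref_size=1, num_objects=13, top_object=0, offset_table_offset=144)
```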

To help us fully understand how binary plists work, let’s take the binary above and carve it into its elements.

00000000: 62 70 6C 69 73 74 |bplist |

The first 6 bytes are "bplist" and provide the magic signature that identifies this as a binary plist file.

00000006: 30 30 |00 |

The next two bytes provide us a version number so we know which format this plist will follow. So far, this is always "00" but could change in the future, someday, maybe.
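Checking those first eight bytes is easy to script. This Python sketch (read_header is a hypothetical name) rejects anything without the "bplist" signature and returns the two version characters, so a reader can decide whether it knows that format.

```python
def read_header(data: bytes) -> str:
    """Validate the 8-byte binary plist header and return its version string."""
    if len(data) < 8 or data[:6] != b"bplist":
        raise ValueError("missing 'bplist' signature; not a binary plist")
    return data[6:8].decode("ascii")  # e.g. "00"


print(read_header(bytes.fromhex("62706C6973743030")))  # 00
```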

Now we start reading the objects.

00000008: D6 01 02 03 04 05 06 07 – 08 09 0A 0B 0C | |

The first object's first byte is xD6. The high nibble, D, tells us this is a dict, and the low nibble tells us it has 6 entries. If we consider a dict a special type of array that consists of key/value pairs, this means it will refer to 12 objects: 6 keys and 6 values. The next 12 bytes are the object reference numbers of those 12 objects, so programs reading the file can refer to the objects by number (each reference is one byte wide here; the trailer tells us that size). In this file, as in most plists, the top-level object is a dict, although the format also allows other types, such as an array, at the top.

The references in a dict list all the keys first and then all the values, in the same order. So, objects x01 and x07 go together, as do x02/x08, x03/x09, and so on.

Arrays are structured similarly, using an xAn marker where n is the number of elements. Arrays contain only values, though, so an xAn marker is followed by n object references rather than the 2n a dict needs. The code also references a set type using an xCn marker that is structured exactly like an array, but no other documentation or plist editor I've looked at includes a set as a data type.

Object references are global throughout the file, so an array or child dict inside the top-level dict does not restart its numbering at x01. If the array/dict sits in the middle of the top-level dict, the top-level dict's numbering skips ahead to account for it. To illustrate, let's sidetrack to this example:

Part of the reason for global object references is to avoid writing the same object more than once. Any time multiple objects are identical, the object itself is written only once, and every object reference list that needs it refers to that single copy.
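Python's standard plistlib binary writer also stores identical values once, which makes for an easy experiment. In this sketch (object_count is a hypothetical helper; plistlib's writer is not Apple's, though it behaves the same way here), two small plists are written and the object counts in their trailers compared.

```python
import plistlib
import struct


def object_count(plist_bytes: bytes) -> int:
    """Read the number-of-objects field from a binary plist's trailer."""
    # Trailer: 6 padding bytes, two 1-byte sizes, then three big-endian uint64s.
    _, _, num_objects, _, _ = struct.unpack(">6xBBQQQ", plist_bytes[-32:])
    return num_objects


shared = plistlib.dumps({"a": "same", "b": "same"}, fmt=plistlib.FMT_BINARY)
distinct = plistlib.dumps({"a": "one", "b": "two"}, fmt=plistlib.FMT_BINARY)
print(object_count(shared))    # 4: dict, "a", "b", "same"
print(object_count(distinct))  # 5: dict, "a", "b", "one", "two"
```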

These six are all the same type (string), so I'll only describe them once.

A string object's first byte is x5n, where 5 means ASCII string and n tells us how many characters to read. With ASCII, each character is one byte, so this is fairly straightforward. There is also a Unicode string type that uses an x6n marker; remember to read 2 × n bytes, since each character is a big-endian uint16_t (UTF-16).

Incidentally, that was six objects, so now I expect the next six objects to be the values that go with the keys above, in the order listed.
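Both string markers can be decoded with a few lines of Python. In this sketch (read_string is a hypothetical name), n counts UTF-16 code units for x6n strings, so characters outside the Basic Multilingual Plane take two; lengths of 15 or more (x5F/x6F) are not handled. The ASCII test bytes are the "finished" string from the example file.

```python
def read_string(data: bytes, offset: int) -> str:
    """Decode a short ASCII (x5n) or Unicode (x6n) string object at `offset`."""
    marker = data[offset]
    kind, n = marker >> 4, marker & 0x0F
    if n == 0x0F:
        raise NotImplementedError("extended-length strings are not handled here")
    start = offset + 1
    if kind == 0x5:  # ASCII: one byte per character
        return data[start : start + n].decode("ascii")
    if kind == 0x6:  # Unicode: n big-endian UTF-16 code units, 2 bytes each
        return data[start : start + 2 * n].decode("utf-16-be")
    raise ValueError(f"not a string marker: {marker:#04x}")


print(read_string(bytes.fromhex("5866696E6973686564"), 0))  # finished
```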

This one still has an x5n marker telling us it is also an ASCII string, but it is a little different. The F in x5F tells us the string is 15 or more characters long, so one nibble can't hold the length. Instead, the next bytes hold the length as an integer object, and the string itself starts after it. That integer's marker byte has a high nibble of 1 (meaning int) and a low nibble that tells us how many bytes the number takes: 2 to the power of the nibble, so 0 means 1 byte, 1 means 2 bytes, 2 means 4 bytes, and 3 means 8 bytes. So, x10 24 means a 1-byte integer, x24, which is 36 characters.

Several data types (string, data, array, set, and dict, namely) use this scheme when they reach 15 or more somethings (bytes, characters, or elements).
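Here is that length rule as a Python 3.9+ sketch (read_count is a hypothetical name). It returns the count and the offset where the object's payload begins, handling both the short form and the xnF-plus-integer form.

```python
def read_count(data: bytes, offset: int) -> tuple[int, int]:
    """Return (count, payload_start) for the object whose marker is at `offset`."""
    n = data[offset] & 0x0F
    if n != 0x0F:  # short form: the count is in the marker itself
        return n, offset + 1
    int_marker = data[offset + 1]  # long form: an int object follows
    if int_marker >> 4 != 0x1:
        raise ValueError(f"expected an int marker (x1n), got {int_marker:#04x}")
    width = 1 << (int_marker & 0x0F)  # 2**n bytes: x10=1, x11=2, x12=4, x13=8
    start = offset + 2
    count = int.from_bytes(data[start : start + width], "big")
    return count, start + width


print(read_count(bytes.fromhex("5F1024"), 0))  # (36, 3)
```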

Dates are identified by the marker x33, and the next 8 bytes are a big-endian 8-byte (double-precision) float giving the number of seconds since 2001-01-01T00:00:00Z (Jan 1, 2001).
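Converting that float to a calendar date only needs the 2001 reference point. A Python sketch (read_date and PLIST_EPOCH are hypothetical names):

```python
import struct
from datetime import datetime, timedelta, timezone

# Reference point for plist dates: 2001-01-01T00:00:00Z
PLIST_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def read_date(data: bytes, offset: int) -> datetime:
    """Decode a date object: marker x33, then a big-endian 8-byte float."""
    if data[offset] != 0x33:
        raise ValueError(f"not a date marker: {data[offset]:#04x}")
    (seconds,) = struct.unpack(">d", data[offset + 1 : offset + 9])
    return PLIST_EPOCH + timedelta(seconds=seconds)


print(read_date(b"\x33" + struct.pack(">d", 86400.0), 0))  # 2001-01-02 00:00:00+00:00
```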

00000087: 58 66 69 6E 69 73 68 65 – 64 |Xfinished |

One last string brings us to the sixth value and the end of the dict.

00000090: 08 15 1A 27 2F 3B 40 4E – 75 76 7A 7E 87 | '/;@Nuvz~ |

This is the offset table, which tells us at what offset into the file we will find each object. Entry n holds the offset of object n, so object x0C (the "finished" string) is at x87. Like the object references in the dict and array types, it is global, so it lists the location of every object regardless of where it sits in the tree.

0000009D: 00 00 00 00 00 00 01 01 | |

Six bytes of x00 padding, followed by two bytes that give the size in bytes of each entry in the offset table (immediately above) and of each object reference (the index numbers at the start of the dict or array objects). Here both are 1.

000000A5: 00 00 00 00 00 00 00 0D | |

This tells us the number of entries in the offset table, and thus the number of objects in the file.

000000AD: 00 00 00 00 00 00 00 00 | |

This number tells us which entry in the offset table points to the top-level object; here it is entry 0, our dict.

000000B5: 00 00 00 00 00 00 00 90 | |

This number tells us where the offset table starts: x90, which is exactly where we found it.
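Putting the trailer fields together, a reader can jump straight to the top-level object. This Python sketch (top_object_marker is a hypothetical name) reads the trailer, looks up the top object's entry in the offset table, and returns that object's marker byte; the test file is generated with Python's plistlib rather than taken from a real system.

```python
import plistlib
import struct


def top_object_marker(data: bytes) -> int:
    """Follow the trailer to the top-level object and return its marker byte."""
    if data[:6] != b"bplist":
        raise ValueError("not a binary plist")
    offset_size, _ref_size, num_objects, top_object, table_offset = struct.unpack(
        ">6xBBQQQ", data[-32:]
    )
    if top_object >= num_objects:
        raise ValueError("top object index is outside the offset table")
    # Entry `top_object` in the offset table holds the object's file offset.
    entry = table_offset + top_object * offset_size
    obj_offset = int.from_bytes(data[entry : entry + offset_size], "big")
    return data[obj_offset]


example = plistlib.dumps({"status": "finished"}, fmt=plistlib.FMT_BINARY)
print(hex(top_object_marker(example)))  # 0xd1: a dict with one key/value pair
```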

For slightly faster reference, here is a table of the various markers for each of the data types.