NAME

outlook.pst - format of MS Outlook .pst file

SYNOPSIS

outlook.pst

OVERVIEW

Low level or primitive items in a .pst file are identified by an I_ID
value. Higher level or composite items in a .pst file are identified by
a D_ID value. There are two separate b-trees indexed by these I_ID and
D_ID values. Starting with Outlook 2003, the file format changed from
one with 32 bit pointers to one with 64 bit pointers. We describe both
formats here.

32 BIT ASSOCIATED TREE ITEM 0x0002

A D_ID value may point to an entry in the index2 tree with a non-zero
TREE-I_ID, which points to this descriptor block via the index1 tree.
The block maps local ID2 values (referenced in the main data for the
original D_ID item) to I_ID values. It contains triples of (ID2, I_ID,
CHILD-I_ID), where the data for the local ID2 can be found via I_ID, and
CHILD-I_ID is either zero or points to another Associated Tree Item via
the index1 tree.
In the above 32 bit leaf node, we have a tuple of (0x61, 0x02a82c,
0x02a836, 0). 0x02a836 is the I_ID of the associated tree, and we can
look up that I_ID value in the index1 b-tree to find the (offset, size)
of the data in the .pst file.
0000 02 00 01 00 9f 81 00 00 30 a8 02 00 00 00 00 00
0000 signature [2 bytes] 0x0002 constant
0002 count [2 bytes] 0x0001 in this case
repeating
0004 id2 [4 bytes] 0x00819f in this case
0008 i_id [4 bytes] 0x02a830 in this case
000c child-i_id [4 bytes] 0 in this case
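The 32 bit layout above can be decoded with a short Python sketch. It assumes the raw bytes of the associated tree block are already in memory and that all fields are little-endian (as the sample dump suggests); the function name is hypothetical.

```python
import struct
from typing import List, Tuple


def parse_assoc_tree_32(block: bytes) -> List[Tuple[int, int, int]]:
    """Decode a 32 bit associated tree item (signature 0x0002) into a list
    of (id2, i_id, child_i_id) triples."""
    signature, count = struct.unpack_from("<HH", block, 0)
    if signature != 0x0002:
        raise ValueError(f"not an associated tree item: 0x{signature:04x}")
    triples = []
    for n in range(count):
        # Each repeating entry is three 4 byte integers, starting at offset 4.
        triples.append(struct.unpack_from("<III", block, 4 + 12 * n))
    return triples
```

For the sample dump this yields a single triple, (0x819f, 0x2a830, 0). A non-zero child_i_id would be looked up in the index1 tree and decoded the same way.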

ASSOCIATED DESCRIPTOR ITEM 0x7CEC

This style of descriptor block is similar to the 0xbcec format. This
descriptor is also eventually decoded to a list of MAPI elements.
0000 7a 01 ec 7c 40 00 00 00 00 00 00 00 b5 04 02 00
0010 60 00 00 00 7c 18 60 00 60 00 62 00 65 00 20 00
0020 00 00 80 00 00 00 00 00 00 00 03 00 20 0e 0c 00
0030 04 03 1e 00 01 30 2c 00 04 0b 1e 00 03 37 28 00
0040 04 0a 1e 00 04 37 14 00 04 05 03 00 05 37 10 00
0050 04 04 1e 00 07 37 24 00 04 09 1e 00 08 37 20 00
0060 04 08 02 01 0a 37 18 00 04 06 03 00 0b 37 08 00
0070 04 02 1e 00 0d 37 1c 00 04 07 1e 00 0e 37 40 00
0080 04 10 02 01 0f 37 30 00 04 0c 1e 00 11 37 34 00
0090 04 0d 1e 00 12 37 3c 00 04 0f 1e 00 13 37 38 00
00A0 04 0e 03 00 f2 67 00 00 04 00 03 00 f3 67 04 00
00B0 04 01 03 00 09 69 44 00 04 11 03 00 fa 7f 5c 00
00C0 04 15 40 00 fb 7f 4c 00 08 13 40 00 fc 7f 54 00
00D0 08 14 03 00 fd 7f 48 00 04 12 0b 00 fe 7f 60 00
00E0 01 16 0b 00 ff 7f 61 00 01 17 45 82 00 00 00 00
00F0 45 82 00 00 78 3c 00 00 ff ff ff ff 49 1e 00 00
0100 06 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 00
0110 00 00 00 00 00 00 00 00 00 00 00 00 c0 00 00 00
0120 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0130 00 00 00 00 00 00 00 00 00 00 00 00 00 40 dd a3
0140 57 45 b3 0c 00 40 dd a3 57 45 b3 0c 02 00 00 00
0150 00 00 fa 10 3e 2a 86 48 86 f7 14 03 0a 03 02 01
0160 4a 2e 20 44 61 76 69 64 20 4b 61 72 61 6d 27 73
0170 20 42 69 72 74 68 64 61 79 00 06 00 00 00 0c 00
0180 14 00 ea 00 f0 00 55 01 60 01 79 01
0000 indexOffset [2 bytes] 0x017a in this case
0002 signature [2 bytes] 0x7cec constant
0004 7coffset [4 bytes] 0x0040 index reference
Note the signature of 0x7cec. There are other descriptor block formats
with other signatures. Note the indexOffset of 0x017a: starting at that
position in the descriptor block, we have an array of two byte
integers. The first integer (0x0006), which we call the count, is one
less than the number of overlapping pairs that follow it. The first
pair is (0, 0xc), the next pair is (0xc, 0x14), and the last (7th) pair
is (0x160, 0x179). These pairs are (start, end+1) offsets of items in
this block. So we have count+2 integers following the count value.
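As a sketch of the steps so far, the Python below reads the three header fields of a 0x7cec descriptor block and then decodes the array at indexOffset into its overlapping (start, end+1) pairs. The function names are hypothetical, and little-endian byte order is assumed from the sample dump.

```python
import struct
from typing import List, Tuple


def parse_7cec_header(block: bytes) -> Tuple[int, int]:
    """Return (indexOffset, 7coffset) from a 0x7cec descriptor block."""
    index_offset, signature, seven_c_ref = struct.unpack_from("<HHI", block, 0)
    if signature != 0x7CEC:
        raise ValueError(f"unexpected descriptor signature 0x{signature:04x}")
    return index_offset, seven_c_ref


def read_offset_pairs(block: bytes, index_offset: int) -> List[Tuple[int, int]]:
    """Decode the two byte integers at indexOffset into (start, end+1) pairs."""
    (count,) = struct.unpack_from("<H", block, index_offset)
    # count+2 integers follow the count; adjacent integers overlap to form
    # count+1 pairs, e.g. (0, 0xc), (0xc, 0x14), ... in the sample block.
    ints = struct.unpack_from(f"<{count + 2}H", block, index_offset + 2)
    return list(zip(ints, ints[1:]))
```

For the sample block, parse_7cec_header returns (0x17a, 0x40), and read_offset_pairs(block, 0x17a) returns seven pairs, ending with (0x160, 0x179).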
Note the 7coffset of 0x0040, which is an index reference. In this case,
it is an internal reference pointer, which needs to be right shifted by
4 bits to become 0x0004. That is a byte offset to be added to the above
indexOffset plus two (to skip the count), so it points to the (0x14,
0xea) pair. We now have the offset and size of the "7c" block: it is
located at offset 0x14 with a size of 214 (0xea - 0x14) bytes in this
case. The "7c" block starts with a header with the following format:
0000 signature [1 byte] 0x7c constant
0001 itemCount [1 byte] 0x18 in this case
0002 unknown [2 bytes] 0x0060 in this case
0004 unknown [2 bytes] 0x0060 in this case
0006 unknown [2 bytes] 0x0062 in this case
0008 recordSize [2 bytes] 0x0065 in this case
000a b5Offset [4 bytes] 0x0020 index reference
000e index2Offset [4 bytes] 0x0080 index reference
0012 unknown [2 bytes] 0x0000 in this case
0014 unknown [2 bytes] 0x0000 in this case
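The index reference rule and the "7c" header layout can be sketched in Python as below. The names are hypothetical, and the sketch assumes the reference points into the same block, so it keeps only the low 16 bits (the high 16 bits select a subblock in the 0x0101 case described later).

```python
import struct
from typing import Dict, Tuple


def resolve_ref(block: bytes, index_offset: int, ref: int) -> Tuple[int, int]:
    """Turn an internal index reference into (offset, size) within one block.

    Assumption: only the low 16 bits matter for a single-block descriptor.
    """
    byte_off = (ref & 0xFFFF) >> 4          # e.g. 0x0040 -> 0x0004
    pos = index_offset + 2 + byte_off        # +2 skips the count
    start, end = struct.unpack_from("<HH", block, pos)
    return start, end - start


def parse_7c_header(block: bytes, offset: int) -> Dict[str, int]:
    """Decode the "7c" header fields whose meaning is known."""
    signature, item_count = struct.unpack_from("<BB", block, offset)
    if signature != 0x7C:
        raise ValueError(f"unexpected 7c signature 0x{signature:02x}")
    # recordSize at +0x08, b5Offset at +0x0a, index2Offset at +0x0e
    record_size, b5_ref, index2_ref = struct.unpack_from("<HII", block, offset + 8)
    return {
        "itemCount": item_count,
        "recordSize": record_size,
        "b5Offset": b5_ref,
        "index2Offset": index2_ref,
    }
```

With the sample block, resolve_ref(block, 0x17a, 0x40) gives (0x14, 214), the location of the "7c" block, and parse_7c_header(block, 0x14) reports an itemCount of 0x18 and a recordSize of 0x65.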
Note the b5Offset of 0x0020, which is also an index reference. In this
case, it is an internal reference pointer, which needs to be right
shifted by 4 bits to become 0x0002. Adding that byte offset to the
above indexOffset plus two (to skip the count) points to the (0xc,
0x14) pair. So the "b5" block is located at offset 0xc with a size of 8
bytes in this descriptor block. The "b5" block has the following
format:
0000 signature [2 bytes] 0x04b5 constant
0002 datasize [2 bytes] 0x0002 in this case (plus 4 gives 6 byte entries)
0004 descoffset [4 bytes] 0x0060 index reference
Note the descoffset of 0x0060, which again is an index reference. In
this case, it is an internal reference pointer, which needs to be right
shifted by 4 bits to become 0x0006. Adding that byte offset to the above
indexOffset plus two (to skip the count) points to the (0xea, 0xf0)
pair. The datasize (2) plus the 04 from the b5 signature (0x04b5) gives
the size of the entries, in this case 6 bytes. We now have the offset
0xea of an unused block of data in an unknown format, composed of 6
byte entries. That gives us (0xf0 - 0xea)/6 = 1, so we have a
recordCount of one.
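The recordCount derivation can be expressed as a small Python sketch. The names are hypothetical; the entry size of datasize + 4 follows the text above. Treating a zero descoffset as "no records" is an assumption based on the zero-descoffset cases mentioned below, not something this page states.

```python
import struct
from typing import Tuple


def resolve_ref(block: bytes, index_offset: int, ref: int) -> Tuple[int, int]:
    """(offset, size) for an internal index reference within this block."""
    pos = index_offset + 2 + ((ref & 0xFFFF) >> 4)
    start, end = struct.unpack_from("<HH", block, pos)
    return start, end - start


def record_count(block: bytes, index_offset: int, b5_ref: int) -> int:
    """Follow the "b5" block and derive recordCount from its descoffset."""
    b5_start, _ = resolve_ref(block, index_offset, b5_ref)
    signature, datasize, desc_ref = struct.unpack_from("<HHI", block, b5_start)
    if signature != 0x04B5:
        raise ValueError(f"unexpected b5 signature 0x{signature:04x}")
    entry_size = datasize + 4        # 2 + 4 = 6 byte entries in the sample
    if desc_ref == 0:
        # Assumption: a zero descoffset means there are no records.
        return 0
    _, desc_size = resolve_ref(block, index_offset, desc_ref)
    return desc_size // entry_size
```

For the sample, record_count(block, 0x17a, 0x20) follows 0x20 to the "b5" block at 0xc, then 0x60 to the (0xea, 0xf0) pair, and returns 6 // 6 = 1.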
We have seen cases where both the descoffset in the b5 block and the
index2Offset in the 7c block are zero. This has been seen for objects
that seem to be attachments on messages that have been read; before
the message was read, it did not have any attachments.
Note the index2Offset above of 0x0080, which again is an index
reference. In this case, it is an internal reference pointer, which
needs to be right shifted by 4 bits to become 0x0008. Adding that byte
offset to the above indexOffset plus two (to skip the count) points to
the (0xf0, 0x155) pair. This is an array of tables of four byte
integers, which we call the IND2 tables. The size of each table is
given by the recordSize field of the "7c" header (0x65 bytes here,
matching 0x155 - 0xf0 for a single table). The number of tables is the
recordCount value derived from the "b5" block above.
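Splitting the data named by index2Offset into IND2 tables might look like the Python sketch below. The function names are hypothetical. Returning an empty list for a zero index2Offset is an assumption that matches the zero-offset cases noted above.

```python
import struct
from typing import List, Tuple


def resolve_ref(block: bytes, index_offset: int, ref: int) -> Tuple[int, int]:
    """(offset, size) for an internal index reference within this block."""
    pos = index_offset + 2 + ((ref & 0xFFFF) >> 4)
    start, end = struct.unpack_from("<HH", block, pos)
    return start, end - start


def ind2_tables(block: bytes, index_offset: int, index2_ref: int,
                record_size: int, record_count: int) -> List[bytes]:
    """Return recordCount IND2 tables, each recordSize bytes long."""
    if index2_ref == 0:
        return []                      # assumption: no IND2 data at all
    start, size = resolve_ref(block, index_offset, index2_ref)
    if record_size * record_count > size:
        raise ValueError("IND2 data is shorter than recordSize * recordCount")
    return [
        bytes(block[start + n * record_size : start + (n + 1) * record_size])
        for n in range(record_count)
    ]
```

In the sample there is one table of 0x65 bytes starting at offset 0xf0.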
Now the remaining data in the "7c" block after the header starts at
offset 0x2a. There should be itemCount 8 byte items here, with the
following format:
0000 referenceType [2 bytes]
0002 itemType [2 bytes]
0004 ind2Offset [2 bytes]
0006 size [1 byte]
0007 unknown [1 byte]
The ind2Offset is the byte offset of a value within the current IND2
table; the size field appears to give the length of that value in bytes
(4 for most items in the example, 8 or 1 for a few). If the value is a
four byte integer, then once we fetch it, we have the same triple (item
type, reference type, value) as we find in the 0xbcec style descriptor
blocks. Otherwise, the value is used directly. These 8 byte descriptors
are processed recordCount times, each time using the next IND2 table.
The item and reference types are as described above for the 0xbcec
format descriptor block.
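A Python sketch of reading the 8 byte item descriptors and fetching their values from one IND2 table follows. The function names and the Item tuple are hypothetical. Deciding when a four byte value is itself an index reference depends on the reference type rules of the 0xbcec format and is not handled here.

```python
import struct
from typing import List, Tuple

# (referenceType, itemType, ind2Offset, size); the unknown byte is dropped.
Item = Tuple[int, int, int, int]


def parse_7c_items(block: bytes, items_start: int, item_count: int) -> List[Item]:
    """Read itemCount 8 byte descriptors that follow the "7c" header."""
    items = []
    for n in range(item_count):
        ref_type, item_type, ind2_off, size, _unknown = struct.unpack_from(
            "<HHHBB", block, items_start + 8 * n)
        items.append((ref_type, item_type, ind2_off, size))
    return items


def raw_value(table: bytes, item: Item) -> bytes:
    """Slice an item's value out of an IND2 table, using size as its length."""
    _, _, ind2_off, size = item
    return table[ind2_off : ind2_off + size]


def four_byte_value(table: bytes, item: Item) -> int:
    """Read a four byte little-endian value; depending on the reference
    type it is either the value itself or an index reference."""
    _, _, ind2_off, _ = item
    (value,) = struct.unpack_from("<I", table, ind2_off)
    return value
```

In the sample, the first item is (0x0003, 0x0e20, 0x0c, 4) and its value is 0x1e49. The second item, (0x001e, 0x3001, 0x2c, 4), holds 0xc0, which is an index reference that resolves to the (0x160, 0x179) pair, the string "J. David Karam's Birthday".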

32 BIT ASSOCIATED DESCRIPTOR ITEM 0x0101

This descriptor block contains a list of I_ID values. It is used when
an I_ID (that would normally point to a type 0x7cec or 0xbcec
descriptor block) refers to more data than can fit in any single
descriptor block of those types. In this case, the I_ID points to a
type 0x0101 block, which contains a list of I_ID values that themselves
point to the actual descriptor blocks. The total length value in the
0x0101 header is the sum of the lengths of the blocks pointed to by the
list of I_ID values. The result is an array of subblocks, which may
contain index references where the high order 16 bits specify which
descriptor subblock to use. Only the first descriptor subblock contains
the signature (0xbcec or 0x7cec).
0000 01 01 02 00 26 28 00 00 18 77 0c 00 b8 04 00 00
0000 signature [2 bytes] 0x0101 constant
0002 count [2 bytes] 0x0002 in this case
0004 total length [4 bytes] 0x002826 in this case
repeating
0008 i_id [4 bytes] 0x0c7718 in this case
000c i_id [4 bytes] 0x0004b8 in this case
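Reading the 32 bit 0x0101 header and splitting an index reference into its subblock number and in-block part might look like the Python sketch below. The function names are hypothetical. Fetching and concatenating the referenced blocks requires the index1 lookup described earlier and is left out.

```python
import struct
from typing import List, Tuple


def parse_0101(block: bytes) -> Tuple[int, List[int]]:
    """Return (total length, list of I_ID values) from a 32 bit 0x0101 block."""
    signature, count, total_length = struct.unpack_from("<HHI", block, 0)
    if signature != 0x0101:
        raise ValueError(f"unexpected signature 0x{signature:04x}")
    i_ids = list(struct.unpack_from(f"<{count}I", block, 8))
    return total_length, i_ids


def split_ref(ref: int) -> Tuple[int, int]:
    """Split an index reference into (subblock number, in-block reference):
    the high order 16 bits pick the subblock, the low 16 bits are resolved
    inside it as for a single-block descriptor."""
    return ref >> 16, ref & 0xFFFF
```

For the sample header, parse_0101 returns a total length of 0x2826 and the two I_ID values 0x0c7718 and 0x0004b8.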