Concept - Data Runs

Overview

Non-resident attributes are stored in intervals of clusters called runs. Each run is
represented by its starting cluster and its length. The starting cluster of a run is
coded as an offset to the starting cluster of the previous run.

Normal, compressed and sparse files are all defined by runs.

The examples start simple, then quickly get complicated.

The runlist is a table written in the content part of a non-resident file attribute;
it is what gives access to the attribute's stream.

Run 4:

Summary:

Therefore, Data2 is a fragmented file, of size 0x18E clusters,
with data blocks at LCNs 0x342573, 0x363758 and 0x393802.

Example 3 - Normal, Scrambled File

Data runs:

11 30 60 21 10 00 01 11 20 E0 00

11 30 60 - 21 10 00 01 - 11 20 E0 - 00 (regrouped)

Run 1:

Header = 0x11 - 1 byte length, 1 byte offset

Length = 0x30

Offset = 0x60

Run 2:

Header = 0x21 - 1 byte length, 2 byte offset

Length = 0x10

Offset = 0x160 (0x100 relative to 0x60)

Run 3:

Header = 0x11 - 1 byte length, 1 byte offset

Length = 0x20

Offset = 0x140 (-0x20 relative to 0x160)

Run 4:

Header = 0x00 - the end

Summary:

0x30 Clusters @ LCN 0x60

0x10 Clusters @ LCN 0x160

0x20 Clusters @ LCN 0x140

Therefore, Data3 is a badly fragmented file of size 0x60 clusters,
with data blocks at LCNs 0x60, 0x160 and 0x140. Furthermore, the
third block of data is physically located between the first and second blocks.
(The third run has a negative offset, placing it before the previous run).
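The decoding steps walked through in Example 3 can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name parse_runlist and the (length, lcn) tuple layout are assumptions made here, and sparse runs (no offset field) are reported with an LCN of None.

```python
def parse_runlist(data: bytes):
    """Decode a run list into (length, lcn) tuples; lcn is None for sparse runs."""
    runs = []
    pos = 0
    prev_lcn = 0  # the first offset is taken relative to LCN 0
    while pos < len(data) and data[pos] != 0x00:  # header 0x00 ends the list
        header = data[pos]
        len_size = header & 0x0F  # low nibble: size of the length field in bytes
        off_size = header >> 4    # high nibble: size of the offset field in bytes
        pos += 1
        length = int.from_bytes(data[pos:pos + len_size], "little")
        pos += len_size
        if off_size == 0:
            runs.append((length, None))  # sparse run, nothing stored on disk
        else:
            # The offset is signed and relative to the previous run's start.
            delta = int.from_bytes(data[pos:pos + off_size], "little", signed=True)
            prev_lcn += delta
            runs.append((length, prev_lcn))
        pos += off_size
    return runs


example3 = bytes.fromhex("11 30 60 21 10 00 01 11 20 E0 00")
for length, lcn in parse_runlist(example3):
    print(f"{length:#x} clusters @ LCN {lcn:#x}")
```

The single byte E0 decodes to -0x20 because the offset field is read as a signed little-endian integer, which is what moves the third run back to LCN 0x140.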

Example 4 - Sparse, Unfragmented File

Data runs:

11 30 20 01 60 11 10 30 00

11 30 20 - 01 60 - 11 10 30 - 00 (regrouped)

Run 1:

Header = 0x11 - 1 byte length, 1 byte offset

Length = 0x30

Offset = 0x20

Run 2:

Header = 0x01 - 1 byte length, 0 byte offset (sparse)

Length = 0x60

Offset = N/A

Run 3:

Header = 0x11 - 1 byte length, 1 byte offset

Length = 0x10

Offset = 0x50 (0x30 relative to 0x20)

Run 4:

Header = 0x00 - the end

Summary:

0x30 Clusters @ LCN 0x20

0x60 Clusters (sparse)

0x10 Clusters @ LCN 0x50

Therefore, Data4 is a sparse, unfragmented file, of size 0xA0 clusters,
with data blocks at LCNs 0x20 and 0x50.

This file has a sparse part in the middle of size 0x60 clusters.
It takes up no space on disk, but it is represented by 0x60 VCNs.
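To illustrate how the sparse part still occupies VCNs, the sketch below maps a VCN to an LCN using the summary of Example 4. The names RUNS and vcn_to_lcn are hypothetical, and None is used here as an assumed marker for "sparse, not stored".

```python
# Example 4 as (length, lcn) pairs; None marks the sparse run.
RUNS = [(0x30, 0x20), (0x60, None), (0x10, 0x50)]


def vcn_to_lcn(runs, vcn):
    """Return the LCN backing a VCN, or None if the VCN falls in a sparse run."""
    start = 0  # first VCN covered by the current run
    for length, lcn in runs:
        if start <= vcn < start + length:
            return None if lcn is None else lcn + (vcn - start)
        start += length
    raise IndexError("VCN is beyond the end of the attribute")


virtual_clusters = sum(length for length, _ in RUNS)
stored_clusters = sum(length for length, lcn in RUNS if lcn is not None)
print(hex(virtual_clusters), hex(stored_clusters))
```

Under these assumptions the file spans 0xA0 VCNs while only 0x40 clusters are stored, and any VCN from 0x30 to 0x8F reads back as zeroes without touching the disk.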

Example 5 - Compressed, Unfragmented File

Data runs:

11 08 40 01 08 11 10 08 11 0C 10 01 04 00

11 08 40 - 01 08 - 11 10 08 - 11 0C 10 - 01 04 - 00 (regrouped)

Run 1:

Header = 0x11 - 1 byte length, 1 byte offset

Length = 0x08

Offset = 0x40

Run 2:

Header = 0x01 - 1 byte length, 0 byte offset (sparse)

Length = 0x08

Offset = N/A

Run 3:

Header = 0x11 - 1 byte length, 1 byte offset

Length = 0x10

Offset = 0x48 (0x8 relative to 0x40)

Run 4:

Header = 0x11 - 1 byte length, 1 byte offset

Length = 0x0C

Offset = 0x58 (0x10 relative to 0x48)

Run 5:

Header = 0x01 - 1 byte length, 0 byte offset (sparse)

Length = 0x04

Offset = N/A

Run 6:

Header = 0x00 - the end

Summary:

0x08 Clusters @ LCN 0x40

0x08 Clusters (sparse)

0x10 Clusters @ LCN 0x48

0x0C Clusters @ LCN 0x58

0x04 Clusters (sparse)

Therefore, Data5 is a compressed, unfragmented file of size 0x30 clusters,
with data blocks at LCNs 0x40, 0x48 and 0x58.

The data, as stored on disk, is contiguous. The sparse runs pad out
the compression units to blocks of 16 clusters (0x10).
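The two claims above (contiguous on disk, padded to whole compression units) can be checked with a few lines of Python. This is only a sketch over the Example 5 summary; the variable names and the use of None for sparse runs are assumptions of this illustration.

```python
UNIT = 16  # clusters per compression unit

# Example 5 as (length, lcn) pairs; None marks a sparse run.
RUNS = [(0x08, 0x40), (0x08, None), (0x10, 0x48), (0x0C, 0x58), (0x04, None)]

data_runs = [(length, lcn) for length, lcn in RUNS if lcn is not None]
virtual_size = sum(length for length, _ in RUNS)
stored_size = sum(length for length, _ in data_runs)

# Each data run must begin exactly where the previous data run ended.
contiguous = all(
    lcn + length == next_lcn
    for (length, lcn), (_, next_lcn) in zip(data_runs, data_runs[1:])
)

print(contiguous, hex(virtual_size), hex(stored_size), virtual_size % UNIT)
```

With these values the 0x30 VCNs are backed by 0x24 stored clusters, and the virtual size is an exact multiple of the unit size.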

Example 6 - Compressed, Sparse, Fragmented File

brain damaged file

Layout

The runlist is a sequence of elements: each element stores an offset to the starting
LCN of the previous element and the length in clusters of a run.

To save space, Offset and Length are variable size fields (probably up to 8
bytes), and an element is written in this crunched format:

Offset (in nibbles, from the     Size           Description
beginning of the element)        (in nibbles)

0                                1              F = Size of the Offset field
1                                1              L = Size of the Length field
2                                2*L            Length of the run
2+2*L                            2*F            Offset to the starting LCN of the previous element

Offset to the starting LCN of the previous element

This is a signed value. For the first element, consider the offset as relative
to the LCN 0, the beginning of the volume.
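Going the other way, a single element can be written out from a length and a signed offset. The sketch below is an assumption-laden illustration: encode_run and min_bytes are hypothetical helpers, the fields are stored in the fewest bytes that hold them, the length is treated as unsigned as this document describes, and passing delta=None stands for an element with no Offset field.

```python
def min_bytes(value: int, signed: bool) -> int:
    """Smallest number of bytes that can hold value."""
    size = 1
    while True:
        try:
            value.to_bytes(size, "little", signed=signed)
            return size
        except OverflowError:
            size += 1


def encode_run(length: int, delta=None) -> bytes:
    """Encode one element; delta is the signed offset from the previous run's LCN."""
    len_size = min_bytes(length, signed=False)
    len_bytes = length.to_bytes(len_size, "little")
    if delta is None:
        # No Offset field: the high nibble (F) is 0.
        return bytes([len_size]) + len_bytes
    off_size = min_bytes(delta, signed=True)
    header = (off_size << 4) | len_size  # F in the high nibble, L in the low nibble
    return bytes([header]) + len_bytes + delta.to_bytes(off_size, "little", signed=True)


print(encode_run(0x20, -0x20).hex(" "))
```

An offset of 0x100 needs two bytes and a negative offset such as -0x20 fits in one, which matches the headers 0x21 and 0x11 seen in the examples.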

The layout of the runlist must take account of data compression: the set of
VCNs containing the stream of a compressed file attribute is divided into compression
units (also called chunks) of 16 clusters. VCNs 0 to 15 constitute the 1st
compression unit, VCNs 16 to 31 the 2nd one, and so on. Each compression unit is
handled as follows:

The alpha stage of compression is very simple and is independent of the
compression engine used to compress the file attribute: if all the 16 clusters of a
compression unit are full of zeroes, this compression unit is called a sparse unit
and is not physically stored. Instead, an element with no Offset field (F=0, the
Offset is assumed to be 0 too) and a Length of 16 clusters is put in the
runlist.

Else, the beta stage of compression is done by the compression engine used to
compress the file attribute: if the compression of the unit is possible, N (<
16) clusters are physically stored, and an element with a Length of N is put in the
runlist, followed by another element with no Offset field (F=0, the Offset is
assumed to be 0 too) and a Length of 16 - N.

Else, the unit is not compressed, 16 clusters are physically stored, and an
element with a Length of 16 is put in the runlist.
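The three cases can be summarised as a small function that returns the runlist elements for one compression unit. This is a sketch only: unit_runs is a hypothetical name, elements are shown as (length, lcn) pairs with None standing for "no Offset field", and the number of clusters the compression engine managed to store is simply passed in.

```python
UNIT = 16  # clusters per compression unit


def unit_runs(stored: int, lcn=None):
    """Elements for one unit, given how many clusters are physically stored."""
    if not 0 <= stored <= UNIT:
        raise ValueError("stored must be between 0 and 16")
    if stored == 0:
        return [(UNIT, None)]                          # sparse unit, all zeroes
    if stored < UNIT:
        return [(stored, lcn), (UNIT - stored, None)]  # compressed: N data + 16 - N sparse
    return [(UNIT, lcn)]                               # not compressible: all 16 stored


# Rebuild the run summary of Example 5 from its three units.
runs = unit_runs(8, 0x40) + unit_runs(16, 0x48) + unit_runs(12, 0x58)
print(runs)
```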

In practice, this is a bit more complicated because some of the elements can be
gathered. But let's take an ...

data runs

Length and starting cluster are variable size fields. The first byte of a run
indicates the size of both. The size of the offset is stored in the high nibble,
and the size of the length in the low nibble.

For sparse runs, including the empty runs that pad out a compressed unit, the offset
is 0, and the size of the offset is also 0. Each compression unit starts at a multiple
of 16 clusters. If compression was possible, the unit's VCNs hold one or more data runs
followed by an empty run. If the data runs cover 16 clusters or more, the unit was not
compressible. If there is no data run at all (only a large empty run), the unit consists
of all zeroes.

Take a file of size 0x80 clusters (anywhere on disk).
This is represented by VCNs (virtual cluster numbers) 0x00 to 0x7F.
These VCNs are mapped to LCNs (logical cluster numbers) in runs (or extents),
e.g. 21 80 30 60 00.

These runs are variable length, terminated with a zero.
The low nibble of the first byte gives the size of the next number (1 byte),
namely 0x80.
The high nibble gives the size of the number after that (2 bytes), namely 0x6030
(stored little-endian as 30 60).

Outcome: 0x80 clusters, starting at cluster 0x6030.

The "sizes" are stored in one byte. The length is unsigned.
The offset is signed and relative to the previous offset.
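To make the signed rule concrete, the sketch below interprets an offset field by hand as a little-endian two's complement number. The helper name sign_extend is hypothetical; Python's built-in int.from_bytes with signed=True gives the same result.

```python
def sign_extend(raw: bytes) -> int:
    """Interpret raw as a little-endian two's complement integer."""
    value = int.from_bytes(raw, "little")
    # The top bit of the last (most significant) byte is the sign bit.
    if raw and raw[-1] & 0x80:
        value -= 1 << (8 * len(raw))
    return value


print(hex(sign_extend(bytes.fromhex("30 60"))))  # offset field of 21 80 30 60
print(hex(sign_extend(bytes.fromhex("E0"))))     # a one-byte negative offset
```

The field 30 60 has its top bit clear and stays positive (0x6030), while the single byte E0 has its top bit set and becomes -0x20.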

The 6 extra clusters aren't actually taking up any disk space.
The VCNs are bunched into 16s. If a block cannot be compressed,
it would be represented by:

21 10 10 F6 - 16 clusters of uncompressed data at LCN 0xF610

FIXME:
In fact, life is more complicated because adjacent entries of the same type
can be coalesced. This means that one has to keep track of the number of
clusters handled and work on a basis of X clusters at a time being one
block. An example: if length L > X this means that this particular run list
entry contains a block of length X and part of one or more blocks of length
L - X. Another example: if length L < X, this does not necessarily mean that
the block is compressed as it might be that the lcn changes inside the block
and hence the following run list entry describes the continuation of the
potentially compressed block. The block would be compressed if the
following run list entry describes at least X - L sparse clusters, thus
making up the compression block length as described in point 3 above. (Of
course, there can be several run list entries with small lengths so that the
sparse entry does not follow the first data containing entry with
length < X.)
NOTE: At the end of the compressed attribute value, there most likely is not
just the right amount of data to make up a compression block, thus this data
is not even attempted to be compressed. It is just stored as is.
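One way to cope with coalesced entries is to ignore the entry boundaries and count, per block of X clusters, how many clusters are actually stored. The sketch below does this in a deliberately simple (and memory-hungry) way by expanding the runs to one flag per VCN. The name classify_units, the labels, and the sample run list (Example 5 with its two adjacent data runs merged by hand) are assumptions of this illustration, and a trailing partial block is simply left unlabelled.

```python
UNIT = 16  # X, the compression block length in clusters


def classify_units(runs, unit=UNIT):
    """Label each complete block of `unit` VCNs from (length, lcn) runs."""
    stored_flags = []  # one entry per VCN: True if backed by a stored cluster
    for length, lcn in runs:
        stored_flags.extend([lcn is not None] * length)

    labels = []
    complete = len(stored_flags) - len(stored_flags) % unit
    for start in range(0, complete, unit):
        stored = sum(stored_flags[start:start + unit])
        if stored == 0:
            labels.append("sparse")
        elif stored == unit:
            labels.append("uncompressed")
        else:
            labels.append("compressed")
    return labels


# The 0x1C-cluster entry spans one whole block and part of the next.
coalesced = [(0x08, 0x40), (0x08, None), (0x1C, 0x48), (0x04, None)]
print(classify_units(coalesced))
```

Counting stored clusters per block is a simplification of the rule described above; it does not check that the sparse clusters come after the data within the block.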