NTFS - Glossary

Overview

This is a glossary of all terms.
Some entries refer to other entries, e.g. See also.
Some entries have an entire page of their own, e.g. More...
If your browser supports access keys, then you can jump around this
document by using, for example, Alt-M for the M section.

Access Control Entry (ACE)

An Access Control Entry is the smallest unit of security.
It contains a SID (either a user or a group) and permissions information.
The permission will be one of Access Allowed, Access Denied
or System Audit. This object has flags to determine how the
permissions should be inherited.
See also:
SID,
ACL and
Auditing

$ATTRIBUTE_LIST

This attribute is used when a file's
attributes won't fit in a single MFT File Record. It has a list
of all the attributes and where they can be found. The $ATTRIBUTE_LIST
is always stored in the Base FILE Record.
See also:
File Record,
$MFT and
Base FILE Record.

Audit, Auditing

As part of the security permissions of a file,
any actions performed on the file can be recorded.
For example, NTFS could be required to log every user who tried
to read a file but didn't have permission to do so.

B+ Tree

A B+ tree is a variant of the binary tree.
Instead of one data element per node, there are many.
(In NTFS the actual number depends on the lengths of
the names and the cluster size). The B+ tree retains
the efficiency of a binary tree and also performs well
with large numbers of data elements (because the tree tends
to grow wide rather than deep).
See also:
Binary Tree and
Balanced Tree.

BAAD

During chkdsk, if NTFS finds a multi-sector item (MFT, INDEX BLOCK, etc.)
where the multi-sector header doesn't match the values at the end of the
sector, it marks the item with the magic number 'BAAD' and fills it with zeroes
(except for a short at the end of each sector...)

Balanced Tree

Often binary trees can become very uneven. By reorganising
the data, the tree can be balanced such that each node has
similar numbers of children to its left and right.
See also:
B+ Tree and
Binary Tree.

Base FILE Record

If the attributes don't fit into a single MFT record
then the Base FILE Record holds enough information to
locate the other records.
See also:
$ATTRIBUTE_LIST,
FILE Record and
$MFT.

Binary

Maths carried out in base two. In this documentation, certain flags
fields are represented in binary, for the sake of clarity, e.g.
00001000₂, 010000000₂.
See also:
Decimal,
Hex and
Units.

Binary Tree

This is an efficient way of storing data in sorted order.
Each node in the tree represents a data element.
The left child node is a collection of all the elements that come before it.
The right child node is a collection of all the elements that come after it.
See also:
B+ Tree and
Balanced Tree.
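
As a rough illustration of the description above, here is a minimal sketch of a binary tree in Python. The names Node, insert and in_order are made up for this example and have nothing to do with how NTFS lays out its indexes on disk.

```python
class Node:
    """One data element plus links to the elements before and after it."""

    def __init__(self, key):
        self.key = key
        self.left = None   # subtree of elements that come before key
        self.right = None  # subtree of elements that come after key


def insert(node, key):
    """Insert key below node and return the root of the resulting subtree."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    return node  # an equal key is already present, so nothing changes


def in_order(node):
    """Yield the keys in sorted order: left subtree, node, right subtree."""
    if node is not None:
        yield from in_order(node.left)
        yield node.key
        yield from in_order(node.right)


root = None
for name in ["$MFT", "$Boot", "$Volume", "$LogFile"]:
    root = insert(root, name)
sorted_names = list(in_order(root))
```

Inserting keys that are already sorted would make this tree degenerate into a list, which is the unevenness that a balanced tree avoids.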

Block

In Linux terminology, this is a cluster.

Block device

In Linux terminology, this is a storage unit.

Cluster

The minimum allocation unit. Clusters are a fixed power of 2 of the sector
size (called the cluster factor), and their size can be between 512 bytes
and 4 KB (sometimes 64 KB, but 4 KB is the largest cluster size that the
current NTFS compression engine can operate with; that limit may be related
to the 4 KB page size used on the Intel i386 CPU). This size can be set with
the Windows NT format utility, whose defaults are:

Volume size        Cluster size
1 to 512 MB        Sector size
512 MB to 1 GB     1 KB
1 GB to 2 GB       2 KB
more than 2 GB     4 KB

chkdsk

This is a DOS and Windows utility to check and repair filesystems.
Its name is an abbreviation of check disk.
See also:
fsck.

Cluster

This is the smallest unit of disk that NTFS uses
and it is a multiple of the sector size.
It is determined when the volume is formatted and
cannot be altered afterwards.
See also:
Sector,
$Boot and
Volume.

Compression

NTFS supports file- and directory-level compression.
The compression is performed transparently when the file
is read or written. Any new files in a compressed
directory will automatically be compressed.
See also:
Compression Unit

Compression Unit

Each file marked to be compressed is divided into blocks of sixteen
clusters, known as compression units. If one of these
blocks cannot be compressed into fifteen clusters or fewer,
it is left uncompressed. This division also helps when accessing
a file randomly, i.e. it isn't necessary to decompress the whole
file.
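
The arithmetic behind random access can be sketched as follows. This is only an illustration: the 4096-byte cluster size is an assumed default, and the function names are invented for the example.

```python
CLUSTERS_PER_UNIT = 16  # size of a compression unit, as stated above


def compression_unit_for(offset, cluster_size=4096):
    """Return (unit index, first byte, end byte) of the unit holding offset.

    cluster_size is an assumption; the real value is fixed at format time.
    """
    unit_size = CLUSTERS_PER_UNIT * cluster_size
    index = offset // unit_size
    return index, index * unit_size, (index + 1) * unit_size


def stored_compressed(compressed_clusters):
    """A unit stays compressed only if it shrinks to fifteen clusters or fewer."""
    return compressed_clusters <= CLUSTERS_PER_UNIT - 1
```

To read byte 100000 of a compressed file, only the unit returned by compression_unit_for(100000) has to be decompressed, not the whole file.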

Data Runs

Non-resident attributes are stored in intervals of clusters called runs.
Each run is represented by its starting cluster and its length.
The runs map the VCNs of a file to the LCNs of a volume.
See also:
Attribute,
Cluster,
LCN,
VCN and
Volume.
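
The mapping from VCNs to LCNs can be sketched like this. The example works on an already decoded list of (starting LCN, length) pairs; it does not attempt to parse the on-disk encoding of the runs, and the function name and sample numbers are made up.

```python
def vcn_to_lcn(runs, vcn):
    """Translate a file-relative cluster number into a volume cluster number.

    runs is a list of (start_lcn, length) pairs in file order.
    """
    first_vcn = 0  # VCN of the first cluster in the current run
    for start_lcn, length in runs:
        if first_vcn <= vcn < first_vcn + length:
            return start_lcn + (vcn - first_vcn)
        first_vcn += length
    raise ValueError("VCN lies beyond the end of the data runs")


# A fragmented file: 4 clusters at LCN 100, 2 at LCN 300, 3 at LCN 50.
example_runs = [(100, 4), (300, 2), (50, 3)]
```

With these runs, VCN 4 is the first cluster of the second run, so it maps to LCN 300.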

Decimal

Maths carried out in base ten. In this documentation,
numbers that are neither in hex, nor
binary, are in decimal,
e.g. 16 (sixteen), 23 (twenty-three).
See also:
Binary,
Hex and
Units.

Directory

An NTFS directory is an index attribute. NTFS uses index attributes to collate
file names. A directory entry contains the name of the file and a copy of the
file's standard information attribute (time stamp information). This approach
provides a performance boost for directory browsing because NTFS does not need
to read the files' MFT records to print directory information.

FILE Record

The $MFT is made up of FILE records, so named because of
a magic number of FILE. Each record has a standard
header and a list of attributes. If the attributes don't
fit into a single record, then more records will be used
and a $ATTRIBUTE_LIST attribute will be needed.
See also:
Attribute,
Attribute List,
Magic Number and
$MFT.

File Record Segment (FRS)

FRS = MFT File Record

File Reference

Each file record has a unique number identifying it.
The first 48 bits are a sequentially allocated number
which is the offset in the $MFT. The last 16 bits
are a sequence number. Every time the record is reused
this number is incremented. The sequence number can
help detect errors on the volume.
See also:
File Record,
$MFT and
Volume.
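
A file reference can be taken apart with a little bit arithmetic. The sketch below assumes the reference has been read from disk as a little-endian 64-bit integer, so that the "first" 48 bits are the low-order bits; the function names are invented for the example.

```python
RECORD_BITS = 48
RECORD_MASK = (1 << RECORD_BITS) - 1  # low 48 bits


def split_file_reference(ref):
    """Return (MFT record number, sequence number) from a 64-bit reference."""
    return ref & RECORD_MASK, (ref >> RECORD_BITS) & 0xFFFF


def make_file_reference(record_number, sequence_number):
    """Combine a record number and a sequence number into one 64-bit value."""
    return ((sequence_number & 0xFFFF) << RECORD_BITS) | (record_number & RECORD_MASK)
```

For example, split_file_reference(0x0005000000000005) gives record number 5 with sequence number 5. A stored reference whose sequence number no longer matches the one in the record it points to is a sign that the reference is stale.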

Globally Unique Identifier (GUID)

GUID structures store globally unique identifiers (GUIDs). A GUID is a
128-bit value consisting of one group of eight hexadecimal digits, followed
by three groups of four hexadecimal digits each, followed by one group of
twelve hexadecimal digits. The valid format for a GUID is
{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}. GUIDs are Microsoft's implementation
of the distributed computing environment (DCE) universally unique identifier (UUID).
Example of a GUID:
1F010768-5A73-BC91-0010A52216A7
order stored on disk?
01020304-0506-0708-090A0B0C0D0E0F010
0x00 04030201
0x04 0605
0x06 0807
0x08 090A0B0C0D0E0F010
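
The byte order hinted at above (first three groups byte-swapped, the rest stored as written) matches what Python's uuid module calls bytes_le. The sketch below uses a cleaned-up sample value chosen for this example, not one taken from a real volume, and assumes that NTFS stores GUIDs in this layout.

```python
import uuid

# Sample value chosen for illustration only.
guid = uuid.UUID("01020304-0506-0708-090a-0b0c0d0e0f10")

# bytes_le stores the first three fields little-endian and the remaining
# eight bytes in the order they are written.
on_disk = guid.bytes_le

# Reading the sixteen bytes back gives the same GUID.
round_trip = uuid.UUID(bytes_le=on_disk)

# Text form with the surrounding braces.
text_form = "{" + str(guid).upper() + "}"
```

Printing on_disk.hex() shows the swapped groups 04030201, 0605 and 0807 followed by the last eight bytes unchanged.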

Hex, Hexadecimal

Maths carried out in base sixteen. In this documentation,
many numbers are represented in hex, e.g. 0x02E0, 0xF100.
See also:
Binary,
Decimal and
Units.

HPFS

The OS/2 filesystem. Remember: once upon a time, OS/2 was to be the
operating system developed jointly by IBM and Microsoft. Then the two
giants parted ways. IBM continued to develop OS/2 (it became OS/2
Warp), and that explains why OS/2 knows how to execute Windows
applications. Microsoft decided to make its own operating system:
Windows NT. HPFS design influenced NTFS design, so the two filesystems
share many features.

Index

Index records are used by directories, $Quota, $Reparse
and $Secure. The contents depend on the type of index
being kept. Directories store $FILE_NAME attributes.
See also:
Directory,
$I30,
$Quota,
$Reparse and
$Secure.

Infinite Logging Area

Something contained in $LogFile. It consists of a
sequence of 4KB log records.
See also:
$LogFile

Inode

An inode is the filesystem's representation of a file, directory, device, etc.
In NTFS every inode is represented by an MFT FILE record.
See also:
Directory,
File,
FILE Record and
Filesystem

$J

$J is a named data stream of the Metadata File $UsnJrnl.
See also:
$UsnJrnl

Log Record

One 4KB chunk of the infinite logging area. It starts with
the magic number 'RCRD' and a fixup, then has undocumented variable
length data. [The log record might be further subdivided - I cannot
imagine they waste 4KB if they only have to log a few bytes. Custer
mentions high level and low level 'records'.
High level are:
- allocate inode n,
- make a directory entry foo in directory m
Low level are:
- modify inode n with the new contents of <1KB>]

$LogFile

Transactional logging file. This metadata file is used
to guarantee data integrity in case of a system failure. It has two
copies of the restart area and the infinite logging area. The log file
is near the centre of the volume, just after the second cluster of
the boot file. [Better to say 'run' than 'cluster'. The boot file usually
extends over several clusters at the beginning of the disk, and then has
a single run of just one cluster (the copy of the boot sector). Also,
isn't it 'infinite'?]

Metadata

Data on the storage unit used by the filesystem only, as a
frame to access user data (metadata constitutes the structure of the
filesystem). Metadata examples from various filesystems include FATs,
inode tables, free block lists, free block bitmaps, logging areas, and
the superblock.

meta-data

Data about data. In data processing, meta-data is definitional data
that provides information about or documentation of other data managed
within an application or environment.
For example, meta data would document data about data elements or
attributes, (name, size, data type, etc) and data about records or
data structures (length, fields, columns, etc) and data about data
(where it is located, how it is associated, ownership, etc.). Meta
data may include descriptive information about the context, quality
and condition, or characteristics of the data.

Permissions

There are two mechanisms for storing permissions in NTFS.
One is a superset of DOS File Permissions, which includes
Read Only and Hidden.
The other is based on ACEs and allows granting specific
permissions to specific users.
See also:
$ACE,
File Permissions and
$SECURITY_DESCRIPTOR

POSIX

An acronym (pronounced like positive) for Portable Operating System
Interface, suggested by Richard M. Stallman. It is a set of
international standards (ISO/IEC 9945-1:1996(E), ANSI/IEEE Std 1003.1
1996 Edition) to interface with Unix-like operating systems, e.g.
Linux. NTFS does not support Unix-like device files.

$PROPERTY_SET

obsolete

$Q

This is one of the named indexes belonging to $Quota.
See also:
Index,
$O and
$Quota.

Resource Fork

In MacOS's filesystem, HFS, files are allowed to have multiple
data streams. These are called resource forks.
See also:
HFS and
Stream.

Roll-back

When an NTFS volume is mounted, it is checked to see if it
is in a consistent state. If it isn't, then the $LogFile is
consulted and transactions are undone until the disk returns
to a consistent state. This does not guarantee data integrity,
only disk integrity.
See also:
$LogFile,
Transaction and
Volume.

Restart Area

Two copies of this are in $LogFile. A restart area has the
magic number 'RSTR' followed by a fixup and some other data, including
three LSNs. A restart area has pointers into the log area, such as the
first and last log records written and the last checkpoint record
written. [That is three - now which is which?]

$Secure

This metadata file stores a table of security
descriptors used by the volume.

Security

There are two levels of security in NTFS.
There are the DOS File Permissions, such as Read Only and Hidden
and an ACL model which grants specific permissions to specific users.
See also:
ACE,
ACL,
Permissions,
$SECURITY_DESCRIPTOR and
SID.

$SECURITY_DESCRIPTOR

This attribute stores all the security information about a file or
directory. It contains an ACL for auditing, an ACL for permissions and
a SID to show the user and group of the owner.
See also:
Attribute,
ACL,
ACE and
SID.

Security Identifier (SID)

This variable-length identifier uniquely identifies a user
or a group on an NT domain. It is used in the security permissions.
See also:
ACE,
ACL and
$SECURITY_DESCRIPTOR.

$SII

This is one of the named indexes belonging to $Secure.
See also:
Index,
$SDH and
$Secure.

Sparse File

NTFS supports sparse files. If a file contains large, contiguous,
blocks of zeros, then NTFS can choose to not waste any space storing
these portions on disk. They are represented as data runs containing
nothing. When read from disk, NTFS simply substitutes zeros.
See also:
Data Runs.
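
Reading a sparse file can be sketched as below. The example assumes a decoded run list in which a starting LCN of None marks a run "containing nothing"; that marker, the read_cluster callback and the function name are all inventions for this illustration.

```python
def read_clusters(runs, read_cluster, cluster_size=4096):
    """Return the file contents described by runs.

    runs is a list of (start_lcn, length) pairs; start_lcn is None for a
    sparse run. read_cluster(lcn) must return cluster_size bytes.
    """
    data = bytearray()
    for start_lcn, length in runs:
        if start_lcn is None:
            # Nothing is stored on disk for this run: substitute zeros.
            data += bytes(length * cluster_size)
        else:
            for i in range(length):
                data += read_cluster(start_lcn + i)
    return bytes(data)
```

A file made of one real cluster, a million sparse clusters and another real cluster would occupy only two clusters on disk, yet read back at its full length.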

$STANDARD_INFORMATION

This attribute contains information about a file,
such as its file permissions and when it was created.

Stream

All data on NTFS is stored in streams, which can have names.
A file can have more than one data stream,
but exactly one must have no name.
The size of a file is the size of its unnamed data attribute.

$SYMBOLIC_LINK

This attribute, like $VOLUME_VERSION,
existed in NTFS v1.2, but wasn't used.
It no longer exists in NTFS v3.0+.

Time Stamp

NTFS stores four significant times referring to files and directories.
They are: File creation time; Last modification time; Last modification
of the MFT record; Last access time. NTFS stores dates as the number
of 100ns units since Jan 1st 1601.
Unix stores dates as the number of seconds since Jan 1st 1970.
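
Converting between the two conventions is simple arithmetic. The sketch below assumes that the NTFS value counts from midnight UTC and uses the fixed gap of 11644473600 seconds between the two start dates; the function names are made up for this example.

```python
from datetime import datetime, timedelta, timezone

NTFS_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)  # assumed to be UTC
SECONDS_1601_TO_1970 = 11_644_473_600
TICKS_PER_SECOND = 10_000_000  # one tick is 100ns


def ntfs_to_unix(ticks):
    """Convert 100ns units since 1601 into whole seconds since 1970."""
    return ticks // TICKS_PER_SECOND - SECONDS_1601_TO_1970


def unix_to_ntfs(seconds):
    """Convert seconds since 1970 into 100ns units since 1601."""
    return (seconds + SECONDS_1601_TO_1970) * TICKS_PER_SECOND


def ntfs_to_datetime(ticks):
    """Convert 100ns units since 1601 into a datetime (microsecond precision)."""
    return NTFS_EPOCH + timedelta(microseconds=ticks // 10)
```

Note that the conversion to a datetime drops the last decimal digit of the NTFS value, because Python datetimes only resolve microseconds.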

To do: standardise the names and descriptions of the four time fields (on a concept page?).
Refer to the four times as:
C creation
A alter (modification)
M mft (mft changed)
R read (last access)
FIXME:
NOTE: There is conflicting information about the meaning of each of the time
fields but the meaning as defined below has been verified to be
correct by practical experimentation on Windows NT4 SP6a and is hence
assumed to be the one and only correct interpretation.
creation_time
Time file was created. Updated when a filename is changed(?).
last_data_change_time
Time the data attribute was last modified.
last_mft_change_time
Time this mft record was last modified.
last_access_time
Approximate time when the file was last accessed (obviously this is not
updated on read-only volumes). In Windows this is only updated when
accessed if some time delta has passed since the last update.

Transaction

A transaction on a system is a set of operations (on that
system) that constitutes a unit. This unit can't be divided. Before the
transaction, the state of the system is well defined. During the
transaction, it is undefined. After the transaction, it is well defined
again. A transaction can't be half-realized: if no operation fails, the
transaction is realized. If on the contrary an error occurs in one or
more of the operations, the transaction is not realized. A set of (even
atomic) operations is not atomic by definition. A transaction is a model
that provides a kind of atomicity to this set of operations.

$UpCase

This metadata file contains 128KB of capital
letters. For each character in the Unicode alphabet, there
is an entry in this file. It is used to compare and sort filenames.
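
The idea of such a table can be sketched as follows. The 128KB figure fits 65536 entries of two bytes each. The table built here uses Python's own upper-casing rules as a stand-in, so its contents are an assumption and will not match the real file exactly; the function names are invented, and names are assumed to contain only 16-bit characters.

```python
def build_upcase_table():
    """Build a 65536-entry table mapping each 16-bit character to a capital."""
    table = []
    for code in range(0x10000):
        upper = chr(code).upper()
        # Keep the character itself when upper-casing does not give exactly
        # one 16-bit character (for example a sequence of two letters).
        if len(upper) == 1 and ord(upper) < 0x10000:
            table.append(ord(upper))
        else:
            table.append(code)
    return table


def upcase_key(name, table):
    """Return a sort key that ignores case, one table lookup per character."""
    return [table[ord(ch)] for ch in name]


upcase = build_upcase_table()
names = sorted(["readme.txt", "Boot.ini", "autoexec.bat"],
               key=lambda n: upcase_key(n, upcase))
```

Two names compare as equal when their keys match, so "ReadMe.TXT" and "README.txt" would collide in the same directory.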

Update Sequence

Several structures in NTFS have sequence numbers in them
to check for consistency errors. They are FILE, INDX, RCRD
and RSTR records. Before the record is written to disk,
the last two bytes of each sector are copied to an array
in the header. The update sequence number is then
incremented and written to the end of each sector. If
any disk corruption occurs, this technique could detect it.

The Update Sequence Array (usa) is an array of the __u16 values which belong
to the end of each sector protected by the update sequence record in which
this array is contained. Note that the first entry is the Update Sequence
Number (usn), a cyclic counter of how many times the protected record has
been written to disk. The values 0 and -1 (i.e. 0xffff) are not used. All
last __u16's of each sector have to be equal to the usn (during reading) or
are set to it (during writing). If they are not, an incomplete multi-sector
transfer has occurred when the data was written.
The maximum size for the update sequence array is fixed to:
maximum size = usa_ofs + (usa_count * 2) = 510 bytes
The 510 bytes comes from the fact that the last __u16 in the array has to
(obviously) finish before the last __u16 of the first 512-byte sector.
This formula can be used as a consistency check in that usa_ofs +
(usa_count * 2) has to be less than or equal to 510.
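
The write and read steps described above can be sketched like this. The header layout used here (a 4-byte magic number, then usa_ofs and usa_count as little-endian 16-bit values at offsets 4 and 6, with usa_count including the usn itself) is an assumption of this example, as are the function names and the 512-byte sector size.

```python
import struct

SECTOR_SIZE = 512  # assumed sector size


def _header(buf):
    # Assumed layout: magic (4 bytes), usa_ofs (u16), usa_count (u16).
    usa_ofs, usa_count = struct.unpack_from("<HH", buf, 4)
    if usa_ofs + usa_count * 2 > SECTOR_SIZE - 2:
        raise ValueError("update sequence array does not fit")
    return usa_ofs, usa_count


def protect(record):
    """Before writing: save each sector's last two bytes, stamp the new usn."""
    buf = bytearray(record)
    usa_ofs, usa_count = _header(buf)
    usn = (struct.unpack_from("<H", buf, usa_ofs)[0] + 1) & 0xFFFF
    if usn in (0, 0xFFFF):  # these two values are not used
        usn = 1
    struct.pack_into("<H", buf, usa_ofs, usn)
    for i in range(1, usa_count):
        tail = i * SECTOR_SIZE - 2
        buf[usa_ofs + 2 * i:usa_ofs + 2 * i + 2] = buf[tail:tail + 2]
        struct.pack_into("<H", buf, tail, usn)
    return bytes(buf)


def unprotect(record):
    """After reading: check every sector tail, then restore the saved bytes."""
    buf = bytearray(record)
    usa_ofs, usa_count = _header(buf)
    usn = buf[usa_ofs:usa_ofs + 2]
    for i in range(1, usa_count):
        tail = i * SECTOR_SIZE - 2
        if buf[tail:tail + 2] != usn:
            raise ValueError("incomplete multi-sector transfer")
        buf[tail:tail + 2] = buf[usa_ofs + 2 * i:usa_ofs + 2 * i + 2]
    return bytes(buf)


# A made-up two-sector record: usa at offset 48, three entries (usn + 2 tails).
rec = bytearray(range(256)) * 4
rec[0:4] = b"FILE"
struct.pack_into("<HH", rec, 4, 48, 3)
struct.pack_into("<H", rec, 48, 1)
on_disk = protect(bytes(rec))
restored = unprotect(on_disk)
```

If only the first sector of on_disk reached the disk, the second sector would still end with an older usn and unprotect would refuse the record.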

Virtual Cluster Number (VCN)

When representing the data runs of a file, the clusters
are given virtual cluster numbers. Cluster zero refers
to the first cluster of the file. The data runs map the
VCNs to LCNs so that the file can be located on the volume.
See also:
Cluster,
LCN and
Volume.

Volume

(=drive=partition) (extended, striped, mirrored (not supported))
A logical NTFS partition. It is a group of physical partitions
(see the fdisk utility; you can set up mirroring and striping) that act
as one (somewhat like the Linux md block devices).