Optimizing NTFS

In 1993, Microsoft introduced Windows NT 3.1, which brought with it a new file system designed to boost the capabilities of the new OS. Originally conceived and designed by Gary Kimura and Tom Miller, members of the original NT development team, NTFS leverages NT's security capabilities and provides enhanced capacity, efficiency, and recoverability features. These features make NTFS the file system of choice for large disk volumes and disk volumes residing on network servers. You can still find FAT on NT disk volumes that require it (e.g., multi-OS partitions, the system partitions of RISC-based NT servers); however, FAT's limitations have made it a lame-duck file system in the modern NT computing world. If you want to meet the security, performance, and capacity requirements of most organizations, NTFS is the only option.

NTFS is a robust, self-healing file system with several customizable features that affect how well it performs in a given environment. Some of these parameters are global; others are specific to individual NTFS volumes. You can control and tune several of them. By examining your specific storage needs and then tailoring your NTFS volumes accordingly, you can realize significant increases in your systems' disk performance. This article introduces methods you can employ to assess and augment the performance of your NTFS volumes.

In "Inside NTFS," January 1998, Mark Russinovich introduces the logical organization and internal structure of NTFS, including its core data structures. Reacquaint yourself with these concepts because they're essential to the substance of this article. To review these concepts, see NTFS Resources, page 80, and Table 1, page 80.

NTFS Performance Factors
You determine many of the factors that affect an NTFS volume's performance. You choose important elements such as the volume's disk type (e.g., SCSI or IDE), the disks' speed (e.g., their rotational speed in rpm), and the number of disks the volume contains. In addition to these important components, the following factors significantly influence an NTFS volume's performance:

The cluster (allocation unit) size

The location and fragmentation level of frequently accessed files, such as the Master File Table (MFT), directories, special files containing NTFS metadata, the paging file, and commonly used user data files

Whether you create the NTFS volume from scratch or convert it from an existing FAT volume

Whether the volume uses NTFS compression

Whether you disable unnecessary NTFS behaviors

Using faster disks and more drives in multidisk volumes is an obvious way to improve performance. The other performance improvement methods are more obscure and relate to the details of an NTFS volume's configuration.

Cluster Sizes: Waste Not, Want Not
All NT disk file systems, including NTFS, use the cluster as their basic unit of storage. Regardless of how small a file is, it must take up at least one cluster of disk space. Thus, files that are smaller than a cluster waste disk space. (Files that are less than 1KB are an exception. The system stores these files within the MFT File Record Segment (FRS) that refers to them, instead of storing them externally.) In addition, when a file doesn't end on a cluster boundary, the file's spillover takes up another full cluster, wasting space. The larger the cluster size, the more space the spillover wastes.

NTFS volumes typically waste much less disk space than FAT volumes do, because NTFS can use smaller clusters. FAT's default cluster sizes range from 512 bytes to a whopping 256KB, whereas NTFS's default cluster sizes range from 512 bytes to 4KB. FAT supports a maximum of 65,536 clusters per volume, which forces FAT to use larger cluster sizes to address large disk volumes. NTFS doesn't share this clusters-per-volume limitation; therefore, NTFS can use cluster sizes as small as 512 bytes or 1KB and still address large disk volumes. Table 2 shows the default cluster sizes for NTFS volumes.
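
The arithmetic behind FAT's large clusters is easy to sketch. The following Python fragment is only an illustration: the function name is hypothetical, it assumes power-of-two cluster sizes starting at 512 bytes, and it uses the 65,536-cluster ceiling quoted above rather than any exact on-disk limit.

```python
FAT_MAX_CLUSTERS = 65536  # per-volume cluster limit quoted in the text


def min_fat_cluster_size(volume_bytes, smallest=512):
    """Return the smallest power-of-two cluster size (in bytes) that lets
    FAT address volume_bytes with at most FAT_MAX_CLUSTERS clusters.

    Hypothetical helper for illustration; real FORMAT defaults may differ.
    """
    size = smallest
    # Double the cluster size until the whole volume fits within the limit.
    while size * FAT_MAX_CLUSTERS < volume_bytes:
        size *= 2
    return size


if __name__ == "__main__":
    for gb in (0.25, 1, 2):
        volume = int(gb * 1024 ** 3)
        print(gb, "GB ->", min_fat_cluster_size(volume) // 1024, "KB clusters")
```

Under these assumptions, the required cluster size doubles each time the volume size doubles, which is why large FAT volumes end up with clusters far bigger than most files.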

Choosing a cluster size. Choose a volume's cluster size based on the typical type and size of file that the volume will store. Ideally, the average file size (rounded to the nearest kilobyte) is evenly divisible by the volume's cluster size. Such a cluster size minimizes disk I/O transaction overhead and wasted disk space. For example, suppose you're creating a new NTFS volume that will store several files of about 6KB each. Format the volume with a 2KB cluster size, because the average file will fit evenly into three clusters. What if the average file size is about 16KB? In that case, a 4KB cluster size will provide the best performance, because it divides evenly into 16KB and requires only half the cluster allocations that the same file would require using 2KB clusters. Why not take this process one step further and use an 8KB or 16KB cluster size? These values are valid alternatives and might yield additional performance benefits; however, using cluster sizes greater than 4KB has several potentially negative side effects. For example, when you use cluster sizes larger than 4KB, disk-defragmentation utilities can't defragment the volume, you can't use NTFS file compression on the volume, and the amount of wasted disk space increases because more of the user data files stored on the volume don't end evenly on cluster boundaries.
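
The trade-off in the examples above can be worked out with a few lines of Python. This is a minimal sketch, not a sizing tool: the function name is hypothetical, and it assumes files of at least one byte that are stored outside the MFT.

```python
KB = 1024


def clusters_and_slack(file_bytes, cluster_bytes):
    """Return (clusters allocated, bytes wasted) for one file.

    Assumes file_bytes >= 1 and ignores very small files that NTFS
    keeps inside the MFT.
    """
    clusters = -(-file_bytes // cluster_bytes)  # ceiling division
    slack = clusters * cluster_bytes - file_bytes
    return clusters, slack


if __name__ == "__main__":
    for file_kb in (6, 16):
        for cluster_kb in (2, 4, 8, 16):
            count, slack = clusters_and_slack(file_kb * KB, cluster_kb * KB)
            print(f"{file_kb}KB file, {cluster_kb}KB clusters: "
                  f"{count} clusters, {slack} bytes wasted")
```

For a 6KB file, 2KB clusters give three allocations and no waste, whereas 4KB or 8KB clusters each leave 2KB unused. For a 16KB file, 4KB clusters halve the allocations that 2KB clusters need, with no waste in either case.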

Determining average file size. How can you ascertain the average file size on a volume? Several methods exist to determine this value, and the right method for you depends on the number and type of files involved and the level of accuracy you need.

A mathematical average is one option, but this calculation might not give you the best picture of a volume's composition or the optimal cluster size. For example, a Web server volume that holds equal numbers of 4KB Web pages and 60KB Microsoft Word documents yields a 32KB average file size, even though no file on the volume is close to 32KB. The cluster size you would derive from that 32KB figure isn't your best choice. Handle such volumes, which contain highly disparate files, by using a smaller, least-common-denominator cluster size. I find that 4KB is usually the best cluster size in such cases. For a specialized volume containing similar-sized larger files, a larger cluster size provides superior performance (but don't forget my warning about the problems that these cluster sizes can cause with disk-defragmentation utilities and NTFS compression).

Another option is to run CHKDSK on the volume, then divide the total kilobyte disk usage by the number of files on the volume. You can also use analysis tools, such as Executive Software's Diskeeper, to find the average file size on a volume (Screen 1 shows an example of the information an analysis tool returns). A third alternative is to use Performance Monitor to track the LogicalDisk object's Avg. Disk Bytes/Transfer counter for the disk in question. This counter reports the average size of the disk transfers rather than file sizes directly, but it gives you a good idea of the I/O sizes the volume actually services as well as the type of data stored on that disk.
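
If you prefer to measure the files yourself, a short script can walk a directory tree and report both the mean and the median file size. The sketch below is one possible approach, not a replacement for the tools named above; the function names are hypothetical. Comparing the two figures helps you spot the kind of mixed volume for which a plain average is misleading.

```python
import os
import statistics


def file_sizes(root):
    """Yield the size in bytes of every file found under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                yield os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # skip files that vanish or can't be read


def summarize(sizes):
    """Return the file count, mean size, and median size."""
    sizes = list(sizes)
    if not sizes:
        return {"files": 0, "mean": 0, "median": 0}
    return {
        "files": len(sizes),
        "mean": statistics.mean(sizes),
        "median": statistics.median(sizes),
    }
```

You might call `summarize(file_sizes(path))` on the root of the volume you want to examine. A median far below the mean suggests many small files plus a few large ones, which is the case in which a small cluster size is the safer choice.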

To use a cluster size larger than 4KB on an NTFS volume, you have to enable this option manually during the format process via the FORMAT command in an NT command prompt. To manually set the cluster size for a newly formatted NTFS volume, use the /A switch when formatting the drive:

FORMAT drive: /FS:NTFS /A:clustersize

In this command, drive is the drive you want to format, and clustersize is the cluster size you want to assign to the volume: 512, 1024, 2048, 4096, 8192, 16K, 32K, or 64K. However, before you override the default cluster size for a volume, be sure to test the proposed modification via a benchmarking utility on a nonproduction machine that closely simulates the intended target.

If your NTFS partition has no common file type or size, you're safe choosing a 4KB cluster size. This value provides good performance and keeps wasted disk space to a minimum. To further maximize an NTFS volume's performance, you can place files of similar size and similar use (e.g., read-only files) on the same volume.

MFT Breathing Room
As the core of NTFS, the MFT plays an important role in the composition of an NTFS volume and its performance. NTFS continually references the MFT as the system locates data, reads the data, and writes data to the disk. Thus, the performance of the MFT is essential to the performance of the entire volume.

NTFS's developers understood the importance of the MFT's performance and took steps to ensure that the MFT maintains high performance during its lifetime. First, NT automatically places the MFT at the beginning of the disk when you first format an NTFS volume. The outer tracks of a disk yield the highest rotational and data transfer rates, and most NTFS volumes correspond to an entire physical disk; thus, the MFT's placement at the beginning of the disk minimizes the time that MFT-related disk I/O operations require. Second, NTFS's developers addressed potential MFT fragmentation by creating a special buffer zone around the MFT, which the NTFS volume reserves for use by the MFT. By default, this buffer zone uses approximately 12.5 percent of the disk. Although this allocation usually minimizes MFT fragmentation, sometimes the buffer zone isn't adequate.

MFT fragmentation. Several situations can cause the MFT to fragment. For example, when the space allocated for user data fills up, NT begins to allocate the MFT zone space to provide additional disk space for user data file storage. As a result, the MFT can fragment because noncontiguous areas are the only space left into which the MFT can expand.

Another situation in which fragmentation can occur is when the MFT grows to a size greater than the default allocation of 12.5 percent of the disk. Although this MFT growth is rare, several NTFS characteristics can contribute to its occurrence. First, the MFT doesn't shrink even when you delete files and directories from the volume; instead, the MFT marks the FRSs to reflect the deletion. Second, NTFS stores very small files within the MFT FRSs that refer to the files. Although this setup provides a performance benefit for these files, it can cause the MFT to grow excessively when the volume contains many such files.

In the past, no easy remedy existed for MFT encroachment and the fragmentation it causes, because no tools were available to manipulate or manage the MFT. However, Microsoft provides a new Registry value with NT 4.0 Service Pack 4 (SP4). You can use this value to manipulate the MFT's buffer zone size on newly created NTFS volumes. This value doesn't exist by default, so you must create it after you apply SP4. To manipulate the MFT's zone reservation, go to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem Registry key, which Screen 2, page 81, shows. Add a REG_DWORD value:

NtfsMftZoneReservation

This value's valid range is from 1 to 4 (i.e., 1 reserves 12.5 percent, 2 reserves 25 percent, 3 reserves 37.5 percent, and 4 reserves 50 percent of the NTFS volume for the MFT's buffer zone). The default for this Registry value is 1, but you can allocate as much as 50 percent of the volume's space to the MFT zone. Screen 3 shows an analysis tool's view of an NTFS volume with a 50 percent MFT zone reservation. Although 50 percent is unnecessary in most situations, increasing this value will reduce long-term MFT fragmentation. This Registry modification is worth investigating (for more information about the MFT, see "NTFS Resources," page 80).

You must change this Registry setting prior to the creation of an NTFS volume. The modification affects only those volumes created after you create this Registry entry—the modification doesn't change existing NTFS volumes, which retain their original MFT zone reservations. Also, allocating more space for the MFT won't limit the amount of free disk space available for regular file storage, because NTFS will use the MFT zone if the normal user file area becomes full.
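
To see what each setting means for a particular disk, you can turn the percentages above into sizes. The following Python sketch is an illustration only; the function names are hypothetical, and the result approximates the reserved zone rather than reporting what NTFS actually lays out on disk.

```python
ZONE_STEP_PERCENT = 12.5  # each step of the Registry value adds 12.5 percent


def mft_zone_percent(reservation):
    """Map an NtfsMftZoneReservation value (1 to 4) to a percentage."""
    if reservation not in (1, 2, 3, 4):
        raise ValueError("NtfsMftZoneReservation must be 1, 2, 3, or 4")
    return reservation * ZONE_STEP_PERCENT


def mft_zone_bytes(volume_bytes, reservation):
    """Approximate size in bytes of the reserved MFT zone on a volume."""
    return int(volume_bytes * mft_zone_percent(reservation) / 100)


if __name__ == "__main__":
    volume = 4 * 1024 ** 3  # a hypothetical 4GB volume
    for value in (1, 2, 3, 4):
        reserved_mb = mft_zone_bytes(volume, value) // 1024 ** 2
        print(f"value {value}: {mft_zone_percent(value)}% = {reserved_mb}MB")
```

Because NTFS hands the zone back to user files when the rest of the volume fills up, these figures describe a preference for where the MFT can grow, not space that is lost.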

Don't Forget Fragmentation
The MFT is not the only component of an NTFS volume subject to fragmentation. The system requires additional head and platter movements to access a file stored in multiple noncontiguous locations on a disk, so file fragmentation adversely affects performance. In contrast, when a file is contiguous, the system can read it sequentially without additional drive repositioning. Diligently maintaining a low level of file fragmentation on an NTFS volume is the most important way to improve volume performance. You can accomplish this maintenance by regularly running a disk-defragmentation utility, which makes the files on the volume contiguous. In addition, these utilities can defragment the free disk space on a volume, which is also beneficial to the volume's performance. (For a list of defragmentation tools, see "Defragmentation Resources," page 82.)
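
To make the idea of a fragment concrete, the sketch below counts the contiguous runs in a file's list of extents. It is a simplified model with a hypothetical function name: it assumes you already have the file's extents as (starting cluster, cluster count) pairs in file order, which is the kind of information a defragmentation or analysis utility reports.

```python
def count_fragments(extents):
    """Count the contiguous runs in a file's extent list.

    extents is a list of (start_cluster, cluster_count) pairs in file
    order. Two neighboring extents belong to the same fragment when the
    second one starts exactly where the first one ends.
    """
    fragments = 0
    expected_next = None
    for start, length in extents:
        if start != expected_next:
            fragments += 1  # the drive must reposition to reach this run
        expected_next = start + length
    return fragments
```

A contiguous file yields a count of 1. Each additional fragment represents at least one extra repositioning of the drive during a sequential read.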

NTFS Directory Consolidation
Directories are another disk-related element that you can optimize on a regular basis to improve NTFS's performance. Similar to regular files, directory files become scattered around the disk as you create, modify, and delete directories from a volume. Because NT is constantly using directory files, system performance benefits from directory files that are as contiguous as possible; for the same reason, directory files aren't as easy to optimize as regular files are. Recently, Executive Software released Diskeeper 4.0 for Windows NT, a defragmentation utility that enables a boot-time defragmentation and consolidation of NTFS directory files. However, you're not out of luck if you don't own this utility.

You can use the following procedure to defragment your NTFS directories. Because this method is less manageable than running a disk-defragmentation utility, I recommend investing in a disk utility as a long-term solution. Be sure to back up your system and Registry before using the following procedure. Because this method involves file deletion, it can, if done improperly, result in data loss. To manually defragment directories on an NTFS volume:

Copy all your files (only the files) to another partition or a tape drive.

Delete the files from the original partition.

Copy all your directories (only the directories) to another partition or a tape drive.

Delete the directories from the original partition.

If you intend to create a paging file on this partition, do so at this step so that the paging file is unfragmented. Set the paging file to the identical minimum and maximum size to prevent future fragmentation of the file.

Copy all the directories back to the original partition.

Copy all the files back to the original partition.

After you follow this procedure, all the volume's directories will be contiguous and consolidated near the beginning of the volume (usually following the MFT reserved zone).

When you use this procedure to optimize your directories, use two NTFS volumes and the Microsoft Windows NT Server 4.0 Resource Kit SCOPY utility to preserve NTFS security. SCOPY preserves security information during a copy operation, so you won't lose the access control lists (ACLs) assigned to the files and directories you're copying.
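
The "only the directories" steps are the least obvious part of this procedure. As a rough illustration, the Python sketch below recreates a directory tree in another location without copying any files. The function name and paths are hypothetical, and the sketch makes no attempt to carry over NTFS security information, so treat it as a way to understand the step rather than as a substitute for SCOPY.

```python
import os


def copy_directory_skeleton(source_root, target_root):
    """Recreate the directory tree of source_root under target_root
    without copying any files. Returns the number of directories created.

    Assumption: ACLs and other NTFS security data are NOT preserved.
    """
    created = 0
    for dirpath, _dirnames, _filenames in os.walk(source_root):
        relative = os.path.relpath(dirpath, source_root)
        target = os.path.normpath(os.path.join(target_root, relative))
        if not os.path.isdir(target):
            os.makedirs(target)
            created += 1
    return created
```

Because os.walk visits parent directories before their children, the directories are created from the top of the tree down.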

Preventing directory fragmentation. Whenever you install a major application on an NTFS volume, you can take steps to prevent directory fragmentation. Follow these three steps:

Completely defragment the volume, including free space.

Reboot the system.

Install the application.

This method minimizes the amount of directory fragmentation on the volume, because NT tends to place new directories near the beginning of the disk after you reboot the system. My unconfirmed suspicion is that this placement is the work of the directory allocation pointer, which tells NT where to write the next directory on disk. I believe the system resets this allocation pointer to the beginning of the volume after you reboot the system.

The Convolution of NTFS Conversions
Whether an NTFS volume was a fresh creation or converted from a FAT volume is an important factor that affects NTFS volumes' performance. As I stated previously, the position and relative fragmentation of the MFT on an NTFS volume have a strong impact on the volume's overall performance. When you create a fresh NTFS volume, the system places the MFT at the beginning of the volume. However, the system places the MFT on an NTFS volume that was converted from a FAT volume wherever free space exists on the volume. This space isn't usually at the beginning of the disk, and might not be a contiguous area. Aside from performing a backup, reformat, and restore, you can't defragment or relocate the MFT. Thus, converted NTFS volumes are much slower than NTFS volumes you create from scratch.

Be aware that the NTFS system partitions you create during the NT setup process start out as FAT volumes: When you choose to format your boot volume as NTFS during Setup, NT initially creates the volume as FAT and converts it to NTFS only later in the Setup process. As a result, your boot volume is subject to the problems of a converted NTFS volume. Unfortunately, because you can't relocate or defragment the MFT on these partitions, the best option you have is to religiously run disk-defragmentation software to keep the rest of the volume in good order. To determine the location and construction of an NTFS volume's MFT, you can use a commercial disk-defragmentation or analysis utility.

Directories and Filenames: Think Short
Creating directory trees with dozens of levels and using enormous filenames detracts from the overall performance of your NTFS volumes. Although NTFS's performance is more tolerant of directory length and filename excesses than are other file systems, such as FAT, keep directories shallow and filenames short to maintain snappy performance. The system can navigate shallow directory structures more quickly and easily, and long filenames require additional storage space and processing overhead. I recommend that you keep NTFS directory trees to fewer than 10 levels deep, and filenames and directory names to fewer than 30 characters. These limits create an efficient directory structure that lets the volume maintain a higher level of performance.
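
If you want to check an existing volume against these guidelines, a small script can flag the offenders. The sketch below is one possible audit, with a hypothetical function name; it assumes that the starting directory counts as level 0 and treats the limits of 10 levels and 30 characters as adjustable defaults.

```python
import os

MAX_DEPTH = 10        # the text recommends fewer than 10 levels
MAX_NAME_LENGTH = 30  # and names of fewer than 30 characters


def audit_tree(root, max_depth=MAX_DEPTH, max_name=MAX_NAME_LENGTH):
    """Return (too_deep, too_long) lists of paths under root.

    too_deep holds directories nested max_depth or more levels below
    root; too_long holds files and directories whose names contain
    max_name or more characters.
    """
    too_deep, too_long = [], []
    root = os.path.abspath(root)
    for dirpath, dirnames, filenames in os.walk(root):
        relative = os.path.relpath(dirpath, root)
        depth = 0 if relative == os.curdir else relative.count(os.sep) + 1
        if depth >= max_depth:
            too_deep.append(dirpath)
        for name in dirnames + filenames:
            if len(name) >= max_name:
                too_long.append(os.path.join(dirpath, name))
    return too_deep, too_long
```

Running the audit before you migrate data to a new volume gives you a chance to flatten or rename the worst cases while you're already moving files.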

NTFS Compression
NTFS compression, which Microsoft introduced with NT 3.51, is the ability to selectively compress the contents of individual files, entire directories, or entire directory trees on an NTFS volume. In my experience, volumes that use NTFS compression deliver performance increases as high as 50 percent over their uncompressed counterparts, depending on the type of data stored on the volumes. This performance seemed too good to be true until I monitored CPU utilization during a subsequent run of the same benchmarks on a compressed NTFS volume. The CPU utilization on the test jumped from an average of 10 to 18 percent on the uncompressed NTFS volume to a whopping 30 to 80 percent on the compressed NTFS volume. In addition, performance significantly decreased when I used NTFS compression on larger volume sizes (4GB or greater) and software-based, fault-tolerant RAID volumes.

You can use NTFS compression to significantly increase disk performance on smaller volumes with files containing highly compressible data. However, doing so will cause a significant increase in CPU utilization. This effect might be tolerable on systems with extremely fast processors or multiple installed processors. You can compress NTFS volumes via the Properties dialog box of a drive's Explorer window or by using the command-line COMPACT utility. Be sure to test this feature on a nonproduction machine prior to deploying it in your environment.
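
Before you test compression on a volume, you might want a rough sense of how compressible your data is. The Python sketch below uses the standard zlib module as a stand-in. That is an assumption made for illustration only: zlib is not the algorithm NTFS uses, so the ratios indicate only whether data is broadly compressible, not the savings or speed you would see with NTFS compression. The function names are hypothetical.

```python
import zlib


def compression_ratio(data):
    """Return compressed size divided by original size for a bytes
    object (1.0 for empty input). Uses zlib as a rough proxy only."""
    if not data:
        return 1.0
    return len(zlib.compress(data)) / len(data)


def sample_ratio(path, sample_bytes=64 * 1024):
    """Estimate a file's compressibility from its first sample_bytes."""
    with open(path, "rb") as handle:
        return compression_ratio(handle.read(sample_bytes))
```

Ratios well below 1.0 (text, databases, uncompressed bitmaps) suggest data worth testing with NTFS compression; ratios near 1.0 (archives, already-compressed images) suggest that compression would cost CPU time for little return.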

Disabling Unnecessary Access Updates
Another method to enhance NTFS performance is to disable unnecessary default NTFS behaviors. By modifying the Registry, you can stop NTFS from automatically updating the last access time and date stamp on directories as NTFS traverses its B-tree directory structure. Disabling this behavior reduces NTFS's operational overhead without significantly impairing functionality. In the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem Registry key, set the NtfsDisableLastAccessUpdate value of type REG_DWORD to 1 (updates disabled) instead of the default 0 (updates enabled). This Registry value doesn't exist by default, so you need to add it manually.
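
If you need to apply this change to several machines, you might prefer a .reg file to manual editing. The Python sketch below only builds the text of such a file in the REGEDIT4 format; it doesn't touch the Registry itself. The function name is hypothetical, and you should review and test the generated file on a nonproduction machine before importing it.

```python
FILESYSTEM_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem"
)


def reg_file_text(value_name, dword, key=FILESYSTEM_KEY):
    """Build the text of a REGEDIT4-format .reg file that sets one
    REG_DWORD value under the given key."""
    if not 0 <= dword <= 0xFFFFFFFF:
        raise ValueError("a REG_DWORD must fit in 32 bits")
    return (
        "REGEDIT4\n\n"
        "[{key}]\n"
        "\"{name}\"=dword:{value:08x}\n"
    ).format(key=key, name=value_name, value=dword)


if __name__ == "__main__":
    print(reg_file_text("NtfsDisableLastAccessUpdate", 1))
```

The same helper works for any other REG_DWORD value under this key, because only the value name and the number change.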

Plan Ahead
In this article, I've examined controllable aspects of NTFS and individual NTFS volumes. Considering the intended use and average file size of your NTFS volumes will put you in a better position to optimize them. Think about these factors before creating a volume, because you need to make many of these changes at inception, before NT stores any data on the volume. Changing a format parameter or modifying a Registry entry is much easier than backing up and restoring an entire volume.