Inside the Windows 2000 Kernel

Microsoft has declared that Windows 2000 (Win2K) is the most important upgrade in the company's history. Microsoft released Windows NT 4.0 in mid-1996, so the company has 3 years of user experience and its own study of the OS to build the Win2K update on. I want to dispel the myth that the Win2K kernel is a rewrite of the NT 4.0 kernel. The Win2K kernel is a tuned and tweaked version of the NT 4.0 kernel, with some significant enhancements in particular areas and a couple of new subsystems. Microsoft intends the Win2K kernel changes to improve the scalability, reliability, and security of the OS over NT 4.0, and to support new features such as Plug and Play (PnP) and power management. In this article, I take you on a quick tour of the kernel changes Microsoft has introduced in Win2K. I won't cover user-mode Win2K components such as Active Directory (AD) or administrative interfaces, which don't rely on kernel changes for their implementation. Space limitations prevent me from describing any particular feature in detail—look for future NT Internals columns to do that.

Scalability Enhancements
NT 4.0 has a reputation for not scaling well on SMP machines, particularly on machines with more than four CPUs. This limitation means that, for enterprise server applications such as database, Web, or email servers, you see diminishing returns on performance as you add CPUs to a system. Benchmarks that this magazine and other industry groups have performed show that the performance drop-off is dramatic when CPUs increase from four to eight. Another aspect of scalability is that applications must perform well with large data sets—large databases, for example. Before the middle and high-end range of enterprise installations will accept any OS, the OS must effectively take advantage of multiprocessors and memory. For this reason, better NT scalability has been a top Microsoft concern for some time. Scalability is a complex equation because it depends not only on an OS's scalability but also on that of the OS's applications. An OS might scale perfectly, but if it doesn't provide appropriate interfaces to its applications, the applications might not be able to scale.

Win2K addresses scalability in several ways. First, Win2K gives memory-intensive applications the means to use larger amounts of virtual and physical memory than was possible in NT 4.0. How does more memory help an application's scalability? Most server applications must quickly process large amounts of data to perform well. For example, for a database server to scale, the server must handle large databases. Because accessing disks is slow compared with accessing main memory, a server performs best when the data it must access for a database query is located, or cached, in physical memory. NT (and Win2K) is a 32-bit OS that divides the 4GB of virtual memory that 32 bits can address into a lower half, which NT assigns to applications, and an upper half, where OS and device-driver code and data reside. Thus, NT 4.0 effectively limits an application to managing at most 2GB of data (3GB with the /3GB boot.ini switch on NT Server, Enterprise Edition, or NTS/E). A Win2K enhancement, Address Windowing Extensions (AWE; some Microsoft marketing literature refers to AWE as Advanced Windowing Extensions), lets an application manage much more data.

AWE consists of four APIs that applications use to allocate and deallocate physical memory and to obtain references, or windows, in their address space to portions of physical memory. For example, on a system with 4GB of physical memory, a database application might allocate the majority of the memory for its cache. The application then creates windows to the portions of the cache that it must access as it processes database queries. When a query completes, the application closes the windows it created. Figure 1, page 46, shows an example of an application that has allocated physical memory and defined a window to a portion of the physical memory.
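The allocate-then-window pattern can be sketched with a toy model. The Python below is not the AWE API; the PhysicalPool and Window classes, the 4KB page size, and the page counts are all assumptions made for illustration. It shows only the idea that a small, fixed range of addresses can be re-pointed at different parts of a much larger physical allocation.

```python
PAGE_SIZE = 4096  # bytes per page in this toy model (an assumption)


class PhysicalPool:
    """Physical pages the application has allocated (hypothetical class)."""

    def __init__(self, page_count):
        self.pages = [bytearray(PAGE_SIZE) for _ in range(page_count)]


class Window:
    """A fixed run of address-space slots that can be re-pointed at pages."""

    def __init__(self, pool, slot_count):
        self.pool = pool
        self.slots = [None] * slot_count  # page index per slot, None if unmapped

    def map(self, first_page):
        last = first_page + len(self.slots)
        if first_page < 0 or last > len(self.pool.pages):
            raise ValueError("window would extend past the allocated pages")
        self.slots = list(range(first_page, last))

    def unmap(self):
        self.slots = [None] * len(self.slots)

    def _page(self, offset, length):
        slot, within = divmod(offset, PAGE_SIZE)
        if within + length > PAGE_SIZE:
            raise ValueError("this sketch keeps each access inside one page")
        if self.slots[slot] is None:
            raise RuntimeError("window is not mapped")
        return self.pool.pages[self.slots[slot]], within

    def write(self, offset, data):
        page, within = self._page(offset, len(data))
        page[within:within + len(data)] = data

    def read(self, offset, length):
        page, within = self._page(offset, length)
        return bytes(page[within:within + length])


pool = PhysicalPool(page_count=64)    # the large cache
window = Window(pool, slot_count=4)   # a small view into it
window.map(8)
window.write(0, b"row-1")
window.map(32)                        # same addresses, different pages
window.write(0, b"row-2")
window.map(8)                         # go back to the first region
first = window.read(0, 5)
```

The window never grows; only its target changes, which is why the cache can be far larger than the address range used to reach it.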

Intel recently introduced x86 processors and motherboards that support Physical Address Extension (PAE), a mechanism that lets OSs and applications access more than 4GB of physical memory, even though the processors still use 32-bit virtual addressing. PAE uses 36-bit physical addressing to support up to 64GB of physical memory, so an application using the AWE APIs can create data caches that are close to 16 times as large as the cache sizes possible in NT 4.0. At press time, Microsoft plans to make the AWE APIs available for all versions of Win2K: Win2K Professional (Win2K Pro), Win2K Server, Win2K Advanced Server (Win2K AS), and Win2K Datacenter Server (Datacenter). However, Win2K Pro and Win2K Server will support only as much as 4GB of physical memory. Win2K AS will support as much as 8GB of physical memory, and Datacenter will support as much as 64GB of physical memory.
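The figures in this paragraph follow directly from the address widths. A short Python check of the arithmetic (the variable names are only for illustration):

```python
GB = 2 ** 30

virtual_space = 2 ** 32   # reachable with 32-bit virtual addresses
pae_physical = 2 ** 36    # reachable with 36-bit physical addresses

virtual_gb = virtual_space // GB
physical_gb = pae_physical // GB
ratio = pae_physical // virtual_space

print(virtual_gb, physical_gb, ratio)  # 4 64 16
```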

Other Win2K scalability enhancements address multiprocessor performance. The Job object, a new kernel object, comprises one or more processes that an application or administrator specifies. A Job is a process container with characteristics that Job-object APIs can manipulate. An administrative program can use the APIs to limit the amount of CPU time that the Job can consume before termination, to assign the Job's processes to particular CPUs in an SMP machine, or to control the Job's processes' scheduling priority. Microsoft developed the Job object with batch processing in mind. In batch processing, relatively long-running processes perform certain calculations or data processing. Data mining is one example of a computation that might be well suited to Job objects. A Job object doesn't necessarily enhance the scalability of the computation it encompasses, but the object can enhance the performance of the rest of the system. Because a Job object can assign CPU time, scheduling, working-set sizes, commit limits, and other limitations to the processes the object contains, Job objects can minimize the effect that the computations have on more important or time-critical applications running on a system.
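To make the container idea concrete, here is a toy Python model of a Job that enforces a shared CPU-time limit. It does not call the real Job-object APIs; the Process and Job classes, their method names, and the limit values are hypothetical.

```python
class Process:
    def __init__(self, name):
        self.name = name
        self.cpu_seconds = 0.0
        self.alive = True


class Job:
    """Container that applies shared limits to the processes assigned to it."""

    def __init__(self, cpu_limit_seconds, allowed_cpus):
        self.cpu_limit_seconds = cpu_limit_seconds
        self.allowed_cpus = frozenset(allowed_cpus)  # CPUs the Job may run on
        self.processes = []

    def assign(self, process):
        self.processes.append(process)

    def total_cpu_seconds(self):
        return sum(p.cpu_seconds for p in self.processes)

    def charge(self, process, seconds):
        """Record CPU time; terminate the whole Job once it exceeds its limit."""
        if process not in self.processes:
            raise ValueError("process does not belong to this Job")
        process.cpu_seconds += seconds
        if self.total_cpu_seconds() > self.cpu_limit_seconds:
            for p in self.processes:
                p.alive = False


job = Job(cpu_limit_seconds=10.0, allowed_cpus={2, 3})
miner, indexer = Process("miner"), Process("indexer")
job.assign(miner)
job.assign(indexer)
job.charge(miner, 6.0)     # total 6s, still under the limit
alive_before = miner.alive
job.charge(indexer, 5.0)   # total 11s, the limit is exceeded
```

The point of the model is that the limit belongs to the container, so the batch work as a whole is bounded no matter how it is split across processes.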

Another scalability enhancement in Win2K changes the length of time that the OS lets threads execute on a CPU before the scheduler might schedule a different thread. Microsoft calls these time lengths quantums, and in NT 4.0 the lengths are shorter on NT Workstation than on NT Server. Shorter quantums are appropriate for systems running multiple interactive applications; longer quantums are best for systems that want to promote the performance of one or two noninteractive applications. On Win2K, systems administrators can configure short or long quantums regardless of whether they run Win2K Pro, Win2K AS, or Datacenter. This flexibility lets administrators decide which lengths are best for the application workloads they run.
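The trade-off between short and long quantums can be seen in a small simulation. The Python below is a plain round-robin model, not the priority-driven Win2K scheduler; the function name, the workloads, and the millisecond values are assumptions chosen for illustration.

```python
from collections import deque


def round_robin(work_ms, quantum_ms):
    """Run threads in turn; return (dispatch count, longest wait for a first turn)."""
    ready = deque(enumerate(work_ms))
    clock = 0
    dispatches = 0
    first_run = {}
    while ready:
        tid, remaining = ready.popleft()
        first_run.setdefault(tid, clock)   # when this thread first got the CPU
        ran = min(quantum_ms, remaining)
        clock += ran
        dispatches += 1
        if remaining > ran:
            ready.append((tid, remaining - ran))
    return dispatches, max(first_run.values())


short = round_robin([30, 30, 30], quantum_ms=10)
long_ = round_robin([30, 30, 30], quantum_ms=30)
print(short)  # (9, 20)
print(long_)  # (3, 60)
```

With the short quantum every thread gets the CPU within 20ms, at the cost of three times as many dispatches. With the long quantum the scheduler intervenes less often, but the last thread waits 60ms for its first turn.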

Microsoft has also heavily tuned the Win2K kernel for SMP performance. OSs must use spinlocks to ensure that only one CPU at a time accesses key data structures. For example, Win2K uses the scheduler database to keep track of which threads are eligible to execute; if two CPUs modify the database simultaneously, the database could become corrupt. The Win2K kernel uses about 10 locks to protect global data structures such as the scheduler database. The locks that Win2K uses are advanced locks called queued spinlocks. Queued spinlocks perform better on SMP machines than the standard spinlocks that NT 4.0 uses, particularly as you add CPUs to a system.
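This article doesn't go into how queued spinlocks work. Queued locks in general hand the lock to waiters in arrival order, with each waiter watching its own flag instead of all CPUs hammering one shared location. The single-threaded Python model below sketches only that hand-off order; the QueuedLock class is hypothetical and performs no real spinning or synchronization.

```python
from collections import deque


class QueuedLock:
    """Toy model: the lock passes to waiting CPUs in first-come order."""

    def __init__(self):
        self.owner = None
        self.waiters = deque()   # CPUs waiting, each on its own flag

    def acquire(self, cpu):
        """Return True if the CPU got the lock at once, False if it queued."""
        if self.owner is None:
            self.owner = cpu
            return True
        self.waiters.append(cpu)
        return False

    def release(self):
        """Hand the lock directly to the longest-waiting CPU, if any."""
        self.owner = self.waiters.popleft() if self.waiters else None
        return self.owner


lock = QueuedLock()
got_it = lock.acquire(0)          # CPU 0 takes the lock
lock.acquire(3)                   # CPU 3 queues
lock.acquire(1)                   # CPU 1 queues behind CPU 3
order = [lock.release(), lock.release(), lock.release()]
print(order)  # [3, 1, None]
```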

To increase Win2K's scalability, Microsoft has significantly raised several system components' limits. For example, nonpaged pool (i.e., the maximum amount of available locked kernel memory) doubles in size, from 128MB to 256MB. Paged pool (i.e., pageable kernel memory) also increases, from a maximum of 192MB to almost 470MB. The maximum amount of physical memory that the kernel can map on behalf of device drivers increases from 192MB to approximately 640MB. In addition, the maximum file system cache's virtual size increases from 512MB to almost 1GB. Unfortunately, the Intel chip's virtual address space limit won't let Microsoft apply all those maximum component sizes on one system. However, Win2K's larger kernel memory pools let the OS run larger workloads and data sets than NT 4.0 can run, and the Win2K Cache Manager's larger virtual cache improves the Cache Manager's performance in managing cached files.
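A quick sum shows why the maximums can't all apply at once. The Python below adds the four Win2K maximums quoted above and compares them with the 2GB upper half of the address space that the OS and drivers share. The figures are the article's rounded values, so the result is approximate, and it ignores the space that kernel and driver code also need.

```python
KERNEL_SPACE_MB = 2048   # upper half of the 4GB virtual address space

# Win2K maximums quoted in the text (rounded, in MB)
maximums_mb = {
    "nonpaged pool": 256,
    "paged pool": 470,
    "memory mapped for drivers": 640,
    "file system cache": 1024,
}

total_mb = sum(maximums_mb.values())
shortfall_mb = total_mb - KERNEL_SPACE_MB
print(total_mb, shortfall_mb)  # 2390 342
```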

Finally, Microsoft has tweaked specific kernel subsystems in Win2K. For example, the Memory Manager provides better application performance on SMP machines than it did in NT 4.0.

Security Enhancements
Microsoft has enhanced the NT 4.0 security subsystem for Win2K. The basic security model is unchanged from NT, but some new features make managing security easier for administrators and application programmers. The first new feature is inheritable security. In NT 4.0, an object, such as a file or Registry key, inherits security settings from its container (e.g., a directory or parent key) at the time the system creates the object. Subsequent modifications to the container's security settings don't affect the object's settings. This restriction means that mass updates require a manual operation, performed either by an administrator or by a program. Inheritable security lets an administrator or programmer designate specific security settings as inheritable settings; that is, when you apply security settings to a container, all objects within the container adopt the settings.

Another enhancement to Win2K's security subsystem is the addition of object-specific security settings. You apply these settings to AD objects, and a developer can use the settings to precisely control security for property sets and property sheets, which are subsets of AD objects. Globally Unique IDs (GUIDs) identify the subsets, and the object-specific settings specify the GUIDs that apply to the subsets.

The Win2K and NT 4.0 security subsystems use Access Tokens, objects that the subsystems call on to identify users that are logged on to computers. With the Job object comes a new type of Access Token: the Restricted Token. Most tasks that run as Jobs are noninteractive, so it is desirable to run them in a restricted security environment in which the applications can't perform operations that adversely affect the rest of the system. For example, a Job shouldn't be able to reboot a computer or to access certain files or Registry keys. Because a Job must run in the context of a particular user who might be able to reboot a computer or access files or Registry keys, Win2K lets applications designate a Restricted Token, which is a copy of the user's token minus certain privileges.
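The "copy minus certain privileges" idea is simple to model. The Python below is a sketch, not the Win32 token API: the AccessToken class and its restricted method are hypothetical, and the privilege strings are used only as labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessToken:
    user: str
    privileges: frozenset

    def restricted(self, removed):
        """Return a copy of this token without the named privileges."""
        return AccessToken(self.user, self.privileges - frozenset(removed))


admin = AccessToken(
    "operator",
    frozenset({"SeShutdownPrivilege", "SeBackupPrivilege", "SeChangeNotifyPrivilege"}),
)
job_token = admin.restricted({"SeShutdownPrivilege"})

can_reboot = "SeShutdownPrivilege" in job_token.privileges
```

The original token is untouched, so the user keeps full rights interactively while the Job runs under the narrower copy.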

Microsoft has focused the final Win2K security enhancement on Win2K interoperability with other OSs, such as UNIX or Novell NetWare. The Win2K security model differs substantially from the security models of these other OSs, especially in the way Win2K encodes possible object access types. For example, Win2K has both general access types (e.g., Read, Write) and access types that are specific to particular objects (e.g., List Directory, Create Directory Entry). These differences between OSs prevent Win2K applications from directly manipulating the security of objects that other OSs define. To solve this problem, Win2K introduces provider-independent access rights. A security provider that Microsoft includes in Win2K for interoperability with a particular OS translates provider-independent access rights to the access rights that the other OS understands, letting Win2K applications control security on objects that the other OS creates. So that applications can use provider-independent access rights universally, Win2K supplies a security provider that translates provider-independent access rights to Win2K access rights.

Power Management
Anyone who has used Windows 9x on a laptop knows that power management confers advantages. Power management helps an OS extend a laptop's battery life by reducing the power consumption of devices you don't actively use. Power management also lets you put the entire system into a standby mode; you can later resume working exactly where you left off. Power management requires OS and device-driver support, and with the exception of some drivers that certain laptop vendors specially code, NT 4.0 doesn't have this OS or device-driver support. Win2K introduces power management to NT as part of Microsoft's OnNow initiative.

Win2K implements power management with the Power Manager, a new kernel-mode subsystem. The Power Manager requires the system to have a motherboard and BIOS that implement the Advanced Configuration and Power Interface (ACPI) standard. ACPI defines four device-related power states and six system-related power states, which Figure 2 illustrates, that range from fully on to fully off. The four device-related power states are D0, D1, D2, and D3. D0 always means on and D3 always means off. Individual devices need to decide what states D1 (almost on) and D2 (almost off) mean for them (e.g., if a device doesn't have power modes other than on or off, D1 would mean on, and D2 would mean off).

The Power Manager moves the system power state through various levels according to the power-management setting that an administrator specifies. For example, if you specify that you want your laptop to shut down by saving the contents of memory to disk so that you can restart later where you left off—and you don't want the laptop to consume any battery power in the interim—the Power Manager moves all the laptop's devices to the D3 state, then moves the system power state to the Hibernate level.

Changing devices' power state requires device-driver support. In Win2K, device drivers handle Power Manager requests that query the device's ability to change power level, as well as requests that instruct the device to change state. One power-management requirement in Win2K is that all of a system's device drivers must be responsive to the Power Manager's requests. If just one legacy NT 4.0 device driver is installed, the Power Manager won't change the system power state from the Working mode.
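The rule that one unresponsive driver pins the whole system in the Working state can be sketched in a few lines. The Python below is a toy model: the Device and PowerManager classes and the driver_handles_power flag are hypothetical names, and the real query and set-power exchange between the Power Manager and drivers is reduced to a single check.

```python
from enum import Enum


class DeviceState(Enum):
    D0 = 0   # fully on
    D1 = 1
    D2 = 2
    D3 = 3   # off


class Device:
    def __init__(self, name, driver_handles_power=True):
        self.name = name
        self.driver_handles_power = driver_handles_power
        self.state = DeviceState.D0


class PowerManager:
    def __init__(self, devices):
        self.devices = list(devices)
        self.system_state = "Working"

    def hibernate(self):
        """Move every device to D3, then the system to Hibernate, if all drivers cooperate."""
        if not all(d.driver_handles_power for d in self.devices):
            return False   # a legacy driver keeps the system in Working
        for d in self.devices:
            d.state = DeviceState.D3
        self.system_state = "Hibernate"
        return True


modern = PowerManager([Device("disk"), Device("display")])
mixed = PowerManager([Device("disk"), Device("old-nic", driver_handles_power=False)])
modern_ok = modern.hibernate()
mixed_ok = mixed.hibernate()
```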

Plug and Play
Another desirable feature in Win9x is the OS's ability to automatically detect a new hardware device and install the appropriate device driver. Microsoft has brought this functionality to Win2K. The Win2K PnP Manager can identify hardware conforming to the ACPI standard that is located on a system's I/O buses; if a device driver isn't installed for the detected device, the PnP Manager initiates the driver's installation procedure. To make this capability possible in Win2K, Microsoft made significant changes to the way NT 4.0 implements device drivers.

In NT 4.0, a device driver must enumerate buses to search for hardware that the driver works for. In Win2K, the PnP Manager enumerates the buses to locate devices and inform drivers of the devices' presence. Two numbers identify a device on a bus: a vendor ID (VID) and a device ID (DID). The combination of these ID numbers uniquely identifies a device. Upon locating a combined VID and DID, the PnP Manager checks the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum subkey that corresponds to the bus on which the device is located (e.g., the PCI bus). The PnP Manager searches the subkey for a key that corresponds to the detected VID and DID. If a driver is already installed for the device, a Registry value for the device's key will reference another Registry key in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class that contains information about the device driver file. If a driver isn't yet installed for the device, the PnP Manager notifies the user-mode PnP subsystem that the subsystem needs to locate the appropriate driver's installation script (.inf file) and initiate the driver's installation.
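The lookup described here amounts to a keyed search followed by one of two outcomes. The Python below models it with a dictionary standing in for the Registry. The VID, DID, and driver name are made-up values, and on_device_found is a hypothetical function, not a PnP Manager interface.

```python
# Hypothetical stand-in for the Enum portion of the Registry:
# (bus, vendor ID, device ID) -> name of the installed driver.
enum_key = {("PCI", 0x1234, 0x5678): "exampledrv"}


def on_device_found(bus, vid, did, install_queue):
    """Return the installed driver's name, or queue the device for installation."""
    driver = enum_key.get((bus, vid, did))
    if driver is not None:
        return driver                       # driver known: it can be loaded
    install_queue.append((bus, vid, did))   # hand off to the user-mode installer
    return None


pending = []
known = on_device_found("PCI", 0x1234, 0x5678, pending)
unknown = on_device_found("PCI", 0xABCD, 0x0001, pending)
```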

Another advantage of PnP is that it lets Win2K tell drivers to reconfigure their hardware. Some buses, such as Personal Computer Memory Card International Association (PCMCIA), generate OS notifications when you add or remove devices. For example, suppose a user inserts a new PC card that requires the use of hardware resources, such as interrupts, that another device is using. The Arbiter, a Win2K PnP component, reorganizes device-resource assignments to accommodate the new device. The PnP Manager informs the other drivers of the changes, and the drivers reset their devices accordingly.

The Windows Driver Model
The final significant change that Microsoft has made to the NT 4.0 kernel in Win2K is adding support for the Windows Driver Model. WDM is a convention that Microsoft has adopted for separating a device class' general functionality from a particular device's specific functionality. For example, Human Interface Devices (HIDs) such as keyboards and mouse devices share common characteristics but differ from one another in control and configuration details. Thus, one WDM class is the HID class, for which Microsoft provides a HID-class driver that serves as the high-level interface to all types of HID devices. Hardware vendors implement proprietary HID minidrivers that interface the HID-class driver to their devices and support the device's particular functionality. Table 1 lists the device classes that WDM supports.

WDM makes life easier on hardware vendors in another way, albeit in a way that will eventually become unimportant: Properly written WDM drivers easily port from Win2K to Win98, and vice versa. Microsoft has already made this capability possible with network adapter drivers and mass storage device drivers, but WDM extends the convenience to many more types of devices.

Reliability Enhancements
Win2K debuts several features that prevent, avoid, and resolve system crashes. In NT 4.0, device drivers can modify, or write to, any part of kernel-mode memory. Device drivers and the NT kernel reside in kernel-mode memory, which creates the possibility that an errant driver can corrupt another driver or the OS. With the aid of a processor's memory management unit (MMU), Win2K marks as write-protected the drivers' and OS image's code portions. If an errant driver attempts to modify these portions of kernel memory, the Win2K Memory Manager immediately detects the violation, and an administrator or developer can then easily identify the faulty driver.

A new Win2K development tool called the Driver Verifier isolates many more types of device-driver errors. When the system applies the Verifier to a device driver under suspicion of misbehaving, the Verifier closely monitors the device driver's use of kernel memory buffers and the driver's interactions with the Win2K kernel. The Verifier relies on Win2K kernel support to immediately detect common violations of device-driver programming rules. Thus, an administrator can precisely and immediately identify drivers that cause system instability, rather than having to work with possibly misleading clues when a crash occurs at a later point.

To help resolve system crashes, Win2K offers the Safe Boot and Repair Console options. Safe Boot is a boot option that Win2K presents to a user as the OS prepares to load. Safe Boot lets users specify that Win2K load a minimal subset of device drivers and services, rather than loading all installed device drivers and services. When a third-party device driver habitually prevents successful startups because it crashes Win2K, you can choose Safe Boot to tell Win2K to avoid loading the driver. Two basic safe-boot types exist: minimal and network-enabled. The Registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SafeBoot has the Minimal and Network subkeys, which list the device drivers and services that are part of each configuration.
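Safe Boot is essentially a filter applied to the list of installed drivers and services. The Python below sketches that filter; the driver names in the two sets are invented for illustration and are not the contents of the real Minimal and Network subkeys.

```python
# Hypothetical contents of the two safe-boot configurations.
SAFE_BOOT = {
    "Minimal": {"disk", "keyboard", "mouse", "vga"},
    "Network": {"disk", "keyboard", "mouse", "vga", "nic", "tcpip"},
}


def drivers_to_load(installed, mode=None):
    """Return the drivers a boot would load: everything, or only the safe subset."""
    if mode is None:
        return list(installed)
    allowed = SAFE_BOOT[mode]
    return [name for name in installed if name in allowed]


installed = ["disk", "keyboard", "mouse", "vga", "nic", "tcpip", "flakyvideo"]
normal = drivers_to_load(installed)
minimal = drivers_to_load(installed, "Minimal")
network = drivers_to_load(installed, "Network")
```

The misbehaving third-party driver appears only in the normal boot list, which is what lets the system start far enough for you to remove or disable it.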

The Repair Console is a recovery option that you can fall back on if Safe Boot fails to get a system running. The Win2K setup CD-ROM installation procedure gives you the option to boot from the CD-ROM to a minimal command prompt from which you can access a faulty installation. You use commands at the prompt to enable and disable device drivers, and to copy, delete, and refresh system and device-driver files.

NTFS Version 5
The NTFS file system's version 5 implementation, which ships with Win2K, has undergone major enhancements. A long-standing user complaint about NTFS has been that NTFS won't let users create symbolic links, which are files that redirect path processing to another file. For example, if \temp\link is a symbolic link to \myfiles, then the path \temp\link\mark.txt would resolve to the file \myfiles\mark.txt. NTFS in Win2K implements reparse points to address symbolic-link creation. Reparse points are files that contain a tag and as much as 16KB of application-defined data. The reparse tag identifies the device driver that interprets the application-defined reparse data associated with the reparse point. Thus, when NTFS processes a path and encounters a reparse point, NTFS hands the reparse data to the device driver that the reparse tag references. The referenced driver can return a different path to NTFS to process, or it can perform other processing specific to the reparse-point type.
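The path substitution in the \temp\link example can be modeled as a loop that restarts path processing whenever it reaches a redirecting component. The Python below is a sketch of that idea only: the resolve function, the junctions dictionary, and the hop limit are assumptions, and real reparse processing involves tags and drivers that this model leaves out.

```python
def resolve(path, junctions, max_hops=8):
    """Walk a backslash-separated path, substituting any junction it crosses."""
    parts = [p for p in path.split("\\") if p]
    hops = 0
    i = 0
    while i < len(parts):
        prefix = "\\" + "\\".join(parts[: i + 1])
        target = junctions.get(prefix.lower())
        if target is None:
            i += 1
            continue
        hops += 1
        if hops > max_hops:
            raise RuntimeError("too many redirections")
        # Replace the prefix with the target and restart processing.
        parts = [p for p in target.split("\\") if p] + parts[i + 1:]
        i = 0
    return "\\" + "\\".join(parts)


junctions = {"\\temp\\link": "\\myfiles"}
resolved = resolve("\\temp\\link\\mark.txt", junctions)
print(resolved)  # \myfiles\mark.txt
```

The hop limit is there so that two junctions pointing at each other end in an error instead of an endless loop.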

Symbolic links are specific types of reparse points called junction points. Figure 3 shows an example of a junction point. The NTFS driver manages junction points. Other reparse points you might use as you deploy Win2K are associated with Hierarchical Storage Management (HSM). An HSM driver can designate reparse points to identify files and directories that the system migrates to offline storage. When NTFS processes a path with such an HSM-associated reparse point, NTFS notifies the HSM driver and can bring the data back to main storage.

In the NT 4.0 market, numerous third-party disk quota-management tools fill the need systems administrators have to control the amount of disk space users consume. Win2K brings quota support directly to NTFS; in NTFS, administrative tools can define quota thresholds for specific user accounts or as global defaults for all users. Every NTFS file has an Ownership security attribute that, with a SID, identifies the user account with which the system associates the file. So that it can invoke appropriate actions when users reach data limits, NTFS keeps track on a per-disk basis of the total amount of data that the system associates with each user. Administrators can define two user limits: a warning threshold and a limit. When users reach the warning threshold, Win2K informs them that they need to delete files to stay within their quota. When users reach their limit, the system prevents them from allocating more disk space.
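The two-level scheme (warn first, then refuse) can be sketched as per-owner accounting. The Python below is a toy model; the VolumeQuota class, the QuotaExceeded exception, the SID string, and the byte values are all hypothetical.

```python
class QuotaExceeded(Exception):
    pass


class VolumeQuota:
    """Tracks bytes charged to each owner SID on one volume."""

    def __init__(self, warning_bytes, limit_bytes):
        self.warning_bytes = warning_bytes
        self.limit_bytes = limit_bytes
        self.used = {}   # owner SID -> bytes in use

    def allocate(self, owner_sid, nbytes):
        """Charge an allocation to its owner, or refuse it if it passes the limit."""
        new_total = self.used.get(owner_sid, 0) + nbytes
        if new_total > self.limit_bytes:
            raise QuotaExceeded(owner_sid)
        self.used[owner_sid] = new_total
        return "warning" if new_total >= self.warning_bytes else "ok"


quota = VolumeQuota(warning_bytes=80, limit_bytes=100)
sid = "S-1-5-21-1-2-3-1001"   # made-up SID for illustration
first = quota.allocate(sid, 50)    # 50 bytes in use
second = quota.allocate(sid, 40)   # 90 bytes in use, past the warning threshold
try:
    quota.allocate(sid, 20)        # would be 110 bytes
    refused = False
except QuotaExceeded:
    refused = True
```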

Many applications that manage on-disk databases or caches allocate sparse files. Sparse files are files that might contain a large number of undefined spaces, that is, spaces that the application might never initialize. For example, an application might create a 2GB database file upon installation, then fill the database file as users add records to the database. The database application might not store the records at the beginning of the file but rather where the application determines is most efficient with respect to its storage algorithms. In NT 4.0, whenever an application allocates a file, whether the file is sparse or not, NTFS allocates on-disk storage to represent the file and fills the space with zeros. A Win2K optimization permits applications to designate a file as a sparse file; NTFS then allocates on-disk space only for the portions of the file that the application defines. This enhancement results in disk-space savings and improves application performance.
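The saving is easy to see in a model that stores only the blocks an application has written and reports zeros for everything else. The Python below is such a model, not NTFS's on-disk format; the SparseFile class and its 64KB block size are assumptions.

```python
class SparseFile:
    BLOCK = 64 * 1024   # allocation unit in this toy model

    def __init__(self, logical_size):
        self.logical_size = logical_size
        self.blocks = {}   # block index -> bytearray, only for written blocks

    def write(self, offset, data):
        if offset < 0 or offset + len(data) > self.logical_size:
            raise ValueError("write is outside the file")
        pos = 0
        while pos < len(data):
            index, within = divmod(offset + pos, self.BLOCK)
            chunk = data[pos:pos + self.BLOCK - within]
            block = self.blocks.setdefault(index, bytearray(self.BLOCK))
            block[within:within + len(chunk)] = chunk
            pos += len(chunk)

    def read(self, offset, length):
        out = bytearray()
        pos = 0
        while pos < length:
            index, within = divmod(offset + pos, self.BLOCK)
            take = min(self.BLOCK - within, length - pos)
            block = self.blocks.get(index)
            # Unwritten regions read back as zeros without using any storage.
            out += block[within:within + take] if block is not None else bytes(take)
            pos += take
        return bytes(out)

    def allocated_bytes(self):
        return len(self.blocks) * self.BLOCK


db = SparseFile(2 * 2 ** 30)        # a 2GB logical file
db.write(2 ** 30, b"record")        # one record in the middle
stored = db.allocated_bytes()
print(stored)  # 65536
```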

Desktop and Start menu shortcut links are convenient NT user interface (UI) features. However, if you move the file that a shortcut refers to, you break the shortcut. Then, you must manually reconnect the shortcut to the link target. NTFS in Win2K has built-in link tracking, a facility that lets NTFS track the movements of link targets. When a link target moves to another NTFS volume within the same domain, NTFS can transparently update the link to point at the file's new location. Link tracking applies to desktop shortcuts and OLE links.

The final Win2K NTFS enhancement is Encrypting File System. EFS is an add-on device driver that is tightly connected to NTFS. EFS and NTFS together provide transparent file encryption and decryption facilities for user files. A user marks a file or directory as encrypted, and EFS and NTFS generate a file encryption key (FEK) for the encrypted file. EFS uses the FEK and a stronger variant of the Data Encryption Standard (DES) algorithm—DESX—to encrypt the file's data. Then EFS uses RSA public-key-based encryption to encrypt the FEK with the user's automatically assigned EFS encryption key and stores the encrypted FEK with the file. When a user accesses an encrypted file, EFS uses the user's key to decrypt the file's FEK, then uses the FEK to decrypt the file's data. Although third-party utilities provide encryption facilities for NT 4.0, EFS has the advantages of being totally transparent and supported by Win2K's administrative, backup and restore, and data-recovery interfaces.
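The two-layer key arrangement (a per-file key protects the data, and the user's public key protects the per-file key) can be sketched as follows. The Python below is a structural illustration only and is not secure: a SHA-256 keystream stands in for DESX, and textbook RSA with a tiny well-known example key pair stands in for the user's EFS key. All function names are hypothetical.

```python
import hashlib
import secrets

# Tiny textbook RSA key pair (p=61, q=53); far too small for real use.
PUBLIC = (3233, 17)      # (n, e)
PRIVATE = (3233, 2753)   # (n, d)


def keystream_xor(key, data):
    """Stand-in for the file cipher: XOR data with a hash-derived keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_file(data, public):
    n, e = public
    fek = secrets.token_bytes(16)               # random file encryption key
    wrapped_fek = [pow(b, e, n) for b in fek]   # FEK encrypted with the public key
    return keystream_xor(fek, data), wrapped_fek


def decrypt_file(ciphertext, wrapped_fek, private):
    n, d = private
    fek = bytes(pow(c, d, n) for c in wrapped_fek)   # recover the FEK
    return keystream_xor(fek, ciphertext)


plaintext = b"quarterly numbers"
ciphertext, wrapped = encrypt_file(plaintext, PUBLIC)
recovered = decrypt_file(ciphertext, wrapped, PRIVATE)
```

Because only the small FEK is encrypted with the public key, the bulk of the file is handled by the faster symmetric cipher, and the wrapped FEK can travel with the file.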

Other File-System Enhancements
Win2K includes file-system enhancements that aren't related to NTFS. First, Win2K fully supports the FAT32 file-system format. NT 4.0 can't interpret FAT32 drives without a third-party add-on's help, and under no circumstances can NT 4.0 boot from a FAT32-formatted drive. Because FAT32 handles space more efficiently than FAT16 does and can also handle larger disk sizes than FAT16 can, FAT32 is a better file system format for installations that don't require NTFS's reliability or security features. Many Win9x installations use FAT32 for the advantages it confers over FAT16, so Win2K's support for FAT32 makes it possible to share data on FAT32 drives between the OSs in dual-boot environments. Instead of adding a new device driver to implement FAT32 in Win2K, Microsoft simply extended the FAT12/FAT16 driver, \%systemroot%\system32\drivers\fastfat.sys, to understand FAT32.

NT 4.0 assumes the CD-ROM File System (CDFS, defined by International Organization for Standardization standard ISO 9660) as the format for read-only media, including CD-ROMs. The Universal Disk Format (UDF) file system is a cross-platform standard (ISO 13346) that will slowly replace CDFS for CD-ROMs and will become the DVD-ROM format. Win2K includes UDF support in the \%systemroot%\system32\drivers\udfs.sys file system driver, which lets Win2K access DVD-ROM file-system data. This support will become more useful as DVD-ROMs proliferate and replace CD-ROMs.

Terminal Services
To support multiple interactive user sessions through thin-client connections, the Win2K kernel incorporates the kernel changes that Microsoft implemented in NT Server 4.0, Terminal Server Edition (WTS). These changes require the kernel to support the concept of a session (in which a session includes a private copy of the Win32 kernel subsystem, graphics drivers, and input devices) for each user connected to the server. In addition, in user mode, each session has a copy of the logon process (i.e., Winlogon) and the Win32 user-mode subsystem (i.e., csrss.exe).

By specifying that each user associate with a complete desktop state, Win2K can implement a multiuser environment with minimal changes to the kernel architecture, device drivers, and user-mode applications. Microsoft incorporated many other changes in Win2K to make terminal services work, including adding RDP device drivers and enhancing the Object Manager kernel subsystem's naming scheme to specify kernel objects that are local to a session or global to the system.
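The session-local versus global naming idea can be modeled with two levels of lookup table. The Python below is a sketch: the ObjectNamespace class is hypothetical, and using a "Global\" name prefix to mark system-wide objects is an assumption of this model, since the text above doesn't spell out the naming syntax.

```python
GLOBAL_PREFIX = "Global\\"


class ObjectNamespace:
    """Toy model of named objects that are per-session unless marked global."""

    def __init__(self):
        self.global_objects = {}
        self.session_objects = {}   # session ID -> {name: object}

    def _directory(self, session_id, name):
        if name.startswith(GLOBAL_PREFIX):
            return self.global_objects, name[len(GLOBAL_PREFIX):]
        return self.session_objects.setdefault(session_id, {}), name

    def create(self, session_id, name, obj):
        directory, key = self._directory(session_id, name)
        directory[key] = obj

    def open(self, session_id, name):
        directory, key = self._directory(session_id, name)
        return directory.get(key)


ns = ObjectNamespace()
ns.create(1, "AppMutex", "mutex owned by session 1")
ns.create(2, "AppMutex", "mutex owned by session 2")   # no clash with session 1
ns.create(1, "Global\\LicenseEvent", "shared event")

own = ns.open(1, "AppMutex")
other = ns.open(2, "AppMutex")
shared = ns.open(2, "Global\\LicenseEvent")
```

Two sessions can run the same application without colliding on object names, yet a component that needs a system-wide object can still ask for one explicitly.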

More than 90 percent of Microsoft's work in making Win2K support multiple interactive user sessions involved changes to the core memory manager. By tightly integrating this support in the core, most drivers and user components in Win2K work without needing to be aware that multiple sessions might exist.

The End of the Tour
As you've seen, Microsoft leaves much of the Win2K kernel unchanged from NT 4.0. The Process Manager, Security Manager, Cache Manager, and I/O Manager, for example, carry over from NT 4.0 largely intact. However, all the kernel subsystems in Win2K are performance-tuned, and some have significant new functionality. In addition, Win2K includes the new PnP Manager and Power Manager subsystems. All of the changes in Win2K fully round out the foundation that Microsoft built with NT 4.0.