Introduction

PAE is an Intel-provided memory address extension that enables support of greater than 4 GB of physical memory for most 32-bit (IA-32) Intel Pentium Pro and later platforms. This article provides information to help device driver developers implement Windows drivers that support PAE.

PAE is supported only on 32-bit versions of the Windows operating system. 64-bit versions of Windows do not support PAE. For information about device driver and system requirements for 64-bit versions of Windows, see 64-bit System Design.

Although support for PAE memory is typically associated with support for more than 4 GB of RAM, PAE can be enabled on Windows XP SP2, Windows Server 2003, and later 32-bit versions of Windows to support hardware enforced Data Execution Prevention (DEP).

Operating System Support

The PAE kernel is not enabled by default for systems that can support more than 4 GB of RAM.

To boot the system and utilize PAE memory, the /PAE switch must be added to the corresponding entry in the Boot.ini file. If a problem should arise, Safe Mode may be used, which causes the system to boot using the normal kernel (support for only 4 GB of RAM) even if the /PAE switch is part of the Boot.ini file.

The PAE kernel can be enabled automatically without the /PAE switch present in the boot entry if the system has DEP enabled (/NOEXECUTE switch is present) or the system processor supports hardware-enforced DEP. Presence of the /NOEXECUTE switch on a system with a processor that supports hardware-enforced DEP implies the /PAE switch. If the system processor is capable of hardware-enforced DEP and the /NOEXECUTE switch is not present in the boot entry, Windows assumes /NOEXECUTE=optin by default and enables PAE mode. For more information, see the topic "Boot Options in a Boot.ini File" in the Windows DDK.
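The boot-time rules in this section can be condensed into a small decision function. The following is only a sketch that models the prose above; it is not Windows loader code. The `boot_config` structure and the `pae_kernel_selected` function are hypothetical names introduced for illustration, and opt-out switches such as /NOPAE are deliberately left out of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one boot entry and the machine it runs on.
   These names are illustrative only and do not come from Windows. */
struct boot_config {
    bool pae_switch;        /* /PAE is present in the boot entry */
    bool cpu_hardware_dep;  /* processor supports hardware-enforced DEP */
    bool safe_mode;         /* a Safe Mode option was chosen at boot */
};

/* Returns true if, under the rules described in this section, the PAE
   kernel would be used for this boot. */
bool pae_kernel_selected(const struct boot_config *cfg)
{
    assert(cfg != NULL);
    if (cfg->safe_mode)
        return false;  /* Safe Mode boots the normal kernel even with /PAE */
    if (cfg->pae_switch)
        return true;   /* PAE was requested explicitly */
    /* With a DEP-capable processor, /NOEXECUTE (or the assumed default,
       /NOEXECUTE=optin) implies PAE mode. */
    return cfg->cpu_hardware_dep;
}
```

For example, a configuration with no /PAE switch but a DEP-capable processor yields true, while the same configuration booted in Safe Mode yields false.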

System Board Issues: DAC Capabilities for Buses

Various chipsets are capable of supporting more than 4 GB of physical memory. By using PAE, the Windows Datacenter and Advanced Server operating systems can use this memory.

On a 64-bit platform, for optimal performance, all PCI adapters (including 32-bit PCI adapters) must be able to address the full physical address space. For 32-bit PCI adapters, this means that they must be able to support the Dual Address Cycle (DAC) command to permit them to transfer 64-bit addresses to the adapter or device (that is, addresses above the 4 GB address space). Adapters that cannot provide this support cannot directly access the full address space on a 64-bit platform.

Unfortunately, Microsoft is finding that not all PCI buses on a system board support DAC, which is required for a 32-bit PCI adapter to address more than 4 GB of memory. Furthermore, there is no way for a DAC-capable PCI device (or its associated driver) to know that it is running on a non-DAC-capable bus.

Given these issues in hardware, Microsoft must find the optimal software workaround in the operating system from the standpoint of customers, OEMs, and Microsoft. This section discusses several possible software solutions that have been rejected due to various inadequacies, and then discusses the selected solution.

Overriding Dma64BitAddresses = Performance and Stability Problems

One software workaround would require the operating system to override the Dma64BitAddresses flag passed by the driver to HalGetAdapter: if the driver passes TRUE and its device is on a bus that does not support DAC, force this to FALSE. That causes the hardware abstraction layer (HAL) to double-buffer transfers done through IoMapTransfer or GetScatterGatherList, so the device never sees an address above 4 GB. For more information, see the "Double-Buffer DMA Transfer" topic in the Windows DDK.

Unfortunately, HalGetAdapter does not have the necessary information to determine the bus of the caller's device. All that can be known is the contents of the DEVICE_DESCRIPTION structure that the driver provides, where the only relevant information is that the InterfaceType is PCI.

typedef struct _DEVICE_DESCRIPTION {
    ULONG Version;
    BOOLEAN Master;
    BOOLEAN ScatterGather;
    BOOLEAN DemandMode;
    BOOLEAN AutoInitialize;
    BOOLEAN Dma32BitAddresses;
    BOOLEAN IgnoreCount;
    BOOLEAN Reserved1;            // must be false
    BOOLEAN Dma64BitAddresses;
    ULONG BusNumber;              // unused for WDM
    ULONG DmaChannel;
    INTERFACE_TYPE InterfaceType;
    DMA_WIDTH DmaWidth;
    DMA_SPEED DmaSpeed;
    ULONG MaximumLength;
    ULONG DmaPort;
} DEVICE_DESCRIPTION, *PDEVICE_DESCRIPTION;

In addition:

• Double-buffering has been shown in testing at Microsoft to have a negative performance impact on I/O throughput and CPU utilization. This negative impact increases as more memory is added beyond 4 GB.

• The delays associated with high-performance I/O and double-buffering might cause timing issues for drivers and devices, which would negatively impact system stability.

All of this is contrary to the goals of Windows Datacenter and Advanced Server to ensure increased scalability and reliability.

IoGetDmaAdapter = Not Used by All Drivers

The Windows Driver Model (WDM) introduced the IoGetDmaAdapter call, which is similar to HalGetAdapter but also takes a pointer to the physical device object (PDO). This allows the operating system to detect the caller's PCI bus and whether the device is a child of a non-DAC bus. The PCI driver could then override the Dma64BitAddresses field in DEVICE_DESCRIPTION so that the HAL thinks the device can handle only 32-bit addresses.

The problem with this approach is that not all drivers use IoGetDmaAdapter. Many still use HalGetAdapter, even though the recent DDKs specifically define this as an obsolete call. Microsoft has no way of preventing third-party drivers from calling HalGetAdapter; forcing drivers to use IoGetDmaAdapter by failing the call would render many otherwise capable drivers as no longer functional. Requiring all drivers to use IoGetDmaAdapter would create enormous test and performance issues.

Incorrect or No DMA Routines = No Possible Workaround

Regardless of whether IoGetDmaAdapter or HalGetAdapter is used, not all drivers use the DMA routines correctly. Some do not use them at all because of the performance impact. It would be no surprise to find 64-bit capable drivers that ignore the DMA routines because they "know" they do not need such routines. In such cases, there is no possible operating system workaround: all offending drivers would have to be found and fixed.

Boot Device on Non-DAC Bus / All Non-DAC Buses = No Large Memory Support

Besides the many problems of non-DAC buses described above, there are two special cases:

• Boot Device on Non-DAC Bus. The first is the case where the boot device is on a non-DAC bus. Because a pagefile usually resides on the boot device and this is a primary data path, all pagefile I/O would be forced to be double-buffered, negatively impacting system performance and possibly leading to system instability.

• All Non-DAC Buses. The second case is where all buses are non-DAC, in which case the user has no option of moving DAC adapters, LME-capable adapters, or both to DAC buses. The only solution in such a case is to limit memory support to 4 GB, regardless of whether the processor, memory controller, or system board physically supports more than 4 GB of RAM.

Microsoft does not expect any instances of the second case and few of the first, but must take these possibilities into account in defining the overall solution.

Selected Solution: Disable Memory Above 4 GB when Non-DAC Buses Exist

Because Microsoft has no reliable software workaround for this problem, the only viable alternative is to disable configurations that do not work and inform the administrator of the problem. Disabling all memory above 4 GB if there are any non-DAC buses is one way to prevent instability in this case. Microsoft thinks that this is the best solution for customers, because it is the least likely to destabilize the platform.

In these cases, there is still an opportunity for memory corruption, even with 32-bit devices on 32-bit buses using 32-bit drivers, as described in Note 2.

Note 2: The decision to double-buffer is made on a per-transfer basis. It is the same algorithm used to determine whether a DMA transfer to a 24-bit (ISA) adapter should be double-buffered.

Double-buffering occurs for a given transfer if the physical address of the DMA memory is at an address higher than the adapter can reach. Previously, an adapter that could access all 32 bits of physical address space would set the Dma32BitAddresses field in the DEVICE_DESCRIPTION structure passed into HalGetAdapter. Similarly, an adapter that could access all 64 bits of physical address space would set the Dma64BitAddresses field in the same structure.

If a buffer with a physical address greater than 4 GB is passed to IoMapTransfer, the adapter object is examined. If it is found to be for an adapter that did not set the Dma64BitAddresses field, then a suitable low-memory buffer is found, and the data is copied before or after the transfer (depending on whether the data was going to or coming from the adapter, respectively).
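The per-transfer decision described in this note can be sketched as a simple reachability test. This is an illustrative model, not HAL code: the function name `needs_double_buffer` and the `addr_bits` parameter are hypothetical, and checking the last byte of the buffer (rather than only its starting address) is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a transfer would have to be double-buffered, that is, if
   any byte of the buffer lies above the highest physical address the
   adapter can reach. addr_bits is 24 for an ISA adapter, 32 for an adapter
   that set only Dma32BitAddresses, and 64 for one that set
   Dma64BitAddresses. */
bool needs_double_buffer(uint64_t phys_addr, uint64_t length, unsigned addr_bits)
{
    assert(length > 0 && addr_bits >= 1 && addr_bits <= 64);
    if (addr_bits == 64)
        return false;  /* the adapter reaches the whole address space */

    uint64_t limit = (UINT64_C(1) << addr_bits) - 1;  /* highest reachable address */
    uint64_t last = phys_addr + (length - 1);         /* last byte of the buffer */

    /* A wrapped-around range is treated as unreachable. */
    return last < phys_addr || last > limit;
}
```

With these assumptions, a 4 KB buffer at the 4 GB mark needs double-buffering for a 32-bit adapter but not for a 64-bit one, and a buffer at the 16 MB mark needs it for a 24-bit ISA adapter.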

Note 3: Systems having a non-DAC bus are detected at boot time and Windows disables memory above 4 GB by not using the PAE kernel in order to prevent memory corruption and system instability.

If you, as an independent hardware or software vendor (IHV or ISV), have user-mode code to test (and this is almost universally true), this is a test scenario you must cover. Make sure that the code you are testing deals correctly with high virtual addresses, especially those above 2 GB. Test your applications and utilities on Windows in this configuration to ensure that they work.

Usually, VirtualAlloc returns virtual addresses in low-to-high order. So, unless your process allocates a lot of memory or has a very fragmented virtual address space, it never gets back very high addresses. This can hide bugs related to high addresses. There is a simple way to force allocations in top-down order in the Windows Server 2003, Datacenter Edition and Enterprise Edition operating systems, and this can reveal important bugs.

To do so, set the following registry value:

HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management\AllocationPreference REG_DWORD = 0x100000
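One kind of bug that top-down allocation tends to expose is code that keeps a 32-bit address in a signed variable. The following portable sketch models addresses as `uint32_t` values instead of calling any Windows API; both function names are hypothetical and exist only to contrast the buggy and correct checks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Buggy: stores the address in a signed 32-bit variable. Any address at or
   above 2 GB (0x80000000) then looks negative and is rejected. The
   unsigned-to-signed conversion is implementation-defined in older C
   standards; it wraps on common compilers, which is what this sketch
   assumes. */
bool is_valid_address_buggy(uint32_t addr)
{
    int32_t signed_addr = (int32_t)addr;
    return signed_addr > 0;
}

/* Correct: keeps the address unsigned, so only NULL is rejected. */
bool is_valid_address(uint32_t addr)
{
    return addr != 0;
}
```

An address such as 0x00400000 passes both checks, so the bug stays hidden with bottom-up allocation. An address such as 0x80001000, which a top-down allocation in a large-address-aware process could plausibly return, passes only the correct check.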

All applications, but especially those applications that are built with LINKER_FLAGS=/LARGEADDRESSAWARE in the sources file, should be tested with the /PAE, /NOLOWMEM and /3GB switches, and the registry change.

MEM_TOP_DOWN can also be used on Itanium-based systems. However, /3GB is an x86-specific feature.

The /PAE, /NOLOWMEM and /3GB switches, and the registry change, can be used at once. Note that /3GB will prevent access to physical memory beyond 16 GB because the kernel memory space is reduced with the /3GB switch, and thus does not have enough room for the additional Page Table Entries required when memory is larger than 16 GB.

Adapter and Driver Issues: LME and DAC Capable

All physical memory is treated as general-purpose memory, so no new APIs are needed to access I/O above the 4 GB physical memory address. Direct I/O can also be done to physical addresses above 4 GB; this requires DAC-capable or 64-bit PCI devices. Devices and drivers that can perform direct I/O beyond 4 GB are considered Large Memory Enabled (LME).

Because Windows does not have a kernel PAE or LME API or interface, the PAE-X86 kernel ensures that many items are identical to the standard kernel, including:

• Kernel memory space organization is unchanged.

• PCI base address registers (BARs) remain the same.

• Registry flags work the same.

• Non-paged pool size remains the same.

• The 3GT feature is supported for up to 16 GB of RAM.

• IMAGE_FILE_LARGE_ADDRESS_AWARE continues to work.

• "Well known" kernel addresses remain in the same locations.

However, careful device driver development is still required. Hardware devices should be DAC-capable or 64-bit capable with LME drivers; otherwise, the device will function as "legacy" 32-bit and will be double buffered, with lower relative performance.

Although double-buffering can have a relatively small impact (single percentage points) on 8 GB systems, this is enough to impact I/O intensive tasks such as database activity. This is also dependent on a number of factors beyond Microsoft's control, such as hardware design and device driver optimizations like interrupt moderation and efficient use of the PCI bus. As the amount of physical memory increases, so does the negative performance impact in comparison to DAC/64-bit devices and LME drivers.

General Guidelines for LME Drivers

Do not use PVOID64. PVOID64 does not carry valid information on the Intel Architecture platform, so using it anywhere yields incorrect results. Instead, use PHYSICAL_ADDRESS.

Note: This does not apply to NDIS miniports. Also, miniports for relatively low-performance devices, such as USB, need not be rewritten to be LME, because the performance gain or loss is not significant. These miniports, however, should correctly use the kernel interfaces for Windows and not try to trick the operating system by using undocumented and unsupported shortcuts.

• Do not call MmGetPhysicalAddress() on a locked buffer, discard the high 32 bits, and then program the adapter to DMA into the resultant address. This will certainly result in corrupted memory, lost I/O, and system failure. If this call is made, ensure that all address information returned is used and that the driver correctly operates with that 64-bit address.

• Do not use PVOID when manipulating physical addresses. Because PVOID is only 32 bits, address truncation will take place and memory corruption will result.

• Do not use ULONG when manipulating physical addresses. The same precautions apply as for PVOID, and the result is the same: system failure.

• Do not falsely indicate support for scatter/gather in the DEVICE_DESCRIPTION structure in an attempt to avoid the buffering provided by the HAL (the "mapping registers").

• If the driver cannot support 64-bit addresses, do not call IoMapTransfer(...) without having an AdapterControl(...) function (again, to avoid mapping registers), and do not supply zero as the value for MapRegisterBase. This will fail.
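The truncation hazard named in the guidelines above is easy to see with plain integers. This sketch uses `uint64_t` as a stand-in for the 64-bit value in PHYSICAL_ADDRESS and `uint32_t` as a stand-in for ULONG or a 32-bit PVOID; the function names are hypothetical and no DDK routines are involved.

```c
#include <stdint.h>

/* What a driver effectively does when it stores a physical address in a
   32-bit variable: the high 32 bits are silently dropped. */
uint32_t truncate_physical_address(uint64_t phys_addr)
{
    return (uint32_t)phys_addr;
}

/* Returns nonzero if phys_addr can be held in 32 bits without loss. */
int fits_in_32_bits(uint64_t phys_addr)
{
    return (phys_addr >> 32) == 0;
}
```

A buffer at physical address 0x100002000 (4 GB plus 8 KB) truncates to 0x00002000, so an adapter programmed with the truncated value would transfer to or from unrelated low memory. A check like `fits_in_32_bits` only detects the problem; a driver for a 32-bit device still has to go through the DMA routines so that the HAL can double-buffer the transfer.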

Other functions and calls might cause failures. Information is provided in the Windows DDK.

Guidelines for NDIS Miniports on PAE Systems

Use the NDIS deserialized miniport driver model. Miniports should be deserialized for optimum performance on Windows Datacenter and Advanced Server operating systems.

See the entry for NdisMInitializeScatterGatherDma in the DDK documentation.

General Guidelines

The following should be noted for NDIS miniports:

• Shared memory allocated using NdisMAllocateSharedMemory is guaranteed not to cross a 4 GB boundary.

• NDIS_PER_PACKET_INFO_FROM_PACKET(ScatterGatherListPacketInfo) will never return NULL for miniports that support scatter/gather DMA.

• The physical address range indicated by SCATTER_GATHER_ELEMENT will not cross a 4 GB boundary. If a virtual memory buffer does cross a 4 GB boundary, it will be broken into two scatter/gather elements.
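The boundary guarantee in the last item can be illustrated with a small splitting routine. This is a sketch of the idea only, not NDIS code: `struct sg_element` is a hypothetical type loosely modeled on SCATTER_GATHER_ELEMENT, and `split_at_4gb` is a hypothetical helper that assumes the range does not wrap past the end of the 64-bit address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element type for this sketch. */
struct sg_element {
    uint64_t address;  /* starting physical address */
    uint64_t length;   /* length in bytes */
};

/* Splits the physical range [address, address + length) so that no element
   crosses a 4 GB boundary. Stores at most max_out elements in out and
   returns the number of elements the range requires. */
size_t split_at_4gb(uint64_t address, uint64_t length,
                    struct sg_element *out, size_t max_out)
{
    const uint64_t four_gb = UINT64_C(1) << 32;
    size_t count = 0;

    assert(out != NULL || max_out == 0);
    while (length > 0) {
        /* Bytes left before the next 4 GB boundary. */
        uint64_t room = four_gb - (address & (four_gb - 1));
        uint64_t chunk = length < room ? length : room;

        if (count < max_out) {
            out[count].address = address;
            out[count].length = chunk;
        }
        count++;
        address += chunk;
        length -= chunk;
    }
    return count;
}
```

For an 8 KB range starting at 0xFFFFF000, the routine produces two 4 KB elements, one ending just below the 4 GB mark and one starting exactly at it. A range that lies entirely on one side of the boundary stays a single element.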

Guidelines for 32-Bit Address-Only Network Devices

The following guidelines are recommended for 32-bit address-only network devices:

• Properly written NDIS drivers will work as-is on PAE systems, but will have a significant negative performance impact that grows as the amount of installed RAM increases.

• NdisMStartBufferPhysicalMapping will copy all fragments above the 4 GB address space to memory that is below the 4 GB mark.

Guidelines for 64-bit Address-Capable SCSI Miniports

(including all related adapters for SCSI 2)

The following guidelines are recommended for 64-bit address-capable SCSI miniports:

• Miniports need to support scatter/gather DMA. They must not call any of the slave-mode DMA routines: ScsiPortFlushDma or ScsiPortIoMapTransfer.

• Miniports should check the value of Dma64BitAddresses in PORT_CONFIGURATION_INFORMATION to determine whether 64-bit physical addresses are supported. If 64-bit physical addresses are supported, the miniport should change its extension sizes to account for the larger physical addresses (if necessary) and set the Dma64BitAddresses field to SCSI_DMA64_MINIPORT_SUPPORTED before calling ScsiPortGetUncachedExtension.

• Miniports must not attempt to access data buffers using virtual addresses unless they have set the MapBuffers bit in the PORT_CONFIGURATION_INFORMATION structure. The exceptions to this rule are INQUIRY and REQUEST_SENSE operations that will always have a valid virtual address.

• Use SCSI_PHYSICAL_ADDRESS to access all physical addresses.

• Uncached extensions and SRB extensions will not cross the 4 GB boundary.

• No scatter/gather element will cross the 4 GB boundary.

Guidelines for Legacy SCSI Miniports

The following guidelines are recommended for legacy SCSI miniports:

• Miniports need to support scatter/gather DMA. They must not call any of the slave-mode DMA routines: ScsiPortFlushDma or ScsiPortIoMapTransfer.

• Miniports must not attempt to access data buffers using virtual addresses unless they have set the MapBuffers bit in the PORT_CONFIGURATION_INFORMATION structure. The exceptions to this rule are INQUIRY and REQUEST_SENSE operations that will always have a valid virtual address.

• Miniports should not set the MapBuffers bit unless absolutely necessary, because providing valid virtual addresses to a 32-bit driver on an LME system is costly.

Test using the MEM_TOP_DOWN registry setting. This forces all allocations for memory to be allocated from the top down, instead of the normal bottom up. Set HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management\AllocationPreference REG_DWORD = 0x100000

Troubleshooting DAC Support and LME Drivers

The following checkpoints can help OEMs, IHVs and customers determine whether there are any issues relating to the system board, buses, or adapters in supporting DAC and LME.

• If the driver fails at initialization, check with the system OEM to determine whether all PCI buses present in the system support DAC.

• If a network adapter driver performs a bug check immediately upon a network connection, determine whether all buses support DAC, again by checking with the OEM.

• If the PCI buses on the system are all DAC capable, check whether the hardware device is compliant with PCI 2.1.

• If the bus supports DAC and the device is PCI 2.1 compliant, check the driver for assumptions being made about physical addresses.

Documentation is provided in the Windows DDK about PAE memory support. A summary of the information for customers is included in the following sections.

Hardware Requirements for PAE

The system must meet the following minimum requirements:

• x86 Pentium Pro processor or later

• More than 4 GB of RAM

• 450 NX or compatible chipset and support, or later

Enabling PAE

To enable PAE:

• Locate the Boot.ini file, which is typically in the root folder (for example, C:\), and remove its Read-Only and Hidden attributes.

• Open the Boot.ini file with a text editor, and then add the /PAE parameter to the ARC path, as shown in the following example:

multi(0)disk(0)rdisk(0)partition(2)\WINNT="Windows ???? Datacenter Server" /PAE /basevideo /sos

• On the File menu, click Save.

• Restore the Read-Only attribute to the Boot.ini file.

Troubleshooting Specific Programs

Following are two examples of problems that might occur, with solutions that will rectify the problem.

Problem: The computer will not start after PAE is enabled.

Cause: Your hardware may not support PAE.

Solution: Start the system and run Safe Mode, which disables PAE. Then remove the /PAE parameter from the Boot.ini file.

To run Safe Mode:

1. When you see the message "Please select the operating system to start," press F8.

2. Use the arrow keys to highlight the appropriate Safe Mode option, and then press ENTER. To use the arrow keys on the numeric keypad to select items, NUMLOCK must be off.

Problem: After PAE is enabled, the computer runs for a time and then displays a Stop error.

Cause: Your hardware may not support PAE.

Solution: Contact your hardware vendor for a driver update. If your hardware or driver is not capable of supporting PAE, disable PAE by removing the /PAE parameter in the Boot.ini file. If you must disable PAE but your system processor supports hardware-enforced DEP, add /NOPAE /NOEXECUTE=alwaysoff to your Boot.ini file. Note: This will disable the DEP feature on your computer.