Windows Memory Management

Introduction

What is Windows Memory Management? - Overview

Microsoft has, as of Windows Vista SP1 and Windows Server 2008, implemented new technologies for both resource allocation and security. These include the dynamic allocation of kernel virtual address space (including the paged and non-paged pools), kernel-mode stack jumping, and Address Space Layout Randomization (ASLR). In essence, the allocation of resources is no longer fixed but is adjusted dynamically according to operational requirements. Technologies such as ASLR were introduced largely in response to attackers' advance knowledge of the location of key system components (such as kernel32.dll and ntdll.dll), and partly in pursuit of Windows' goal of using memory more efficiently by allocating it on an as-needed basis. To help developers, device driver writers, and systems administrators understand and use these new technologies, this paper focuses on the Windows Memory Manager prior to Vista SP1.

How Does the Windows Memory Manager Work?

The purpose of this paper is therefore to give a conceptual understanding to those who have struggled with memory management as a whole, and to explain why these newer technologies have evolved. It starts with a general view of the Windows Memory Manager and then becomes more specific about how Windows manages used and unused memory. To illustrate how memory works, tools from the Sysinternals section of the TechNet web site will be used to examine memory leaks. The paper concludes with a brief description of the paging lists.

The OS Maps Virtual Addresses to Physical Addresses.

Because the virtual address space might be larger or smaller than the physical memory on the machine, the Windows Memory Manager has two primary responsibilities. The first is to translate, or map, a process's virtual address space into physical memory, so that when a thread running in the context of that process reads or writes the virtual address space, the correct physical address is referenced.
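The translation described above happens at the granularity of a page. The following is a toy sketch of the idea only: the 4 KB page size matches x86, but the dictionary page table and its contents are illustrative stand-ins, not real Windows (or hardware) data structures.

```python
# Toy model of page-granular virtual-to-physical translation.
# PAGE_SIZE matches x86; the page table below is entirely made up.
PAGE_SIZE = 4096

# Hypothetical page table: virtual page number -> physical frame number.
# Pages absent from the table are "not in physical memory".
page_table = {0: 7, 1: 3, 5: 12}

def translate(virtual_address):
    """Split an address into page number and offset, then map the page."""
    vpn = virtual_address // PAGE_SIZE      # virtual page number
    offset = virtual_address % PAGE_SIZE    # byte offset within the page
    if vpn not in page_table:
        # In real Windows this would raise a page fault, handled by the
        # Memory Manager; here we just signal it.
        raise LookupError("page fault: virtual page %d not resident" % vpn)
    return page_table[vpn] * PAGE_SIZE + offset

print(translate(4096 + 20))   # page 1 maps to frame 3: 3*4096 + 20 = 12308
```

A reference to virtual page 2 would raise the simulated page fault, since that page has no physical frame assigned.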

The second is paging some of the contents of memory to disk when memory becomes overcommitted, that is, when running threads or system code try to use more physical memory than is currently available, and bringing those contents back into physical memory as needed.

One vital service provided by the Memory Manager is memory-mapped files. Memory mapping can speed up sequential file processing because the data is not sought randomly, and it provides a mechanism for memory sharing between processes (for example, when they reference the same DLL, since only one copy of any DLL needs to be in memory at a time).
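A small, runnable illustration of the idea, using Python's mmap module (which on Windows wraps the CreateFileMapping/MapViewOfFile APIs). The file name is illustrative; the point is that reads and writes go through the mapping rather than through explicit read()/write() calls:

```python
# Memory-mapped file sketch: the program touches bytes of the mapping,
# and the OS satisfies those touches via paging, not explicit file I/O.
import mmap
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "mapped_demo.bin")  # illustrative name
with open(path, "wb") as f:
    f.write(b"hello, mapped world")

with open(path, "r+b") as f:
    mapped = mmap.mmap(f.fileno(), 0)   # map the whole file
    assert mapped[:5] == b"hello"       # a read through the mapping
    mapped[0:5] = b"HELLO"              # a write through the mapping
    mapped.close()

with open(path, "rb") as f:
    result = f.read()                    # the write is visible in the file
os.remove(path)
print(result)
```

The write made through the mapping lands in the file itself, because the mapped pages and the file cache pages are the same physical pages.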

Most virtual pages will not be in physical memory (RAM), so the OS responds to page faults (references to pages not in memory) and loads the data from disk, either from the system paging file or from a normal file. Page faults, while transparent to the programmer, have an important impact on performance, and programs should be designed to minimize them (keeping frequently used data in registers, for instance, avoids both memory references and disk reads). The wizards at Sysinternals contend that the concern is not one process that is hard page faulting, but rather a collection of them hard page faulting at once. That causes the system to thrash and is a clear indication that the system needs more memory.

Dynamic memory allocated from heaps is backed by the paging file. The OS's memory management controls page movement between physical memory and the paging file, and maps the process's virtual addresses onto the paging file. When the process terminates, that space in the paging file is deallocated.

Windows provides the illusion of a flat 4 GB virtual address space when, in reality, there is a much smaller amount of physical memory. The hardware memory management unit of today's microprocessors gives the OS a way to map virtual addresses to physical addresses, and it does so at the granularity of a page. The Windows Memory Manager implements a demand-paged virtual memory subsystem, which is another way of saying that it is a lazy allocator. In other words, when you launch an application such as Notepad, Windows does not load the entire application and its DLLs into physical memory. It does so as the application demands: as Notepad touches code pages and data pages, at that point the Memory Manager makes the connection between virtual memory and physical memory, reading contents in from disk as needed. In short, it is a common misconception that the Memory Manager reads the entire executable image off the disk.

This can be illustrated by using Process Monitor and setting the filter to something that has not been run since a reboot, say, Solitaire. After launching Solitaire, you can watch it, as it starts up, causing page faults and reading pieces of its own executable off the disk on demand. When you stop the trace and look at the gathered information, you will see the process, sol.exe, reading sol.exe: it is reading itself, faulting pieces of itself in from disk.

As features of Solitaire are used, you will see sol.exe reading various DLLs as those DLLs are virtually loaded; only the pieces being read are actually loaded. Another feature of the Windows Memory Manager is memory sharing. For instance, if you have two instances of Notepad running, the common misconception is that there are two copies of Notepad and its associated DLLs loaded into physical memory. In fact, the Memory Manager recognizes that the second instance is an image that already has pieces of it in physical memory, and automatically connects the two virtual images to the same underlying physical pages. This is an important part of process startup, and applications can take advantage of it to share memory.

On 32-bit Windows, the 4 GB virtual address space is divided into 2 GB for each process (user space) and 2 GB for the system. Just as applications need virtual memory to store code and data, the operating system also needs virtual memory to map itself, the device drivers that are configured to load, and the data maintained by the drivers and the OS (the kernel memory heaps).

Tools that indicate memory usage often show virtual memory, physical memory, and the working set. The virtual memory counter does not offer much help when troubleshooting memory leaks: virtual memory is used to map the code and data of an application plus an amount that is merely kept on reserve, and most virtual pages will not be in physical memory at all. The private bytes counter, by contrast, indicates the number of bytes of memory that are private to a process and cannot be shared with another process. For instance, if you launch Notepad and start typing in text, no other process is interested in that data, so it is private to that process.

What About Memory Leaks?

How do we determine whether we have a memory leak, and if so, whether it is a process leaking the memory or a leak in kernel mode? In Task Manager there is a memusage counter that is often used to trace the source of a leak, but the memusage counter does not actually indicate the private virtual memory of the process. Conversely, the column Task Manager labels as virtual memory size is really the private bytes counter, which misleads many into assuming it shows the amount of virtual address space the process has allocated.

For reasons such as this, it is better to gather data with Process Explorer, a freeware utility written by Mark Russinovich. This tool uses a device driver to extract all of the relevant system information applicable to the version of Windows you are running, and it presents a colorful, itemized display of processes, thread activity, CPU usage (including threads consuming CPU cycles that are otherwise unaccounted for), and all of the other needed counters available among the Windows Performance Monitor counters. Three columns matter in this context: private bytes, private bytes delta, and private bytes history, all of which can be found under the "Select Columns" choice of the View menu.

Process Explorer shows a process tree view that reveals which processes are running as child processes under the control of a parent process. The differences are reflected in the colors of the user interface: pink indicates service host processes (svchost.exe) that run under the Services.exe process; light blue shows processes running under the same account as the user, as opposed to a SYSTEM or NETWORK SERVICE account; and brown shows processes called jobs, which are simply collections of processes.
These columns can be dragged and dropped to show private bytes next to the private bytes delta (where a negative number means that a process is releasing memory) and the private bytes history.

If there is a process leak, the Task Manager memusage counter will not reveal it. The private bytes, private bytes delta, and private bytes history counters, on the other hand, can be used to examine private virtual memory and determine whether a process is leaking memory. A case in point: a process can be using an enormous amount of virtual memory while most of it is not actually in use but merely kept on reserve; the private bytes history column, however, shows a relative comparison of private bytes usage across all processes running on the system. To examine this, download the Sysinternals tool TestLimit.exe (or TestLimit64.exe if you are running a 64-bit system). The –m switch on this tool leaks the specified number of megabytes of private bytes every half second. That is, if you type c:\windows\system32> testlimit –m 5 you are leaking 10 MB of private bytes per second. With Process Explorer open and the private bytes, private bytes delta, and private bytes history (a weighted graph, indicated by the width of the yellow band) columns in view, you would see growth in the testlimit.exe process: the private bytes history column would go from a flat yellow line to a thick yellow band, and the private bytes delta column would show only positive numbers, never a negative sign, indicating that the process is not releasing any memory. Press Ctrl-C in the testlimit window and the memory is returned to the system.
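The pattern testlimit exercises can be sketched in a few lines: allocate private memory repeatedly and keep references so nothing is ever released. The chunk size and the bounded loop below are illustrative (real testlimit loops until you stop it, sleeping half a second between allocations); this is a model of the behavior, not testlimit itself.

```python
# Bounded sketch of a private-bytes leak in the style of testlimit -m 5:
# each iteration commits more private memory and never frees it.
CHUNK_MB = 5
leaked = []   # holding the references is what makes this a "leak"

def leak_once():
    # bytearray forces the pages to be committed, private, writable memory
    leaked.append(bytearray(CHUNK_MB * 1024 * 1024))

for _ in range(4):   # real testlimit would loop until Ctrl-C,
    leak_once()      # sleeping 0.5 s between allocations

print("leaked %d MB of private bytes" % (len(leaked) * CHUNK_MB))
```

Watching such a process in Process Explorer, private bytes would climb steadily while the delta column stayed positive, exactly the signature described above.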

What would happen if we did not press Ctrl-C to terminate the process? Would it exhaust the amount of virtual address space it could allocate? In fact, it would be stopped sooner than that, by reaching an important private bytes limit called the commit limit. The system commit limit is the total amount of private virtual memory, across all of the processes in the system plus the operating system itself, that the system can keep track of at any one time. It is a function of two sizes: the sum of the page file sizes (you can have more than one) plus most of physical memory.
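As arithmetic, the commit limit is just that sum. All of the sizes below are made up for illustration, including the reserve subtracted to stand in for the "most of" qualifier (Windows keeps some physical memory for non-pageable uses):

```python
# Commit limit as described: page file sizes plus most of physical memory.
# Every figure here is illustrative, not a real Windows configuration.
GB = 1024 ** 3
MB = 1024 ** 2

page_files = [2 * GB, 1 * GB]       # you can have more than one page file
physical_memory = 4 * GB
nonpageable_reserve = 256 * MB      # hypothetical stand-in for "most of"

commit_limit = sum(page_files) + (physical_memory - nonpageable_reserve)
print(commit_limit // GB)           # whole gigabytes of committable memory
```

With these made-up numbers, a leaker like testlimit would be stopped after committing a bit under 7 GB of private memory system-wide.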

The amount of physical memory that the operating system assigns to each process is called its working set. Every process starts out with an empty, zero-sized working set, and as the threads of the process begin to touch virtual addresses, the working set begins to grow. When the operating system boots, it has to decide how much physical memory to give each process, as well as how much it needs to keep for itself, both to store cached data and to keep free. The sizes of the working sets of individual processes are therefore ultimately determined by the Windows Memory Manager, which monitors the behavior of each process and sizes its working set based on its memory demands and paging rates. In effect, the Memory Manager decides whether each working set should grow or shrink, while trying to satisfy all of the processes' demands as well as those of the operating system itself.

The above would suggest that launching an application is a time-consuming operation. As of Windows XP, Windows includes a mechanism to speed up application launching called the logical prefetcher. Windows monitors the page faults during application start-up (recall that during start-up a process reads itself, faulting pieces of its own executable in from disk on demand), where start-up is defined as the first ten seconds of the application's activity. It saves a record of this information in a Prefetch folder inside the Windows directory. Deleting these .pf files only harms performance, because Windows must then regather the prefetch data from scratch. As for the working set, Task Manager shows it in the memusage counter, while Process Explorer shows current and peak working set numbers in separate columns. The peak working set is the most physical memory ever assigned to the process.

Windows automatically shares any memory that is shareable, meaning code: the pieces of any executable or DLL. Only one copy of an executable or a DLL is in memory at any one time. This also applies to a Terminal Server instance: if several users are logged on and using Outlook, one copy of Outlook (or the demanded pieces of it) is read off disk and resides in memory. If one user starts using other features of Outlook, they are read in on demand; if a second user starts using those same features, they are already resident in memory. Apart from executables and DLLs, file data is also shared. To reiterate, neither the memusage nor the working set counters are of help with memory leaks: the working set will grow, and then the Memory Manager will decide it has gotten too big and shrink it. A quick fix is sometimes to add more memory. If one process is hard page faulting, that is not an indication that the system needs more memory, although there is a performance impact; if a collection of processes begin to hard page fault excessively, that is a clear indication that the system needs more memory.

Why memusage and Working Set Columns are not Memory Leak Indicators

If the working set keeps growing, at some point the Windows Memory Manager will block that growth, because it has decided that this working set is too big and there are other consumers of physical memory. If, at that point, the process continues to leak virtual memory without physically using any more of it, the Memory Manager begins to reuse the process's physical memory to store the new data referenced through the newly allocated virtual memory.

The working set grows as the process's threads touch virtual addresses and different pages are brought in; at some point the Memory Manager says enough to that process, because there are others that need physical memory just as much. So as a process requests a new page, the Memory Manager takes a page away, and it naturally takes the oldest pages first; that is, the pieces of the working set that have not been accessed for the longest time are pulled out. When those pages are pulled out, they are not overwritten, zeroed, or destroyed, because they still represent a copy of data that was recently being used by the process. Instead, Windows keeps them on several paging lists.

To understand the performance counters well enough to use them to determine whether your system needs more physical memory, it is necessary to delve into how Windows organizes the memory that is not currently owned by any process, that is, memory that is not in a working set. The Memory Manager keeps track of this unassigned memory on one of four paging lists, organized by page type:

Free page list

Modified page list

Standby page list

Zero page list

It is easiest to start with the modified and standby page lists. When the Memory Manager pulls a page out of a process's working set, it is pulling out a page that the process may still need: it may be reused by that process, or (being on the standby or modified page list) it may represent code or a DLL of an image and be reused by another process. Which list the page goes to depends on whether the page has been modified. If the page has been written to, the Memory Manager must ensure that it gets written back to the file it came from. That might be a file on disk, such as a data file mapped into the process's address space; if the process modified the page and it gets removed from the working set, the Memory Manager has to make sure the page makes it back to that file on disk. If the page has been modified but does not represent data mapped from a file into the virtual address space, then it represents private data that the process might want to use again.

Pages that have not been modified go to the standby list. The modified page list is often called the "dirty" list and the standby page list the "clean" list. After dirty pages have been written to disk, they move from the modified list to the standby list.
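The trim-and-sort behavior described above can be modeled in a few lines. This is a deliberately simplified sketch: it assumes a strict oldest-page-first eviction order and a fixed working set limit of three pages, neither of which matches the Memory Manager's real policies.

```python
# Toy model: trimming a working set into the modified and standby lists.
from collections import OrderedDict

working_set = OrderedDict()        # page name -> dirty flag, oldest first
modified_list, standby_list = [], []
WORKING_SET_LIMIT = 3              # illustrative, not a real Windows limit

def touch(page, write=False):
    """Reference a page; if over the limit, trim the oldest page."""
    dirty = working_set.pop(page, False) or write
    working_set[page] = dirty      # re-insert at the newest position
    if len(working_set) > WORKING_SET_LIMIT:
        old_page, old_dirty = working_set.popitem(last=False)
        # Dirty pages go to the modified list, clean ones to standby;
        # in neither case are the page contents destroyed.
        (modified_list if old_dirty else standby_list).append(old_page)

def writeback():
    """After dirty pages are written to disk, they move to standby."""
    standby_list.extend(modified_list)
    modified_list.clear()

for p in ("A", "B", "C"):
    touch(p)
touch("B", write=True)   # B is now dirty
touch("D")               # over the limit: oldest page A (clean) -> standby
touch("E")               # oldest is now C (clean) -> standby
touch("F")               # oldest is now B (dirty) -> modified
print(standby_list, modified_list)   # ['A', 'C'] ['B']
```

Calling writeback() then moves B onto the standby list, mirroring the modified-to-standby transition after the disk write completes.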

Faults on pages that are brought back into the working set from the modified or standby list are called soft faults, because they involve no disk I/O: no paging file reads or mapped file reads. If the data being referenced is no longer in memory, because it has gone back to the file on disk or to the paging file, the system incurs a hard fault and must perform a paging read operation to bring it back into memory.

The free page list does not exist when the system boots; it grows only as private memory is returned to the system. Private memory is a piece of a process's address space, such as the buffer that contains the text you have typed into Notepad. No other process is interested in those buffered keystrokes, saved or not. So when Notepad exits, whether you have saved the data or not, the memory inside the Notepad process address space that held it is returned to the free page list. The free list is where the Memory Manager goes when it needs to perform a page read: when a page fault occurs, the Memory Manager is about to issue an I/O that will completely overwrite the contents of the page, so when it needs a free page to read a piece of a file in from disk, it goes to the free list first (if there is anything there). Private process memory, however, is never handed out as new memory without first being zeroed.

When the free page list reaches a certain size, a kernel thread called the zero page thread is awakened (this is the only thread in the system that runs at priority 0). Its job is to zero out those free pages so that when Windows needs zeroed pages, it has them at hand.
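The security point behind the zero page thread can be sketched directly: freed private pages keep their stale contents until something wipes them, and only wiped pages may be handed out as new private memory. The page contents and list structures below are, of course, illustrative.

```python
# Sketch of the zero-page idea: free pages still hold stale private data
# (made-up contents below) until a background pass wipes them, so newly
# allocated private memory never exposes another process's old data.
free_list = [bytearray(b"secret notepad text!"),    # stale private data
             bytearray(b"another stale buffer.")]
zero_list = []

def zero_page_thread_pass():
    """What the priority-0 zero page thread does: wipe free pages."""
    while free_list:
        page = free_list.pop()
        for i in range(len(page)):
            page[i] = 0              # destroy the stale contents
        zero_list.append(page)       # now safe to give to any process

zero_page_thread_pass()
print(all(b == 0 for page in zero_list for b in page))   # True
```

After the pass, every byte of every page on the zeroed list is 0, which is why satisfying a private allocation from the zero list leaks nothing.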

References

The Sysinternals Video Library: Troubleshooting Memory Problems, by Mark Russinovich and David Solomon

Windows Internals, 4th Edition, by Mark Russinovich and David Solomon
