Monday, 16 December 2013

Virtual to Physical Address Translation (Part 2)

This second part covers the paging structure on x86 and x64, and how virtual memory addresses are mapped to physical memory addresses according to this structure. The third part will look at how physical memory is managed with the PFN database.

Hardware PTEs and Paging Structure

A virtual address on an x86 system is divided into three parts: the Page Directory Index (10 bits), the Page Table Index (10 bits) and the Byte Index (12 bits). The above image shows how they relate to the general page table structure.

The Page Directory Index selects the PDE within the Page Directory, and that PDE points to the Page Table in which the desired PTE is located. The Page Table Index selects the PTE within that Page Table, and the PTE points to the physical page. The Byte Index is then the offset of the desired byte within that physical page.
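The split described above can be sketched in a few lines of Python. This is only an illustration of the 10/10/12 bit layout for non-PAE x86 with 4KB pages; the function name split_x86_va and the sample address are made up for the example.

```python
def split_x86_va(va):
    """Split a 32-bit virtual address (non-PAE x86, 4KB pages) into its three fields."""
    page_directory_index = (va >> 22) & 0x3FF  # top 10 bits
    page_table_index = (va >> 12) & 0x3FF      # middle 10 bits
    byte_index = va & 0xFFF                    # low 12 bits
    return page_directory_index, page_table_index, byte_index


# Hypothetical user-mode address, chosen only for illustration.
pdi, pti, offset = split_x86_va(0x7FFDF123)
print(hex(pdi), hex(pti), hex(offset))  # prints: 0x1ff 0x3df 0x123
```

Each 10-bit index can address 1,024 entries, and the 12-bit Byte Index covers the 4,096 bytes of a page.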

Before we briefly explain the x64 version and go into greater depth about each part of the translation process, let's quickly discuss how to find the Page Directory address. Remember that it is unique to each process, and every thread running under a process shares this address, meaning that the Page Directory does not need to be switched when the context changes to a different thread of the same process.

In the output of the !process extension, the DirBase field shows the physical address of the Page Directory. This same physical address is stored within the control register called CR3. Using the r command with the @ character, we can see that the two addresses are identical.

Alternatively, you could check the _KPROCESS data structure and find the address from there. The CR3 register (Page Directory Base Register) will be updated with the address of a different Page Directory if a process context switch occurs.

The Page Directory is essentially a large array of PDEs (Page Directory Entries). Each PDE is 4 bytes (32 bits) long and points to the address of a Page Table. Page Tables are created on demand, so when a virtual address is accessed and no Page Table exists for it yet, the VAD Tree is checked to see whether a new Page Table should be created.

The PDE virtual address can be found with the !pte extension, as shown below:
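For reference, the virtual addresses that !pte reports can also be worked out by hand. The sketch below assumes the non-PAE x86 layout, where the page tables are mapped at 0xC0000000, the Page Directory at 0xC0300000, and every entry is 4 bytes; PAE and x64 systems use different bases and 8-byte entries. The function names are made up for the example.

```python
PTE_BASE = 0xC0000000  # assumed: non-PAE x86 base of the page table mapping
PDE_BASE = 0xC0300000  # assumed: non-PAE x86 base of the Page Directory mapping
ENTRY_SIZE = 4         # each PDE and PTE is 4 bytes on non-PAE x86


def pte_virtual_address(va):
    """Virtual address of the PTE that maps va."""
    return PTE_BASE + (va >> 12) * ENTRY_SIZE


def pde_virtual_address(va):
    """Virtual address of the PDE that maps va."""
    return PDE_BASE + (va >> 22) * ENTRY_SIZE


va = 0x00010000  # hypothetical address, chosen only for illustration
print(hex(pde_virtual_address(va)), hex(pte_virtual_address(va)))
# prints: 0xc0300000 0xc0000040
```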

Each PDE points to the address of a Page Table, and the Page Table is similar in that it is an array of PTEs (Page Table Entries). The Page Table Index is used to find the relevant PTE within the array. 1,024 Page Tables are required to map the entire 4GB of address space for x86. Each PTE then points to the relevant physical page, and the Byte Index is used to find the appropriate address within this page. The PTE has a number of different PTE Protection and Status bits associated with it, which I will explain here.

Accessed (A) - The page has been accessed (read or written).

Cache Disable (Cd) - Caching is disabled for the page.

Copy-on-Write (Cw) - Page is using copy on write.

Dirty (D) - Page has been written to.

Global (Gl) - Translation applies to all processes.

Large Page (L) - PDE maps a 4MB page.

Owner (O) - Shows if the page is accessible in User-Mode or Kernel-Mode.

Prototype (P) - Prototype PTE, will be explained later.

Valid (V) - Virtual Page maps to a physical page.

Write Through (Wt) - Disables caching of writes.

Write (W) - Page is writable.

These flags can also be found with the _HARDWARE_PTE data structure:
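As a rough sketch of how these bits sit inside a 32-bit PTE, the Python below decodes the flags and the page frame number. The bit positions are an assumption based on the non-PAE x86 _HARDWARE_PTE layout (flags in the low 12 bits, PageFrameNumber in the upper 20), so check them against the dt output on your own system. The PTE value and function names are made up for the example.

```python
# (abbreviation, bit position), assuming the non-PAE x86 _HARDWARE_PTE layout
FLAG_BITS = [
    ("V", 0),    # Valid
    ("W", 1),    # Write
    ("O", 2),    # Owner
    ("Wt", 3),   # Write Through
    ("Cd", 4),   # Cache Disable
    ("A", 5),    # Accessed
    ("D", 6),    # Dirty
    ("L", 7),    # Large Page
    ("Gl", 8),   # Global
    ("Cw", 9),   # Copy-on-Write
    ("P", 10),   # Prototype
]


def decode_pte(pte):
    """Return (list of set flags, page frame number) for a 32-bit PTE value."""
    flags = [name for name, bit in FLAG_BITS if pte & (1 << bit)]
    pfn = pte >> 12  # upper 20 bits hold the page frame number
    return flags, pfn


def physical_address(pte, va):
    """Combine the PFN from a valid PTE with the Byte Index of va."""
    flags, pfn = decode_pte(pte)
    if "V" not in flags:
        raise ValueError("PTE is not valid; the access would fault")
    return (pfn << 12) | (va & 0xFFF)


pte = 0x1A2B3067  # hypothetical PTE value, chosen only for illustration
print(decode_pte(pte))                         # prints: (['V', 'W', 'O', 'A', 'D'], 107187)
print(hex(physical_address(pte, 0x7FFDF123)))  # prints: 0x1a2b3123
```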

PTEs can be subdivided into three other categories: Invalid, Prototype and System. The next few paragraphs will explain these types of PTE, and then conclude with the TLB.

System PTEs

System PTEs are used to map the system address space. For example, kernel stacks, MDLs and I/O space are mapped with the use of System PTEs. We can see the number of System PTEs free to use by using the !sysptes extension in WinDbg.

You can also view the number of System PTE allocation failures, which could indicate a PTE leak. To do so, first gather the address of the MiSystemPteInfo global variable, and then dump the _MI_SYSTEM_PTE_TYPE data structure at that address.

The above address is then used with the dt command, as seen below:

As we can see, there aren't any allocation failures, which is a positive sign and suggests everything is running normally. On the other hand, if you do notice any allocation failures, then you could create a registry value called HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\TrackPtes. By creating this DWORD value and setting it to 1, you will be enabling the tracking of System PTEs. The next step would be to use the !sysptes extension with the 0x4 bit flag set.

Invalid PTEs

Invalid PTEs indicate that the page isn't currently accessible to the process. Usually, for the invalid PTE to become valid, a Page Fault exception is raised and the Page Fault is then resolved by the Memory Manager's fault handler, called MmAccessFault. There are four different kinds of Invalid PTEs, which are explained below:

Page File - The page is located within a page file on the hard drive. Accessing this page will result in a page fault, which will allocate a physical page and enable the Valid bit for the PTE. The page will also be added to the Working Set of the accessing process.

Demand Zero - The page must be filled with zeros. If this page is accessed, then a zero-filled page is added to the working set of the process. The zero page list is checked first; if it is empty, a page is taken from the free list and filled with zeros. If the free list is also empty, the page is taken from a Standby List and filled with zeros.

Transition - The page is currently on a standby, modified or modified-no-write list, or on no list at all, and will therefore be removed from the corresponding list and added to the working set of the process.

Unknown - The PTE is zero, or there isn't a page table yet. This leads to the VAD Tree being checked to see if the page is committed, and if so, a page table will be created.

Prototype PTEs

Pages which are shared between processes are mapped with Prototype PTEs. When a shareable, mapped page is referenced by a process, a hardware PTE is used to point to the referenced page, so both the Prototype PTE and the hardware PTE point to the physical page. For each reference to a shareable page, a counter is incremented within the PFN Database. This allows the Memory Manager to tell when a page is no longer referenced, so that it can invalidate the page and move it to the hard drive or a transition list. The PTE in the process' page table then has its Valid flag cleared and is used to point to the Prototype PTE, which points to the page.

If the page is later accessed, the Memory Manager can follow the process' invalid PTE to the Prototype PTE in order to locate the page. The diagram below illustrates this point, and how a Prototype PTE and Valid PTE may look in concept.

A Prototype PTE can be used to describe the page state of a shareable page. These states are as follows:

Valid - Page is in physical memory.

Transition - The page is currently present on a standby or modified list, or may not be present on any list.

Modified-No-Write - The page is present in physical memory, and present on the modified-no-write list.

Demand Zero - The page will be filled with a page of zeros.

Page File - The page is present in a paging file.

Mapped File - The page is present in a mapped file.

x64 Translation Process:

On x64 systems, the paging structure has expanded from two levels to four levels. The two additional levels are the Page Directory Pointer Table and, at the top, the Page Map Level 4 (PML4). The Virtual Address on an x64 system therefore has more sections. I've created a diagram for the current 48-bit implementation.

The Page Map Level 4 Selector is the index of an entry in the Page Map Level 4, and that entry points to the Page Directory Pointer Table. The Page Directory Pointer Selector then selects the entry in that table which points to the Page Directory. The Page Table Selector selects the PDE which points to the Page Table, and the Page Table Entry Selector then selects the correct PTE, which maps to the physical page. The Byte Within Page is the offset of the byte within that physical page. Remember that, as on x86, each process has its own paging structure. Using the above information, you can imagine the overall x64 paging structure being like the diagram below:
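The field boundaries for the 48-bit implementation can be sketched in Python as follows, assuming four 9-bit selectors followed by a 12-bit Byte Within Page. The function name split_x64_va and the sample address are made up for the example.

```python
def split_x64_va(va):
    """Split a 48-bit x64 virtual address into its five fields (4KB pages)."""
    pml4_selector = (va >> 39) & 0x1FF  # bits 47-39
    pdpt_selector = (va >> 30) & 0x1FF  # bits 38-30
    pd_selector = (va >> 21) & 0x1FF    # bits 29-21 (the Page Table Selector)
    pte_selector = (va >> 12) & 0x1FF   # bits 20-12 (the Page Table Entry Selector)
    byte_within_page = va & 0xFFF       # bits 11-0
    return pml4_selector, pdpt_selector, pd_selector, pte_selector, byte_within_page


# Hypothetical user-mode address, chosen only for illustration.
print([hex(field) for field in split_x64_va(0x00007FFFFFFDE123)])
# prints: ['0xff', '0x1ff', '0x1ff', '0x1de', '0x123']
```

Each 9-bit selector addresses 512 entries, which matches 8-byte entries in a 4KB table.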