1. CR3 holds the PML4 frame number; multiply by 4k to get the PML4's physical base.
2. Select a PML4 entry: PML4[L.PML4] takes us to a 4k page structure (the PDPT).
3. Given the PDPT (which has 512 entries), select a page directory: PDPT[L.DirectoryPtr] = PageDir.
4. PageDir is another 4k page structure with 512 entries; each entry points to a page table.
5. Select the page table: PageDir[L.Directory] = PageTable.
6. The page table is again 4k in size with 512 PTE entries; select a PTE: PTE = PageTable[L.Table].

So in the case of 4k pages, it is better to allocate each 4kb structure on demand.

Every entry in the PML4 points to the location of a PDPT.
Every entry in a PDPT points to the location of a PD (or a 1G page).
Every entry in a PD points to the location of a PT (or a 2M page).
Every entry in a PT points to a 4k page.

To map even a single 4k region of the address space, you need one PT, one PD to refer to that PT, one PDPT, and one PML4, which requires 16k of table space. If you want to map everything using 4k pages, each table refers to 512 subtables: (1 + 512*(1 + 512*(1 + 512*1))) * 4096 = 0x0000_0080_4020_1000 bytes, just a little over 513GB.

In any case, precreating full address spaces costs more RAM than virtually any application that will be using that space would ever need.


(Sorry if this was more than you asked for, but some time ago I found that I needed to research all of this to understand the topic; ‘**’ is the power operator.)

lallous wrote:

I want to make sure that I understood %topic% properly.

To really grok this whole paging thing, you need to understand why it was created. The very first time-sharing computer, MIT's Compatible Time-Sharing System (CTSS), faced the problem of protecting the supervisor (kernel) from the system's user processes. They did it by statically reserving the low 5-Kbytes of system memory for the kernel, and the remaining 27-Kbytes for user applications. They then wanted to avoid memory allocation clashes between users, but due to hardware limitations of the time, they solved that problem by allowing only one program in core memory at any point in time! [1]

As a solution to the “protect users from each other” problem, the ATLAS computer pioneered abstracting the concept of an “address” from the low-level detail of a physical memory location. Thus, from a memory perspective, each process has its own world view (and no need for historical tricks like memory overlays: demand paging achieves a similar effect automatically by loading only the parts of the executable that are currently needed; a process image can thus be bigger than the amount of contiguous free physical memory available). Now, if we are going to map every byte of the process address space, we'll need a mapping table as large as physical memory or larger! To solve this, we group physical memory addresses into “blocks”, and map these “blocks” instead of singular bytes in the mapping table. [2]

(NOTE-1: the whole difference between segmentation and paging is that the former divides the address space into blocks of arbitrary size, while the latter organizes such space into uniform block sizes.)

Now, imagine having a 32-bit address space while mapping “blocks” of 4-Kbytes in size. Each mapping table entry will be 4 bytes, to be able to point anywhere in the 32-bit space. Since each entry maps 4-Kbytes (2**12) of address space, we'll need (2**32/2**12 = 2**20) entries to map the entire 4-Gbyte/32-bit space. So, the mapping table size will equal (number of needed entries * entry size = 2**20 * 2**2 = 2**22 bytes = 4-Mbytes). i.e. using that simple scheme, we'd need a 4-Mbyte mapping table for each process in the system! I'm sure you wouldn't like to use any kernel like that.

(NOTE-2: Hardware designers intentionally choose power-of-2 page sizes to quickly find a virtual address page number and offset using right and left shifts. Otherwise, they'll have to use the too-slow division operator; check the cited Denning paper for further details.)

If we analyze why each process needed a 4-Mbyte mapping table above, we'll find that the key problem was mapping unused parts of the address space. And to further complicate matters, most processes use their memory space in a very sparse manner; e.g. using the very top for the stack, and the very bottom for the executable code and data. Thus, the paging scheme has to support an address space that is used at both ends, with a huge unused gap in between.

Hardware architecture books describe several solutions to the above problem. For the contemporary x86 CPUs, the solution chosen was to page the page-table itself. Intel and AMD64 documents cover the details.

(I'll just call the PDP and PD as ‘PML3’ and ‘PML2’ respectively, it makes more sense that way)

Because the AMD designers wanted to limit the need for contiguous physical memory as much as possible, each page table was designed not to exceed 4-Kbytes of contiguous RAM. Since each entry holds a full 64-bit address, each page table entry is 8 bytes. Thus, the max number of possible entries in a table = (page table size / entry size = 2**12 / 2**3 = 2**9) entries. That's in fact why the PML4, PML3, and PML2 offsets are 9-bits each, and why all of these tables have 512 entries each. This leads us to:

A PML2 entry covers a 2-Mbyte region since it points to a PT of 512 4-Kbyte pages (or directly maps a 2-Mbyte page)

A PML3 entry covers a 1-Gbyte region since it points to 512 PML2 entries

A PML4 entry covers a 512-Gbyte region since it points to 512 PML3 entries
