This page has a collection of notes that I made while working on the stack limit patches for Sony.
This was a set of patches that tried to map kernel memory areas as 4K pages, so that an individual
page of the stack could be unmapped, and a page fault generated, when stack space was running low.
(This was related to testing an implementation of 4K stacks on ARM).

These notes are placed here in the hopes that they will be useful for someone working on ARM memory
management.

Contents

startup memory

At start up the kernel (usually) automatically determines the physical memory areas. However,
the user can manually
specify one or more physical memory areas using "mem=..." kernel command line options. When used,
these override any automatically determined values. These are parsed by early_mem().

The initial description of physical memory regions are stored in the global 'meminfo' structure,
and each region is described as a 'bank'. Some initial physical memory is utilized by the bootmem allocator.
It is from this pool of physical memory, that the page tables are built, which allows the MMU to be turned
on, and the kernel switched over to virtual memory. Once this process is done, the bootmem pool is
freed and all system pages are turned over to the various page and slab allocators of the system.
A good reference for this is Mel Gorman's excellent book on the topic:
Understanding the Linux Virtual Memory Manager
The prior link is to the PDF, here's a link to html: http://www.kernel.org/doc/gorman/html/understand/
Chapter 5 talks about the bootmem allocator.

page tables

The page tables are, unsurprisingly, initialized by paging_init(). ARM uses a somewhat weird way of
mapping the Linux page tables onto the ARM hardware tables. This method is described in comments
in the file arch/arm/include/asm/pgtable.h, with additional macros defined in pgtable-hwdef.h and page.h.
Basically, Linux supports 4 levels (pgd, pud, pmd, and pte), and ARM maps this onto 2 levels (pgd/pmd and
pte). The nomenclature in the code is hard to follow, because Linux generic code thinks that pgd
is the top level of page tables, but internally the ARM code uses pmd macros to refer to the top hardware
page table.

Originally, Linux used 4KB mappings for ARM, but they have converted over to mostly 1MB mappings (at least for
the Linux kernel). According to my colleague, Frank Rowand, bad things happen if a physical page is
represented in the page table by more than one entry (for example, if a physical page has both an entry
as a small page in a second-level page table, and is inside a region covered by a large "section" page entry
in a first-level page table.

At the hardware level,
ARM supports two page table trees simultaneously, using the hardware registers TTBR0 and TTBR1. A virtual
address is mapped to a physical address by the CPU depending on settings in TTBRC. This control register
has a field which sets a split point in the address space. Addresses below the cutoff value are mapped through the
page tables pointed to by TTBR0, and addresses above the cutoff value are mapped through TTBR1. TTBR0 is
unique per-process, and is in current->mm.pgd (That is, current->mm.pgd == TTBR0 for that process). That is, when a context switch occurs, the kernel sets TTBR0 to the current->mm.pgd for new process. TTBR1
is global for the whole system, and represents the page tables for the kernel. It is referenced in the global kernel variable swapper_pg_dir. Note that both of these addresses are virtual addresses. You can find the physical address of the first-level page table by using virt_to_phys() functions on these addresses.

Note that in particular, the second-level page table (PTE) layout is weird. The ARM hardware supports 1K tables at this level (256 entries of 4bytes each). However, the Linux kernel needs some bits that are not provided on some hardware (like DIRTY and YOUNG). These are synthesized by the ARM MMU code via permissions and faulting, effectively making them software-managed. They have to be stored outside the hardware tables, but still in a place where the kernel can get to them easily. In order to keep things aligned into units of 4K, the kernel therefore keeps 2 hardware second-level page tables and 2 parallel arrays (totalling 512 entries) on a single page. When a new second-level page table is created, it is created 512 entries at a time, with the hardware entries at the top of the table, and Linux software entries (synthesized flags and values) at the bottom.

The kernel has a mixture of 1MB mappings and 4KB mappings. This relieves some pressure on the TLB, which
is used by the CPU to cache virtual-to-physical address translations.

paging_init()

The page tables and paging infrastructure are initialized as follows:

paging_init() is called by setup_arch() after the meminfo structure has been initialized and the bootmem allocator is ready. It calls the following routines:

memblock_set_current_limit()

build_mem_type_table() - builds a table of memory types. This has the page protection flags that are available for different memory types, for the current ARM processor. Different ARM processors have changed what flags they use, and where they are located in the page table entries, over the years. the 'mem_types' table encapsulates the settings for the running processor.

prepare_page_table() - this zeros out certain areas of the first-level page table (called pmd in this routine). For example, it zeros out the areas of the page table that will be covered by user-space (areas below the start of the kernel address space).

map_lowmem() - create the memory mappings (page table entries) for the lower portions of kernel memory. This is the "normal" memory that will be used by the kernel for static code and data, stack, and regular dynamic allocations.

pte - pointer to pte entry to change (the address of the linux entry - not the 'hardware' entry)

value - the value to place in the entry - this is usually created with the macro pfn_pte(x,y), where x has the page frame number (physical address of the page for this mapping), and y represents the protection flags for the page.

value2 - the value to place in the "extended" or "extra" entry

Note that the assembly routine for this, which is cpu-specific, will take the Linux PTE values and derive
appropriate hardware mappings for the hardware PTE entry.

Nomenclature

pgd = page global directory

pud = page upper directory

pmd = page middle directory

pte = page table entry (this is confusion, 'pte' can refer to both the table and an entry in the table)