The preceding example searches the symbol table of the running kernel for modload, a kernel function we discussed earlier. The command returned several matches that contain the modload string, including the desired modload function symbol. (For more information on symbol tables and specific information on the columns listed, see the nm(1), a.out(4), and elf(3E) manual pages. Also, refer to any number of texts that describe the Executable and Linking Format (ELF) file, which is discussed in more detail in Chapter 4.)

In step 5, we indicate that the module install code is invoked indirectly through the module’s _init() function. Several functions must be included in any loadable kernel module to facilitate dynamic loading. Device drivers and STREAMS modules must be coded for dynamic loading. As such, a loadable driver interface is defined. In general, the required routines and data structures that are documented apply to all loadable kernel modules, not just to drivers and STREAMS modules (although there are components that are specific to drivers and do not apply to objects such as loadable system calls, file systems, or scheduling classes).

Within a loadable kernel object, an initialization, information, and finish routine must be coded, as per the definitions in the _init(9E), _info(9E), and _fini(9E) manual pages. A module’s _init() routine is called to complete the process of making the module usable after it has been loaded. The module’s _info() and _fini() routines also invoke corresponding kernel module management interfaces, as shown in Table 4-2.

Table 4-2 Module Management Interfaces

    Kernel Module Routine    Module Management Interface
    _init()                  mod_install()
    _info()                  mod_info()
    _fini()                  mod_remove()
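To make Table 4-2 concrete, the following is a minimal sketch of the three required entry points of a loadable module and the interfaces they invoke. It assumes a modlinkage declaration like the one shown in the next example; the authoritative specifications are the _init(9E), _info(9E), and _fini(9E) manual pages.

    #include <sys/modctl.h>

    static struct modlinkage modlinkage;   /* initialized as in the next example */

    int
    _init(void)
    {
            return (mod_install(&modlinkage));        /* _init() -> mod_install() */
    }

    int
    _info(struct modinfo *modinfop)
    {
            return (mod_info(&modlinkage, modinfop)); /* _info() -> mod_info() */
    }

    int
    _fini(void)
    {
            return (mod_remove(&modlinkage));         /* _fini() -> mod_remove() */
    }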

Module installation is abstracted to define a generic set of structures and interfaces within the kernel. Module operations function pointers for installing, removing, and information gathering (the generic interfaces shown in Table 4-2) are maintained in a mod_ops structure, which is extended to provide a definition for each type of loadable module. For example, there is a mod_installsys() function specific to loading system calls, a mod_installdrv() function specific to loading device drivers, and so forth.


For each of these module types, a module linkage structure is defined; it contains a pointer to the operations structure, a pointer to a character string describing the module, and a pointer to a module-type-specific structure. For example, the linkage structure for loadable system calls, modlsys, contains a pointer to the system entry table, which is the entry point for all system calls. Each loadable kernel module is required to declare and initialize the appropriate type-specific linkage structure, as well as a generic modlinkage structure that provides the generic abstraction for all modules.
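As a sketch, the declarations for a loadable system-call module might look like the following. The mysys() handler is hypothetical and the sysent initializers are illustrative only; the real structure definitions live in <sys/modctl.h> and <sys/systm.h>.

    #include <sys/modctl.h>
    #include <sys/systm.h>

    static int
    mysys(void)
    {
            return (0);                  /* hypothetical system call body */
    }

    static struct sysent mysys_sysent = {
            0,                           /* number of arguments (illustrative) */
            0,                           /* flags (illustrative) */
            (int (*)())mysys             /* the handler */
    };

    /* Type-specific linkage: ops pointer, description, type-specific data. */
    static struct modlsys modlsys = {
            &mod_syscallops,             /* generic ops for system-call modules */
            "example loadable system call",
            &mysys_sysent                /* the module's sysent entry */
    };

    /* Generic linkage declared by every loadable module. */
    static struct modlinkage modlinkage = {
            MODREV_1, (void *)&modlsys, NULL
    };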


Within the module facility is a module type-specific routine for installing modules, entered through the MODL_INSTALL macro called from the generic mod_install() code. More precisely, a loadable module’s _init() routine calls mod_install(), which vectors to the appropriate module-specific routine through the MODL_INSTALL macro. This procedure is shown in Figure 4.6.

Figure 4.6 Module Operations Function Vectoring

Figure 4.6 shows the data structures defined in a loadable kernel module: the generic modlinkage, through which is referenced a type-specific linkage structure (modlxxx), which in turn links to a type-specific operations structure that contains pointers to the type-specific functions for installing, removing, and gathering information about a kernel module. The MODL_INSTALL macro is passed the address of the module’s generic linkage structure and from there vectors in to the appropriate function. The module-specific installation steps are summarized in Table 4-3.
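Before looking at those steps, the vectoring itself can be sketched in a few lines of C. The structure layouts below are abridged from <sys/modctl.h>, and mod_install_sketch() is a simplified stand-in for the real mod_install()/MODL_INSTALL path, shown only to make the dispatch concrete.

    /* Abridged shapes of the structures involved in the vectoring. */
    struct mod_ops {
            int (*modm_install)();       /* type-specific install routine */
            int (*modm_remove)();        /* type-specific remove routine */
            int (*modm_info)();          /* type-specific info routine */
    };

    struct modlmisc {                    /* generic view of any modlxxx */
            struct mod_ops *misc_modops; /* &mod_syscallops, &mod_driverops, ... */
            char *misc_linkinfo;         /* description string */
    };

    struct modlinkage {                  /* generic linkage (abridged) */
            int   ml_rev;
            void *ml_linkage[4];
    };

    /*
     * Simplified stand-in for mod_install(): follow the generic linkage to
     * the first type-specific linkage, then vector through its operations
     * structure, as the MODL_INSTALL macro does.
     */
    int
    mod_install_sketch(struct modlinkage *mlp)
    {
            struct modlmisc *lp = (struct modlmisc *)mlp->ml_linkage[0];

            return ((*lp->misc_modops->modm_install)(lp, mlp));
    }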


Table 4-3 Module Install Routines

    Module Type        Install Function     Summary
    Device driver      mod_installdrv       Wrapper for ddi_installdrv(). Installs the
                                            driver entry in the kernel devops table.
    System call        mod_installsys       Installs the system call’s sysent entry
                                            in the kernel sysent table.
    File system        mod_installfs        Installs the file system Virtual File
                                            System (VFS) switch table entry.
    STREAMS modules    mod_installstrmod    Installs the STREAMS entry in the kernel
                                            fmodsw switch table.
    Scheduling class   mod_installsched     Installs the scheduling class in the
                                            kernel sclass array.
    Exec module        mod_installexec      Installs the exec entry in the kernel
                                            execsw switch table.

The summary column in Table 4-3 shows a definite pattern to the module installation functions. In many subsystems, the kernel implements a switch table mechanism to vector to the correct kernel functions for a specific file system, scheduling class, exec function, etc. The details of each implementation are covered in subsequent areas of the book, as applicable to a particular chapter or heading.

As we’ve seen, the dynamic loading of a kernel module is facilitated through two major kernel subsystems: the module management code and the kernel runtime linker. These kernel components make use of other kernel services, such as the kernel memory allocator, kernel locking primitives, and the kernel ksyms driver, taking advantage of the modular design of the system and providing a good example of the layered model discussed earlier.
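Returning to the switch-table pattern noted above, the following generic sketch shows the idea; the names sw_entry, sw_install(), and NSW are hypothetical and do not correspond to any particular Solaris table.

    /*
     * Generic switch-table pattern: module installation adds an entry to a
     * kernel array; later, the kernel vectors through the entry's function
     * pointer to reach the module-specific implementation.
     */
    #define NSW 32

    struct sw_entry {
            const char *sw_name;         /* NULL marks a free slot */
            int (*sw_func)(void *);      /* module-specific function */
    };

    static struct sw_entry switch_table[NSW];

    int
    sw_install(const char *name, int (*func)(void *))
    {
            int i;

            for (i = 0; i < NSW; i++) {
                    if (switch_table[i].sw_name == NULL) {
                            switch_table[i].sw_name = name;
                            switch_table[i].sw_func = func;
                            return (0);
                    }
            }
            return (-1);                 /* table full */
    }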

Part Two

THE SOLARIS MEMORY SYSTEM

• Solaris Memory Architecture
• Kernel Memory
• Memory Monitoring


5

SOLARIS MEMORY ARCHITECTURE

The virtual memory system can be considered the core of a Solaris system, and the implementation of Solaris virtual memory affects just about every other subsystem in the operating system. In this chapter, we’ll take a look at some of the memory management basics and then step into a more detailed analysis of how Solaris implements virtual memory management. Subsequent chapters in Part Two discuss kernel memory management and the tools that can be used to monitor and manage virtual memory.

5.1 Why Have a Virtual Memory System?

A virtual memory system offers the following benefits:

• It presents a simple memory programming model to applications so that application developers need not know how the underlying memory hardware is arranged.
• It allows processes to see linear ranges of bytes in their address space, regardless of the physical layout or fragmentation of the real memory.
• It gives us a programming model with a larger memory size than available physical storage (e.g., RAM) and enables us to use slower but larger secondary storage (e.g., disk) as a backing store to hold the pieces of memory that don’t fit in physical memory.


A virtual view of memory storage, known as an address space, is presented to the application while the VM system transparently manages the virtual storage between RAM and secondary storage. Because RAM is significantly faster than disk (100 ns versus 10 ms, or approximately 100,000 times faster), the job of the VM system is to keep the most frequently referenced portions of memory in the faster primary storage. In the event of a RAM shortage, the VM system is required to free RAM by transferring infrequently used memory out to the backing store. By so doing, the VM system optimizes performance and removes the need for users to manage the allocation of their own memory.

Multiple users’ processes can share memory within the VM system. In a multiuser environment, multiple processes can be running the same process executable binaries; in older Unix implementations, each process had its own copy of the binary, a vast waste of memory resources. The Solaris virtual memory system optimizes memory use by sharing program binaries and application data among processes, so memory is not wasted when multiple instances of a process are executed. The Solaris kernel extended this concept further when it introduced dynamically linked libraries in SunOS, allowing C libraries to be shared among processes.

To properly support multiple users, the VM system implements memory protection. For example, a user’s process must not be able to access the memory of another process; otherwise, security could be compromised or a program fault in one program could cause another program (or the entire operating system) to fail. Hardware facilities in the memory management unit perform the memory protection function by preventing a process from accessing memory outside its legal address space (except for memory that is explicitly shared between processes).

Physical memory (RAM) is divided into fixed-sized pieces called pages. The size of a page can vary across different platforms; the common size for a page of memory on an UltraSPARC Solaris system is 8 Kbytes. Each page of physical memory is associated with a file and offset; the file and offset identify the backing store for the page. The backing store is the location to which the physical page contents will be migrated (known as a page-out) should the page need to be taken for another use; it’s also the location the file will be read back in from if it’s migrated in (known as a page-in). Pages used for regular process heap and stack, known as anonymous memory, have the swap file as their backing store. A page can also be a cache of a page-sized piece of a regular file. In that case, the backing store is simply the file it’s caching; this is how Solaris uses the memory system to cache files.

If the virtual memory system needs to take a dirty page (a page that has had its contents modified), its contents are migrated to the backing store. Anonymous memory is paged out to the swap device when the page is freed. If a file page needs to be freed and the page-size piece of the file hasn’t been modified, then the page can simply be freed; if the piece has been modified, then it is first written back out to the file (the backing store in this case), then freed.

Rather than managing every byte of memory, we use page-sized pieces of memory to minimize the amount of work the virtual memory system has to do to maintain virtual-to-physical memory mappings.
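The page identity described above (a file and an offset) and the freeing rule for dirty pages can be pictured with a simplified structure. This is a sketch only; the real Solaris page structure carries many more fields, and pageout_to_backing_store() and free_physical_page() are hypothetical helpers standing in for the real pageout machinery.

    typedef unsigned long long u_offset_t;    /* as in <sys/types.h> */

    struct vnode;                             /* opaque: the backing file */

    struct page_sketch {
            struct vnode *p_vnode;            /* backing vnode; swap for anon pages */
            u_offset_t    p_offset;           /* offset of this page in the vnode */
            int           p_mod;              /* nonzero if contents modified */
    };

    /* Hypothetical helpers standing in for the real pageout machinery. */
    void pageout_to_backing_store(struct vnode *, u_offset_t);
    void free_physical_page(struct page_sketch *);

    /*
     * Freeing rule from the text: a clean page is simply freed; a dirty
     * page is first written back to its backing store.
     */
    void
    page_release_sketch(struct page_sketch *pp)
    {
            if (pp->p_mod)
                    pageout_to_backing_store(pp->p_vnode, pp->p_offset);
            free_physical_page(pp);
    }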


Figure 5.1 shows how the management and translation of the virtual view of memory (the address space) to physical memory is performed by hardware, known as the virtual memory management unit (MMU).

Figure 5.1 Solaris Virtual-to-Physical Memory Management

The Solaris kernel breaks up the linear virtual address space into segments, one for each type of memory area in the address space. For example, a simple process has a memory segment for the process binary and one for the scratch memory (known as heap space). Each segment manages the mapping for the virtual address range mapped by that segment and converts that mapping into MMU pages. The hardware MMU maps those pages into physical memory by using a platform-specific set of translation tables. Each entry in the table has the physical address of the page of memory in RAM, so that memory accesses can be converted on-the-fly in hardware. We cover more on how the MMU works later in the chapter when we discuss the platform-specific implementations of memory management.

Recall that we can have more virtual address space than physical address space because the operating system can overflow memory onto a slower medium, like a disk. The slower medium in Unix is known as swap space. Two basic types of memory management manage the allocation and migration of physical pages of memory to and from swap space: swapping and demand paging.
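Returning to the translation tables for a moment, here is a deliberately simplified, single-level sketch; real MMUs use multilevel tables and TLBs, and translate() and xlate_table are hypothetical names. Only the page-number/offset split and the 8-Kbyte UltraSPARC page size come from the text.

    /*
     * Toy single-level translation: split the virtual address into a
     * virtual page number and an offset, look up the physical frame,
     * and recombine.
     */
    #include <stdint.h>

    #define PAGESHIFT  13                        /* 8-Kbyte pages: 2^13 */
    #define PAGEOFFSET ((1UL << PAGESHIFT) - 1)

    uint64_t
    translate(uint64_t vaddr, const uint64_t *xlate_table)
    {
            uint64_t vpn = vaddr >> PAGESHIFT;   /* virtual page number */
            uint64_t off = vaddr & PAGEOFFSET;   /* offset within the page */
            uint64_t pfn = xlate_table[vpn];     /* physical frame number */

            return ((pfn << PAGESHIFT) | off);   /* physical address */
    }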


The swapping algorithm for memory management uses a user process as the granularity for managing memory. If there is a shortage of memory, then all of the pages of memory of the least active process are swapped out to the swap device, freeing memory for other processes. This method is easy to implement, but performance suffers badly during a memory shortage because a process cannot resume execution until all of its pages have been brought back from secondary storage. The demand-paged model uses a page as the granularity for memory management. Rather than swapping out a whole process, the memory system just swaps out small, least used chunks, allowing processes to continue while an inactive part of the process is swapped out.

The Solaris kernel uses a combined demand-paged and swapping model. Demand paging is used under normal circumstances, and swapping is used only as a last resort when the system is desperate for memory. We cover swapping and paging in more detail in “The Page Scanner” on page 178.

The Solaris VM system implements many more functions than just management of application memory. In fact, the Solaris virtual memory system is responsible for managing most objects related to I/O and memory, including the kernel, user applications, shared libraries, and file systems. This strategy differs significantly from other operating systems like earlier versions of System V Unix, where file system I/O used a separate buffer cache.

One of the major advantages of using the VM system to manage file system buffering is that all free memory in the system is used for file buffering, providing significant performance improvements for applications that use the file system and removing the need for tuning the size of the buffer cache. The VM system can allocate all free memory for file system buffers, meaning that on a typical system with file system I/O, the amount of free memory available is almost zero. This number can often be misleading and has resulted in numerous bogus memory-leak bugs being logged over the years. Don’t worry, “almost zero” is normal. (Note that free memory is no longer always low with Solaris 8.)

In summary, a VM system performs these major functions:

• It manages virtual-to-physical mapping of memory
• It manages the swapping of memory between primary and secondary storage to optimize performance
• It handles requirements of shared images between multiple users and processes

5.2 Modular Implementation

Early SunOS versions (SunOS 3 and earlier) were based on the old BSD-style memory system, which was not modularized, and thus it was difficult to move the memory system to different platforms.


The virtual memory system was completely redesigned at that time, with the new memory system targeted at SunOS 4.0. The new SunOS 4.0 virtual memory system was built with the following goals in mind:

• Use of a new object-oriented memory management framework
• Support for shared and private memory (copy-on-write)
• Page-based virtual memory management

The VM system that resulted from these design goals provides an open framework that now supports many different memory objects. The most important objects of the memory system are segments, vnodes, and pages. For example, all of the following have been implemented as abstractions of the new memory objects:

• Physical memory, in chunks called pages
• A new virtual file object, known as the vnode
• File systems as hierarchies of vnodes
• Process address spaces as segments of mapped vnodes
• Kernel address space as segments of mapped vnodes
• Mapped hardware devices, such as frame buffers, as segments of hardware-mapped pages

The Solaris virtual memory system we use today is implemented according to the framework of the SunOS 4.0 rewrite. It has been significantly enhanced to provide scalable performance on multiprocessor platforms and has been ported to many platforms. Figure 5.2 shows the layers of the Solaris virtual memory implementation.

Physical memory management is done by the hardware MMU and a hardware-specific address translation layer known as the Hardware Address Translation (HAT) layer. Each memory management type has its own specific HAT implementation. Thus, we can separate the common machine-independent memory management layers from the hardware-specific components to minimize the amount of platform-specific code that must be written for each new platform.

The next layer is the address space management layer. Address spaces are mappings of segments, which are created with segment device drivers. Each segment driver manages the mapping of a linear virtual address space into memory pages for different device types (for example, a device such as a graphics frame buffer can be mapped into an address space). The segment layers manage virtual memory as an abstraction of a file. The segment drivers call into the HAT layer to create the translations between the address space they are managing and the underlying physical pages.
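The segment-driver interface can be sketched as a small descriptor plus an operations vector that the address space layer calls through. The entry points shown here are an abridged, illustrative subset with hypothetical names; the real vector is struct seg_ops in <vm/seg.h>, which carries many more entry points.

    /*
     * Abridged sketch of the segment abstraction: the address space layer
     * calls through a per-segment operations vector without knowing what
     * kind of memory the segment maps.
     */
    #include <sys/types.h>

    struct seg;

    struct seg_ops_sketch {
            int  (*fault)(struct seg *, caddr_t);             /* resolve a fault */
            int  (*setprot)(struct seg *, caddr_t, size_t, uint_t);
            void (*unmap)(struct seg *, caddr_t, size_t);
    };

    struct seg {
            caddr_t s_base;                      /* base virtual address */
            size_t  s_size;                      /* length of the mapping */
            const struct seg_ops_sketch *s_ops;  /* driver: segvn, segmap, ... */
            void   *s_data;                      /* per-driver private data */
    };

    /* The address space layer simply vectors through s_ops. */
    static int
    as_fault_sketch(struct seg *seg, caddr_t addr)
    {
            return (seg->s_ops->fault(seg, addr));
    }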


[Figure: the global page replacement manager (the page scanner) sits atop the address space management layer, which comprises the segkmem (kernel memory), segmap (file cache memory), and segvn (process memory) segment drivers; these in turn sit atop the Hardware Address Translation (HAT) layer, with one implementation per platform: sun4c (sun4-mmu, 32/32-bit, 4-Kbyte pages), sun4m (sr-mmu, 32/36-bit, 4-Kbyte pages), sun4d (sr-mmu, 32/36-bit, 4-Kbyte pages), sun4u (sf-mmu, 64/64-bit, 8-Kbyte/4-Mbyte pages), and x86 (i386 mmu, 32/36-bit, 4-Kbyte pages).]

Figure 5.2 Solaris Virtual Memory Layers

5.3 Virtual Address Spaces

The virtual address space of a process is the range of memory addresses that are presented to the process as its environment; some addresses are mapped to physical memory, some are not. A process’s virtual address space skeleton is created by the kernel at the time the fork() system call creates the process. (See “Process Creation” on page 293.) The virtual address layout within a process is set up by the dynamic linker and sometimes varies across different hardware platforms. As we saw in Figure 5.1 on page 127, virtual address spaces are assembled from a series of memory segments. Each process has at least four segments: