Chapter 8, Main Memory.


2 8.1 Background. When a machine language program executes, it may cause memory address reads or writes. From the point of view of memory, it is of no interest what the program is doing. All that is of concern is how the program, operating system, and machine manage access to the memory.

3 Address binding. The O/S manages an input queue in secondary storage of jobs that have been submitted but not yet scheduled. The long term scheduler takes jobs from the input queue, triggers memory allocation, and puts jobs into physical memory. PCBs representing the jobs go into the scheduling system's ready queue.

4 The term memory address binding refers to the system for determining how memory references in programs are related to the actual physical memory addresses where the program resides. In short, this aspect of system operation stretches from the contents of high level language programs to the hardware the system is running on.

5 1. In high level language programs, memory addresses are symbolic. Variable names make no reference to an address space, but the values they contain occupy physical memory. 2. When a high level language program is compiled, the compiler typically generates relative addresses. This means that the numbering of the lines of machine code starts at 0, and the operands of instructions which access program memory do so by line number, as an offset from a base address of 0.

6 3. An operating system includes a loader/linker. This is part of the long term scheduler functionality. When the program is placed in memory, assuming (as is likely) that its base load address is not 0, the relative addresses it contains don't agree with the physical addresses it occupies. A simple approach to solving this problem is to have the loader/linker convert the relative addresses of a program to absolute addresses at load time. Absolute addresses are the actual physical addresses where the program resides.

7 Note the underlying assumptions of this scenario: 1. Programs can be loaded into arbitrary memory locations. 2. Once loaded, the locations of programs in memory don't change.

8 There are several different approaches to binding memory accesses in programs to actual locations. 1. Binding can be done at compile time. If it's known in advance where in memory a program will be loaded, the compiler can generate absolute code.

9 2. Binding can be done at load time. This was the simple approach described earlier. The compiler generates relocatable code. The loader converts the relative addresses to actual addresses at the time the program is placed into memory.

10 3. Binding can be done at execution time. This is the most flexible approach. Relocatable code (containing relative addresses) is actually loaded. At run time, the system converts each memory reference to a real address. Implementing such a system removes the restriction that a program is always in the same address space. This kind of system supports advanced memory management schemes like paging and virtual memory, which are the advanced topics of the memory chapters. In simple terms, you can see that this kind of system supports medium term scheduling, where a job can be offloaded and reloaded without needing either to reload it to the same address or to go through the address binding process again.

11 The following diagram shows the various steps involved in getting a user-written piece of high level code into a system and running.

13 Logical vs. physical address space. The address generated by a program running on the CPU is a logical address. The address that actually gets manipulated in the memory management unit of the CPU, the one that ends up in the memory address register, is a physical address. Under compile time or load time binding, the logical and physical addresses are the same.

14 Under execution time binding, the logical and physical addresses differ. Logical addresses can also be called virtual addresses; the book uses the terms interchangeably. Overall, the physical memory belonging to a program can be called its physical address space. The complete set of possible memory references of a program can be called its logical or virtual address space.

15 For efficiency, memory management in real systems is supported in hardware. The mapping from logical to physical is done by the memory management unit (MMU). In the simplest of schemes, the MMU contains a relocation register. This register contains the base address, or offset into main memory, where a program is loaded. Converting from a relative address to an absolute address means adding the relative address to the contents of the relocation register.

16 When a program is running, every time an instruction makes reference to a memory address, the relative address is passed to the MMU. The MMU is transparent; it does everything necessary to convert the address. For a simple read, for example, the MMU returns the value found at the converted address. For a simple write, the MMU takes the given value and writes it to the converted address. All other memory access instructions are handled similarly. An illustrative diagram of MMU functionality follows.

18 Although the simple diagram doesn't show it, address references can still be out of range. However, the point is that under relative addressing, the program lives in its own virtual world. The program deals only in logical addresses while the system handles mapping them to physical addresses.

19 The previous discussion illustrated addressing in a very basic way. What follows are some historical enhancements, some of which led to the characteristics of complete, modern memory management schemes. Dynamic loading is a precursor to paging, but it isn't efficient enough for a modern environment. It is reminiscent of medium term scheduling.

20 Dynamic loading. One of the assumptions so far has been that a complete program had to be loaded into memory in order to run. Consider the following scenario: 1. Separate routines are stored on the disk in relocatable format. 2. When a routine is called, first it's necessary to check whether it's already been loaded. If so, control is transferred to it. 3. If not, the loader immediately loads it and updates its address tables.

21 Dynamic linking and shared libraries. To understand dynamic linking, consider what static linking would mean. If every user program that used a system library had to have a copy of the system code bound into it, that would be static linking. This is clearly inefficient. Why make multiple copies of shared code in loaded program images?

22 Under dynamic linking, a user program contains a special stub where system code is called. At run time, when the stub is encountered, a system call checks to see whether the needed code has already been loaded by another program. If not, the code is loaded and execution continues. If the code was already loaded, then execution continues at the address where the system had loaded it.

23 Dynamic linking of system libraries supports both transparent library updates and the use of different library versions. If user code is dynamically linked to system code and the system code changes, there is no need to recompile the user code. The user code doesn't contain a copy of the system code.

24 If different versions of libraries are needed, this is straightforward. Old user code will use whatever version was in effect when it was written. New versions need new names, and new user code can be written to use the new version. However, if it is desirable for old user code to use the new library version, the old user code will have to be changed so that the stub refers to the new version rather than the old.

25 Obviously, the ability to do this is all supported by system functionality. The fundamental functionality, from the point of view of memory management, is shared access to common memory. In general, the memory space belonging to one process is disjoint from the memory space belonging to another. However, the system may include access to a shared system library in the virtual memory space of more than one user process.

26 Overlays. This is another technique that is very old and has little modern use. It is possible that it would have some application in environments where physical memory is extremely limited.

27 Suppose a program ran sequentially and could be broken into two halves, where no loop or if reached from the second half back to the first. Suppose the system provided a facility so that a running program could load an executable image into its own memory space. This is reminiscent of forking, where the fork() is followed by an exec().

28 Suppose those requirements were met and memory was large enough to hold half of the program but not all of it. You could then write the first half and have it conclude by loading the second half. This is not simple to do, it requires system support, it certainly won't solve all of your problems, and it would be prone to mistakes. However, something like this may be necessary if memory is tiny and the system doesn't support advanced techniques like paging and virtual memory.

29 8.2 Swapping. Keep this distinct from switching, which refers to switching loaded processes on and off of the CPU. Swapping is similar to what a medium term scheduler does. Elements of swapping existed in early versions of Windows. Swapping continues to exist in Unix environments.

30 Execution images for more than one job may be in memory. If the long term scheduler picks a job from the input queue and there isn't enough memory for it, swap out the image of one that had been loaded but is currently inactive. Medium term scheduling does something like this, but on the grounds that the multi-programming level is too high.

31 Swapping is implemented because memory space is limited. Note that neither swapping nor medium term scheduling is suitable for interactive processes. Swapping is slow because it writes to a swap space in secondary storage. Medium term scheduling and swapping are useful as protection against limited resources. However, transferring back and forth from the disk is definitely not a time-effective strategy for supporting multi-programming on a modern system.

32 8.3 Contiguous Memory Allocation. Along with the other assumptions made so far, such as the fact that all of a program has to be loaded into memory, another assumption is made. In simple systems, the whole program is loaded, in order, from beginning to end, in one block of physical memory.

33 Referring back to earlier chapters, the interrupt vector table is assigned a fixed memory location. O/S code is assigned a fixed location. User processes are allocated contiguous blocks in the remaining free memory. Valid memory address references for relocatable code are determined by a base address and a limit value.

34 The base address corresponds to relative address 0. The limit tells the amount of memory allocated to the program. In other words, the limit corresponds to the largest valid relative address. The following diagram illustrates the MMU in more detail under these assumptions. The limit register contains the maximum relative address value; the relocation register contains the base address allocated to the program. Keep in mind that when context switching, these registers are among those that the dispatcher sets.
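The limit check and relocation described above can be sketched in a few lines. This is a toy model, not a real MMU; the register values here are made-up examples.

```python
# Hypothetical sketch of the limit/relocation check: a logical address is
# valid only if it is below the limit register; the physical address is the
# logical address plus the relocation (base) register.

RELOCATION = 14000   # assumed example: base address where the program is loaded
LIMIT = 3000         # assumed example: size of the program's address space

def translate(logical):
    """Map a relative (logical) address to an absolute (physical) one."""
    if logical >= LIMIT:
        # out-of-range reference: trap to the operating system
        raise MemoryError("addressing error: trap to O/S")
    return logical + RELOCATION
```

For example, relative address 100 maps to physical address 14100, while any address of 3000 or above traps.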

36 Memory allocation. A simple scheme for allocating memory is to give processes fixed size partitions. A slightly more efficient scheme would vary the partition size according to the program size. The O/S keeps a table or list of free and allocated memory. Part of scheduling becomes determining whether there is enough memory to load a job.

37 Under contiguous allocation, that means finding out whether there is a "hole" (window of free memory) large enough for the job. If there is a large enough hole, in principle, that makes things "easy" (stay tuned). If there isn't a large enough hole, you have two choices: A. Let the process currently being scheduled wait until there is. B. Let the scheduler set that job aside and search the input queue for jobs that are small enough to fit into available holes.

38 The dynamic storage allocation problem. This is a classic problem of memory management. It is the problem that results when more than one hole of contiguous memory is large enough to allow a process to be loaded. The question is which hole to choose.

39 Historically, three algorithms have been considered. 1. First fit: put a process into the first hole found that's big enough for it. This is fast and allocates memory efficiently. 2. Best fit: look for the hole closest in size to what's needed. This is not as fast, and it's not clearly better at allocation.

40 3. Worst fit: this essentially means loading the job into the largest available hole. In practice it performs as well as its name suggests, but see the next bullet. External fragmentation describes the situation where memory has been allocated to processes leaving lots of unusably small holes of wasted space. Even though it doesn't work out, the idea behind worst fit was to leave usable-size holes.
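The three placement algorithms can be sketched over a free list of (start, size) holes. This is an illustrative sketch, not from the text; the hole list in the example is invented.

```python
# Each function picks a start address for a request of the given size from a
# list of free holes represented as (start, size) pairs, or returns None if
# no hole is large enough.

def first_fit(holes, size):
    # take the first hole big enough
    for start, hole_size in holes:
        if hole_size >= size:
            return start
    return None

def best_fit(holes, size):
    # take the smallest hole that is still big enough
    fits = [(hole_size, start) for start, hole_size in holes if hole_size >= size]
    return min(fits)[1] if fits else None

def worst_fit(holes, size):
    # take the largest hole available
    fits = [(hole_size, start) for start, hole_size in holes if hole_size >= size]
    return max(fits)[1] if fits else None
```

With holes at (0, 100), (200, 500), and (800, 300), a request for 250 bytes goes to address 200 under first fit and worst fit, but to 800 under best fit, since the 300-byte hole is the closest adequate match.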

41 Empirical studies have shown that for an amount of allocated memory measured as N, an amount of memory approximately equal to .5N will be lost due to fragmentation. This is known as the 50% rule. In other words, about 1/3 of total memory is wasted (.5N lost out of 1.5N total).

42 In reality, memory is typically allocated in fixed size blocks rather than in exact byte counts corresponding to process size. The overhead of keeping track of arbitrary amounts of memory, measured down to scores of bits, is not practical. A block may consist of 1 KB or some other measure of similar magnitude or larger.

43 Under this scheme, a process is allocated enough blocks to contain the whole program. External fragmentation still results, but the smallest hole will be one block. Internal fragmentation also results; this refers to the wasted memory in the last block allocated to a process. Internal fragmentation on average is equal to one half of the block size.
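The block arithmetic above can be made concrete. A minimal sketch, with an invented example of 1 KB blocks and a 2500-byte process:

```python
# With fixed-size blocks, a process gets ceil(process_size / block_size)
# blocks; the unused tail of the last block is internal fragmentation.

def blocks_needed(process_size, block_size):
    return -(-process_size // block_size)   # ceiling division

def internal_fragmentation(process_size, block_size):
    return blocks_needed(process_size, block_size) * block_size - process_size
```

A 2500-byte process with 1024-byte blocks needs 3 blocks and wastes 572 bytes in the last one; across many processes of random size, that waste averages half a block.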

44 Picking a block size is a classic case of balancing extremes. If the block size is large enough, each process will only need one block. This degenerates into fixed partitions for processes, with large waste due to internal fragmentation. If the block size is small enough, you approach allocating byte by byte; internal fragmentation is insignificant, but external fragments can become small enough to be unusable.

45 Compacting memory in order to reduce fragmentation. If programs use absolute memory addresses, they simply can't be relocated; memory couldn't be compacted without recompiling the programs, which is out of the question. If programs use relative memory addresses, they are relocatable. Even during run time, they can be moved to new memory locations, squeezing the unusable fragments out of the memory allocations.

46 8.4 Paging. Paging deals with two problems: 1. It is a way to allocate memory in non-contiguous blocks, which addresses the problem of external fragmentation. 2. It also deals with fragmentation in the swap space in secondary storage, where reorganization would be so slow that compaction is not practical.

47 Paging is based on the idea that the O/S can maintain data structures that match given blocks in physical memory with given ranges of virtual addresses. Physical memory is conceptually broken into fixed size frames. Logical memory is broken into pages of the same size. In essence, the O/S maintains a lookup table telling which logical page matches which physical frame.

48 In contiguous memory allocation there was a limit register and a relocation register. In paging there are special registers for holding the logical address and forming the physical address. In paging, fixed page sizes mean that the limits are always the same, but there is a table containing the relocation values telling which frame each page address is relocated to.

49 Every (logical) address generated by the CPU takes this form: page part (p) | offset part (d). More specifically, let an address consist of m bits. Then a logical address can be pictured as shown on the next overhead.

51 The addresses are binary numbers. As a result, the components of the address fit neatly together. The (m - n) digits for p can be treated separately as a page number in the range from 0 to 2^(m-n) - 1. The n digits for d can be treated separately as an offset in the range from 0 to 2^n - 1. The m digits altogether give a single address in the range from 0 to 2^m - 1. In short, the address space consists of 2^(m-n) pages, and the size of a page is 2^n bytes.

52 Paging is based on maintaining a page table. For some value p, the corresponding f value is looked up in the page table at offset p in the table. The offset d is unchanged. The physical address is formed by appending the binary value for d to the binary value for f. The result is f | d. The forming of a physical address from a logical address, p | d, using a page table, is illustrated in the following diagram.
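The p | d split and the table lookup can be sketched with shifts and masks. This is a toy example; the bit widths (m = 16, n = 10) and the page-to-frame mapping are assumed values, not from the text.

```python
# Translate a logical address p|d to a physical address f|d.
# Assumed example parameters: m = 16 address bits, n = 10 offset bits.

M, N = 16, 10
PAGE_TABLE = {0: 5, 1: 2, 2: 7}   # assumed page -> frame mapping

def translate(logical):
    p = logical >> N                # top m - n bits: page number
    d = logical & ((1 << N) - 1)    # low n bits: offset, passed through unchanged
    f = PAGE_TABLE[p]               # look up the frame at offset p in the table
    return (f << N) | d             # physical address is f appended with d
```

For instance, logical address 1027 is page 1, offset 3; page 1 maps to frame 2, so the physical address is (2 << 10) | 3 = 2051.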

54 In theory you could have a global page table containing entries for all processes. In practice, each process may have its own page table, which is used when that process is scheduled. The use of the page table can be illustrated with a simple example with a single process. Each page table entry is like a base and offset for a given page in the process.

56 Note again that under paging there is no external fragmentation. Every empty physical memory space is a usable frame. Internal fragmentation will average one half of a frame per process.

57 In modern systems page sizes vary in the range of around 512 bytes to 16 MB. The smaller the page size, the smaller the internal fragmentation. However, if the memory space is large, there is overhead in allocating small pages and maintaining a page table with lots of entries. As hardware resources have become less costly, larger memory spaces have become available, and page sizes have grown. Page sizes of 2 KB to 8 KB may be considered representative of an average, modern system.

58 Summary of paging ideas. 1. The logical view of the address space is separate from the physical view. This means that code is relocatable, not absolute. 2. The logical view is of contiguous memory. Paging is completely hidden by the MMU. Allocation of frames is not contiguous.

59 3. Although the discussion has been in terms of the page table, in reality there is also a global frame table. The frame table provides the system with ready look-up of which frames have been allocated and which are free and still available for allocation. 4. There is a page table for each process. It keeps track of memory allocation from the process point of view and supports the translation from logical to physical addresses.

60 Hardware support for paging. A page table has to hold the mapping from logical pages to physical frames for a single process. Note that the page table resides in memory. The minimum hardware support for paging is a dedicated register on the chip which holds the address of the page table of the currently running process. With this minimal support, for each logical memory address generated by a program, two accesses to actual memory would be necessary. The first access would be to the page table, the second to the physical address located there.

61 In order to be viable, paging needs additional hardware support. There are two basic choices. 1. Have a complete set of dedicated registers for the page table. This is fast, but the hardware cost (monetary and real estate on the chip) becomes impractical if the memory space is large.

62 2. The chip will contain hardware elements known as translation look-aside buffers (TLBs). This is the current state of the art, and it will be explained below. Translation look-aside buffers are in essence a special set of registers which support look-up. In other words, they are table-like. They are designed to contain keys, p, the page identifiers, and values, f, the matching frame identifiers.

63 TLBs have an additional, special characteristic: they are not independent buffers; they come as a collection. The "look-aside" part of the name is meant to suggest that when a search value is "dropped" onto the TLB, for all practical purposes all of the buffers are searched for that value simultaneously. If the search value is present, the matching value is found within a fixed number of clock cycles. In other words, look-up in a TLB does not involve linear search or any other software search algorithm. There is no order of complexity to searching that depends on the number of entries in the collection of TLBs. Response time is fixed and small.

64 TLBs are like a highly specialized cache. The set of TLBs wouldn't be big enough to store a whole page table. When a process first accesses a page, this requires reading the page table and finding the frame. Once a page has been read the first time, its entry is placed into the TLB. Subsequent accesses to that page will not require reading the page table in memory.

65 Just as with caching, some process memory accesses will be a TLB "hit" and some will be a TLB "miss". A hit is very economical. A miss requires reading the page table again and replacing the (least recently used) entry in the TLB with the most recently accessed page. Memory management with TLBs is shown in the following diagrams.
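The hit/miss behavior with LRU replacement can be modeled in software. This is only a toy simulation of the policy (real TLBs do this in hardware); the capacity and page table here are invented example values.

```python
from collections import OrderedDict

# A toy TLB with LRU replacement: hits are served from the TLB; misses
# consult the page table and evict the least recently used entry if full.

class TLB:
    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table
        self.entries = OrderedDict()   # page -> frame, oldest first
        self.hits = self.misses = 0

    def lookup(self, page):
        if page in self.entries:
            self.hits += 1
            self.entries.move_to_end(page)        # mark most recently used
        else:
            self.misses += 1
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict the LRU entry
            self.entries[page] = self.page_table[page]
        return self.entries[page]
```

For a 2-entry TLB and the access sequence 0, 1, 0, 2, 0, the first reference to each page misses, page 1 is evicted when page 2 arrives, and the repeated references to page 0 hit.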

68 Note the following things about the diagram. The page table is complete, so a search of the page table simply means jumping to offset p in the table. The TLB is a subset, so it has to hold both the key, p, and the look-up value, f. The diagram shows addressing, but it doesn't attempt to show, through arrows or other notation, the replacement of TLB entries on a miss.

69 Paging costs can be summarized in this way. On a hit: TLB access + memory access. On a miss: TLB access + memory access to page table + memory access to desired page. The book states that typical TLBs are in the range from 16 to 512 entries. With this number of entries, a hit ratio of 80% to 98% can be achieved.

70 Given a hit ratio and some sample values for the time needed for TLB and memory access, a weighted average for the cost of paging can be calculated. For example, let the time needed for a TLB search be 20 ns. Let the time needed for a main memory access be 100 ns.

72 In other words, if you could always access memory directly, it would take 100 ns. With paging and an 80% hit ratio, it takes on average .8(20 + 100) + .2(20 + 100 + 100) = 140 ns. Paging imposes a 40% overhead on memory access. On the other hand, without TLBs, every memory access would cost 200 ns (one access to the page table plus one to the target), which would mean a 100% overhead on memory access.
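The weighted-average calculation above can be written out directly, using the sample values from the text (20 ns TLB search, 100 ns memory access):

```python
# Effective access time for single-level paging with a TLB.

TLB_NS, MEM_NS = 20, 100

def effective_access_time(hit_ratio):
    hit_cost = TLB_NS + MEM_NS         # 120 ns: TLB search + target access
    miss_cost = TLB_NS + 2 * MEM_NS    # 220 ns: TLB search + page table + target
    return hit_ratio * hit_cost + (1 - hit_ratio) * miss_cost
```

With an 80% hit ratio this gives the 140 ns figure quoted above; raising the hit ratio toward 98% pulls the average down toward the 120 ns hit cost.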

73 Why would you live with a 40% overhead cost on memory accesses? Remember the reasons for introducing the idea of paging. It allows for non-contiguous memory allocation. This solves the problem of external fragmentation in memory. As long as the page size strikes a balance between large and small, internal fragmentation is not great. There is also a potential benefit in reducing fragmentation in swap space, but supporting non-contiguous memory allocation is the main event.

74 The previous discussion has referred to a page table as belonging to one process. This would mean there would be many page tables. When a new process was scheduled, the TLB would be flushed so that pages belonging to the new process could be loaded.

75 The alternative is to have a single, unified page table. This means that each entry (in particular, each TLB entry), in addition to a value for f, would have to identify which process it belonged to. The identifier is known as an ASID, an address space id.

76 Such a table would work like this. When a process generated a page id, the TLB would be searched for that page. If found, it would further be checked to see whether the page belonged to the process. If so, everything is good. If not, this is simply treated as a TLB miss. Replacement would occur using the usual algorithm for replacement on a miss. With entries tagged this way, there is no need for flushing when a new process is scheduled. In effect, the TLB is flushed entry by entry as misses occur.

77 Implementing protection in the page table. Recall that a page table functions like a set of base and limit registers. Each page address is a base, and the fixed page size functions as a limit. If a system maintains page tables of length n, then the maximum amount of memory that could theoretically be allocated to a process is n pages, or n * (page size) bytes.

78 In practice, processes do not always need the maximum amount of memory and will not be allocated that much. This information can be maintained in the page table by the inclusion of a valid/invalid bit. If a page table entry is marked "i", this means that if a process generates that logical page, it is trying to access an address outside of the memory space that was allocated to it. A diagram of the page table follows.

80 An alternative to valid/invalid bits is a page table length register (PTLR). The idea is simple: this register is like a limit register for the page table. The range of logical addresses for a given process begins at page 0 and goes to some maximum which is less than the absolute maximum size allowed for a page table. When a process generates a page number, it is checked against the PTLR to see whether it's valid.

81 The valid/invalid bit scheme can be extended to support finer protections. For example, read/write/execute protections can be represented by three bits. You typically think of these protections as being related to a file system. In theory, different pages of a process could have different attributes. This may be especially important (and likely considerably more complicated in practice) if you are dealing with shared memory accessible to more than one process.

82 8.5 Structure of the Page Table. Modern systems may support address spaces in the range of 2^32 to 2^64 bytes. 2^32 is 4 gigabytes. 2^64 is approximately 16 x 10^18 bytes. In any case, the higher value is what you get if you allow all 64 bits of a 64 bit architecture to be used as an address. Note that this is 16 x 2^60, but by this stage the powers of 2 and the powers of 10 do not match up the way they do when we casually equate 2^10 to 10^3.

84 The reality is that modern systems support logical address spaces too large for simple page tables. In order to support these address spaces, hierarchical or multi-level paging is used. Take the lower of the address spaces given above, 2^32 bytes. Let the page size be 2^12 bytes, or 4 KB.

85 2^32 bytes of memory divided into pages of size 2^12 bytes means a total of 2^20 pages. The corresponding physical address space would consist of 2^20 frames. That means each page table entry would have to be at least 20 bits long in order to hold the frame id. Suppose each page table entry is 4 bytes, or 32 bits, long. This would allow for validity and protection bits in addition to the frame id. It's also simpler to argue using powers of 2 rather than speaking in terms of a table entry of length 3 bytes.

86 A page table with 2^20 entries, each of size 2^2 bytes, means the page table is of length 2^22 bytes, or 4 MB. But a page itself under this scenario is only 2^12 bytes, or 4 KB. In other words, it would take 1 K pages to hold the complete page table for a process that had been allocated the theoretical maximum amount of memory possible.
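The arithmetic above can be checked directly, using the parameters from the text (32-bit addresses, 4 KB pages, 4-byte entries):

```python
# Size of a flat page table for a 32-bit address space with 4 KB pages.

address_bits = 32
page_size = 2 ** 12        # 4 KB pages
entry_size = 2 ** 2        # 4 bytes per page table entry

num_pages = 2 ** address_bits // page_size      # 2^20 pages in the address space
table_bytes = num_pages * entry_size            # 2^22 bytes = 4 MB page table
pages_to_hold_table = table_bytes // page_size  # 2^10 = 1 K pages for the table itself
```

So the flat table is 1024 times larger than a single page, which is why the table itself has to be paged.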

87 To restate the result another way, the page table won't fit into a single page. In theory, it might be possible to devise a hybrid system where the memory for page tables was allocated and addressed by the O/S as a monolithic block, while this was used to support paging of user memory. This would be a mess and leads to questions like: could there be fragmentation in the monolithic page table block?

88 The practical solution to the problem is hierarchical or multi-level paging. In one of its forms, it's similar to indexing. The book refers to this as a forward-mapped page table. Given a logical page value, you don't look up the frame id directly. You look up an entry in an outer page table, which points to the page containing the desired frame id. The book mentions that this kind of scheme was used by the Pentium II.

89 The scheme is illustrated in the following diagrams. A logical address of 32 bits can be divided into blocks of 10, 10, and 12 bits. The first 10 + 10 = 20 bits correspond to the page identifier. The remaining 12 bits correspond to d, the offset into a page of size 2^12 bytes.
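The 10/10/12 split described above can be sketched with shifts and masks. A minimal illustration, not tied to any particular hardware:

```python
# Split a 32-bit logical address into outer index p1 (10 bits),
# inner index p2 (10 bits), and page offset d (12 bits).

def split_two_level(addr):
    p1 = addr >> 22              # top 10 bits: index into the outer page table
    p2 = (addr >> 12) & 0x3FF    # next 10 bits: index into the inner page table
    d = addr & 0xFFF             # low 12 bits: offset within the page
    return p1, p2, d
```

Translation then takes two table lookups: p1 selects an inner table, p2 selects a frame within it, and d is appended unchanged.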

93 Calculating the cost of paging using a multi-level page table. In preview, the cost of a miss will be higher because there are two accesses to the page table instead of one. As before, let the time needed for a TLB search be 20 ns and the time needed for a main memory access be 100 ns.

94 Cost of a TLB hit: 20 + 100 = 120 ns. Cost of a TLB miss: 20 + 100 + 100 + 100 = 320 ns. The first 100 is the access to the outer page table, the second 100 is the access to the inner page table, and the third 100 is the access to the desired address. Let the hit ratio be 98%. Then the overall weighted cost of paging is .98(120) + .02(320) = 124 ns. The overhead cost of paging under this scheme is 24%.
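The two-level weighted average follows the same pattern as the single-level case, with the sample values from the text:

```python
# Effective access time for two-level paging with a TLB.

TLB_NS, MEM_NS = 20, 100

def effective_access_time_2level(hit_ratio):
    hit_cost = TLB_NS + MEM_NS         # 120 ns: TLB search + target access
    miss_cost = TLB_NS + 3 * MEM_NS    # 320 ns: outer table + inner table + target
    return hit_ratio * hit_cost + (1 - hit_ratio) * miss_cost
```

At a 98% hit ratio this gives the 124 ns figure above; each additional page table level adds another 100 ns to the miss cost, which is why deep multi-level paging becomes expensive.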

95 Observe what happens if you go to a 64 bit address space and a page size of 4 KB. Sample address breakdowns are shown on the next overhead for two and three level paging. The thing to notice is that the number of bits is so high that you again have the problem that a level of the page table won't fit into a single page.

97 With an address space of this size, six levels would be needed. Depending on page size, some 32 bit systems go to 3 or 4 levels. For 64 bit address spaces, multi-level paging is too deep. Think of the cost of a miss in the weighted average for addressing.

98 Hashed page tables: hashing. Hashed page tables provide an alternative to multi-level paging in a large address space. The first thing you need to keep in mind is what hashing is, how it works, and what it accomplishes. Let y = f(x) be a hashing function.

99 You may have a widely dispersed set of n different x values in the domain. You have a specific, compact set of y values that you want to map to in the range. In the ideal case, there would be a set of exactly n different, contiguous y values. f() is devised so that the likelihood that any two x values will give the same y value is small. In the ideal case, no two x values would ever collide.

100 f() also has to be quick and easy to compute. In practice the range will be somewhat larger than n, and collisions may occur. The most common kind of hashing function is based on division and remainders. Choose z to be the smallest prime number larger than n. Then let f(x) = x % z. f(x) will fall into the range [0, z - 1].

101 Hashing makes it possible to create a look-up table that doesn't require an index or any sorting or searching. Let there be z entries in the table, at offsets 0 through z - 1. Store the entry for x at offset f(x) in the table. When x occurs again and you want to look up the corresponding value in the table, compute f(x) and read the entry at that offset. Note that the value x is repeated in the table entry. This is necessary in order to resolve collisions. This is illustrated in the following diagram.
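The division/remainder scheme, with chaining to resolve collisions (the approach the later slides illustrate for hashed page tables), can be sketched as follows. The divisor and the inserted values are invented examples.

```python
# Division/remainder hashed look-up table with chaining. The key x is stored
# alongside its value so that colliding keys can be told apart.

Z = 11   # assumed example: a small prime chosen as the divisor

table = [[] for _ in range(Z)]   # one chain of (key, value) pairs per slot

def insert(x, value):
    table[x % Z].append((x, value))

def lookup(x):
    for key, value in table[x % Z]:
        if key == x:             # the stored key resolves collisions
            return value
    return None
```

For example, keys 7 and 18 both hash to slot 7 (since 18 % 11 == 7), so they share a chain; the stored keys keep their values distinct.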

103 Hashed page tables: why? Consider again the background of multi-level paging and its disadvantages. Conceivably you could be maintaining a global page table or a page table for each process. Since memory is being accessed page by page, it's desirable for a large page table itself to be accessible by page. As the address space grows large, it becomes impossible to store a complete page table in one page.

104 A multi-level page table provides a tree-like way of using pages to access memory addresses. The important thing to note is that each level in the tree corresponds to a block of bits in an address. The larger the address space, the more levels in the tree, and the more memory accesses needed to arrive at the desired address.

105 The important thing to note is this: this structure provides a way of accessing the whole address space. Now consider this: it is possible to have a 64 bit architecture machine, for example, without having 2^64 bytes of installed memory. Even if you had maximum memory installed, it would not be in order to accommodate a single process that required that much memory. The purpose would be to support multi-tasking, with each process getting a portion of the memory.

106 Now note this: even if a process gets only a part of memory, the frames allocated to it could be dispersed across the whole address space. In other words, a single process might use the address space very sparsely, and there is no way to confine it to a fixed subset of frames.

107 Now, for the sake of argument, assume that the page size of a system is large enough that a page table that can be contained in one page would map the maximum amount of memory that could be allocated to one process. The system would still have to maintain a global record of all process/page/frame assignments. However, hashing makes it possible to store the mapping for a single process in one page

108 In summary, making a hashed page table involves the following: When a virtual page is allocated a frame, the virtual page id, p, is hashed to a location in the hash table. The hash table entry contains p, to account for collisions, and the id of the allocated frame. See the following diagram

110 In this illustration, a collision is shown. Collisions are handled with links rather than overflow. The two logical pages, q and p, hash to the same location. Their corresponding frames are s and r, respectively. The book doesn’t give any details on the organization of a hash table on a page. In general, if you’re doing division/remainder hashing, you might expect that the divisor is chosen so that the size of a hash table node times the number of possible hash values is less than the size of a whole page
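A sketch of a hashed page table with chained collision handling, as in the diagram (the table size of 7 and the page/frame values are made up; real nodes would also carry link pointers rather than Python lists):

```python
class HashedPageTable:
    """Hashed page table: each slot heads a chain of (page, frame)
    nodes, so pages that hash to the same slot share a chain."""
    def __init__(self, size):
        self.size = size
        self.buckets = [[] for _ in range(size)]  # chains of linked nodes

    def map_page(self, p, frame):
        self.buckets[p % self.size].append((p, frame))

    def frame_of(self, p):
        for page, frame in self.buckets[p % self.size]:
            if page == p:        # p is stored in the node to resolve collisions
                return frame
        return None              # page not mapped

pt = HashedPageTable(7)
pt.map_page(3, "r")    # page p = 3 hashes to slot 3
pt.map_page(10, "s")   # page q = 10 collides: 10 % 7 == 3
print(pt.frame_of(3), pt.frame_of(10))   # r s
```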

111 Clustered page tables. The book doesn’t give a very detailed explanation of this. The general idea appears to be that memory can be allocated so that these properties hold: Several different (say 16) page id’s, p, will hash to the same entry in the page table. This entry will then have no fewer than 16 linked nodes, one for each page (and possibly more, due to collisions). Honestly, it’s not clear to me what advantage this gives. The length of the page table would be reduced by a factor of 16, but it seems that its width would be increased by a factor of 16. I have no more to say about this, and there will be no test questions on it

112 Inverted page tables. Inverted page tables are an important alternative to multi-level page tables and hashed page tables. Recall that with (non-inverted) page tables: 1. The system has to maintain a global frame table that tells which frames are allocated to which processes

113 2. The system has to maintain a page table for each process, which makes it possible to look up the physical frame that is allocated to a given logical address. Simple illustrations of both of these things are given on the next overhead

115 An inverted page table is an extension of the frame table. Instead of many page tables, one for each process, there is one master table. The offsets into the table represent the frame id’s for the whole physical memory space. The table has two columns, one for pid, and one for a logical page id, p, belonging to the process

117 The use of an inverted page table to resolve a logical address is shown in the diagram on the next overhead. The key thing to notice about the process is that it is necessary to do linear search through the inverted page table, looking for a match on the pid that generated the address and the logical address that was generated. The offset into the table identifies the frame that was allocated to it

119 Searching the inverted page table is the cost of this approach. There is no choice except for simple, linear search, because the random allocation of frames means that the table entries are not in any order. It is not possible to do binary search or anything else
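The linear search could be sketched like this (the pid’s and page numbers in the table are made up):

```python
# Inverted page table: one entry per physical frame.
# Entry i holds (pid, p) if frame i is allocated to logical page p of pid.
inverted = [("A", 5), ("B", 0), ("A", 2), ("B", 7)]

def resolve(pid, p):
    """Linear search; the frame id is the offset of the matching entry."""
    for frame, entry in enumerate(inverted):
        if entry == (pid, p):
            return frame
    raise MemoryError("address out of range")

print(resolve("A", 2))   # 2 -- frame 2 holds page 2 of process A
```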

120 This is where hashing and inverted page tables come together. The way to get direct access to a set of values in random order is to hash. Let n be the total number of pages/frames and devise a hashing function that will provide this mapping: f(pid, p) → [0, n – 1]. Use this function to allocate frames to processes

121 Then when the logical address (pid, p) is generated, hash it. In theory, the hash function value itself could be the frame id, f, but you still have to do table look-up because of the possibility of collisions. You can go directly to offset f in the table and check there for the key values (pid, p). You don’t have to do linear search. If not found, check for overflow or linking until you find the desired values. (Note that if you don’t find the desired values, the process has tried to access an address that is out of range.)
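A sketch of the hashed look-up, with chaining for collisions (the hash function, the table size of 8, and the sample allocations are made up for illustration):

```python
NUM_FRAMES = 8
chains = [[] for _ in range(NUM_FRAMES)]   # one chain per hash value

def h(pid, p):
    """Toy hash of the (pid, p) pair into [0, NUM_FRAMES - 1]."""
    return (pid * 31 + p) % NUM_FRAMES

def allocate(pid, p, frame):
    chains[h(pid, p)].append((pid, p, frame))

def resolve(pid, p):
    """Go directly to slot h(pid, p); no linear search over the whole table."""
    for epid, ep, frame in chains[h(pid, p)]:
        if (epid, ep) == (pid, p):
            return frame
    raise MemoryError("address out of range")  # key not in the chain

allocate(1, 5, frame=3)
allocate(2, 0, frame=6)
print(resolve(1, 5))   # 3
```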

122 The most recent discussions have left TLB’s behind, but they are still relevant as hardware support for addressing. A diagram of the use of a hashed inverted page table with TLB’s is shown on the next overhead. In looking at the picture, remember that since the table is stored in memory, that adds an extra memory access to the overall cost of addressing. Also note that in reality the table would probably be bigger than a page. The table would be stored in system space and might be addressed using a special scheme

124 The previous discussion included the assumption that you could allocate frames based on hashing. This simplified things and made the diagram easier to draw. In reality, you would have a frame table that recorded which frame was allocated to which process. You would then have a separate hash table that supported look-up into the frame table

127 Shared pages. The basic idea is this: Shared memory between processes can be implemented by mapping their logical addresses to the same physical pages (frames). An operating system may support IPC this way. It is also a convenient way to share (read only) data. It’s also possible to share code, such as libraries which more than one process needs to run

128 In order for code to be shareable, it has to be reentrant. Reentrant means that there is nothing in the code which causes it to modify itself. Consider the MISC sumtenV1.txt example. It is divided into a data segment and a code segment. Two processes can share the code as long as their accesses to memory variables are mapped to separate copies of the variables

129 Every memory access that a program makes has to pass through the O/S. This means that the O/S is responsible for detecting incorrect memory accesses and for detecting when shared code may be being misused. Threads are a good, concrete example of shared code. We have considered some of the problems that can occur when threads share references to common objects. If they share no references, then they are completely trouble free

130 Keep in mind that an inverted page table is a global structure that effectively maps one logical page to one physical frame. This kind of arrangement makes it difficult to support memory pages (frames) shared between different processes. To support shared memory, it would be necessary to add linking to the table or add other data structures to the system

131 8.6 Segmentation. The idea behind segmentation is that the user view of memory is not simply a linear array of bytes. Users tend to think of their applications in terms of program units. The relative locations of different modules or classes are not important. Each separate unit can be identified by its offset from some base and its length, where the length of each is variable

132 Segmentation supports the user view of memory. An address is conceptually of the form <segment id, offset into segment>. An address isn’t simply a pure logical address or a page plus offset

133 Implementation of segmentation. The system would have to support segmented addresses in software. It would then be necessary to map from segmented addresses to physical addresses

134 Segments may be reminiscent of simple contiguous memory allocation. They may also be thought of, very roughly, as (comparatively large) pages of varying size. Just like with paging, hardware support in the MMU makes the translation possible. The diagram on the next overhead shows how segmented addresses are resolved

136 This is similar to one of the earliest diagrams showing in general how page addresses were resolved. The segment table is like a set of base-limit pairs, one for each segment. Just like with pages, in the long run you would probably want some sort of TLB support. For the time being, segments and pages are treated separately. In real, modern systems with segmentation, the segments are subdivided into pages which are accessed through a paging mechanism
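The base-limit resolution could be sketched like this (the base and limit values are made-up examples):

```python
# Segment table as base-limit pairs, one per segment.
segment_table = {0: (1400, 1000),   # segment 0: base 1400, limit 1000
                 1: (6300, 400)}    # segment 1: base 6300, limit 400

def resolve(segment, offset):
    """Check the offset against the segment's limit, then add the base."""
    base, limit = segment_table[segment]
    if offset >= limit:
        raise MemoryError("segmentation fault: offset beyond limit")
    return base + offset

print(resolve(1, 53))   # 6353 = 6300 + 53
```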

137 Protection and sharing with segmentation. The theory is that protection and sharing make more logical sense under a segmented scheme. Instead of worrying about protection and sharing at a page level, the assumption is that the same protection and sharing decisions would logically apply to a complete segment

138 In other words, protection is applied to semantic constructs like “data block” or “program block”. Under a segmented scheme, semantically different blocks would be stored in different segments. Similarly with sharing: If two processes need to share the same block, store the block in a given segment and give both processes access to the segment

139 Although perhaps clearer than paged sharing, segmented sharing doesn’t solve all of the problems of sharing. If code is shared and two processes access it, the system still has to resolve addresses when processes cross the boundary from unshared to shared code. In other words, two processes may know the same code by different symbolic names; potentially, ifs or jumps across boundaries have to be supported (from one address space to another), and the return from shared code has to go to the address space of whichever process called it

140 Segmentation, in the sense that it’s like contiguous memory allocation, suffers from the problem of external fragmentation. The difference is that a single process consists of multiple segments and each segment is loaded into contiguous memory. The ultimate solution to this problem is to break the segments into pages

141 8.7 Example: The Intel Pentium. The reality is that the Intel 8086 architecture has had segmented addressing from the beginning. (The Motorola didn’t.) The following details are given in the same spirit that the information about scheduling and priorities was given in the chapter on scheduling. Namely, to show that real systems tend to have many disparate features, and overall they can be somewhat complex

142 Some information about Intel addressing: The maximum number of segments per process is 16K (2^14). Each segment can be as large as 4GB (2^32 bytes). A page is 4KB (2^12 bytes), so a segment may consist of up to 2^20, or 1M, pages

143 The logical address space of a process is divided into two partitions, each of up to 8K segments. Partition 1 is private to the process. Information about its segments is stored in the local descriptor table. Partition 2 contains segments shared among processes. Information about these segments is stored in the global descriptor table

144 The first part of a logical address is known as a selector. It consists of these parts: 13 bits for segment id, s; 1 bit for global vs local, g; 2 bits for protection. (14 bits total for segment id)
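As a sketch, assuming the segment id sits in the high 13 bits, the g bit next, and the 2 protection bits in the low positions (which matches the x86 selector layout, where the low bits are the requested privilege level), the selector could be decomposed like this:

```python
def split_selector(sel):
    """Split a 16-bit selector into segment id s (13 bits),
    global/local flag g (1 bit), and 2 protection bits."""
    s = (sel >> 3) & 0x1FFF   # top 13 bits
    g = (sel >> 2) & 0x1      # global vs local descriptor table
    prot = sel & 0x3          # 2 protection bits
    return s, g, prot

print(split_selector(0x002B))   # (5, 0, 3)
```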

145 Within each segment, an address is paged. It takes two levels to hold the page table. The page address takes the form described earlier: 10 bits for outer page of page table, 10 bits for inner page of page table, 12 bits for offset. (At 4 bytes per page table entry, you can fit 2^10 entries into a 4KB page)
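The 10/10/12 split described above can be sketched with bit operations (the sample address is made up):

```python
def split_linear_address(addr):
    """Split a 32-bit linear address into the 10/10/12 fields
    of the two-level page table scheme."""
    outer = (addr >> 22) & 0x3FF   # top 10 bits: outer page table index
    inner = (addr >> 12) & 0x3FF   # next 10 bits: inner page table index
    offset = addr & 0xFFF          # low 12 bits: offset within the 4KB page
    return outer, inner, offset

print(split_linear_address(0x00403025))   # (1, 3, 37)
```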

146 Notice that you’ve got both 14 bits for segment id and 32 bits for the offset within a segment. This means that in a 32 bit architecture you can’t “use” all of the bits. There is a limit on how many segments total you can have, but there is flexibility in where they’re located in memory. Take a look at the following diagram and weep