The oft-misunderstood /3GB switch

The /3GB switch changes the way the 4GB virtual address space is split up. Instead of splitting it as 2GB of user mode virtual address space and 2GB of kernel mode virtual address space, the split is 3GB of user mode virtual address space and 1GB of kernel mode virtual address space.

That's all.

And yet people think it does more than that.

I think the problem is that people think that "virtual address space" means something other than just "virtual address space".

The term "address space" refers to how a numerical value (known as an "address") is interpreted when it is used to access some type of resource. There is a physical address space; each address in the physical address space refers to a byte in a memory chip somewhere. (Note for pedants: Yes, it's actually spread out over several memory chips, but that's not important here.) There is an I/O address space; each address in the I/O address space allows the CPU to communicate with a hardware device.

And then there is the virtual address space. When people say "address space", they usually mean "virtual address space".

The virtual address space is the set of possible pointer values (addresses) that can be used at a single moment by the processor. In other words, if you have an address like 0x12345678, the virtual address space determines what you get if you try to access that memory. The contents of the virtual address space change over time, for example, as you allocate and free memory. It also varies based on context: each process has its own virtual address space.

Saying that 2GB (or 3GB) of virtual address space is available to user mode means that at any given moment in time, out of the 4 billion virtual addresses available in a 32-bit value, 2 billion (or 3 billion) of them are potentially usable by user-mode code.

Over the next few entries, I'll talk about the various consequences and misinterpretations of the /3GB switch.

The distinction is still necessary. Every process has its own 4GB virtual address space, but all share the same RAM. Two processes can have two different things at the same virtual address, which of course must map to two different RAM addresses.

For example, almost every EXE gets its image loaded at the same address, 0x400000 IIRC, in its own address space. So there’s no way for virtual addresses to directly correspond to RAM addresses.

Just for comparison, Linux defaults to 3 GB for user mode and 1 GB for kernel mode. There are experimental patches to implement a separate virtual address space for the kernel, thus granting 4 GB for user mode AND 4 GB for kernel mode. Unfortunately, there is a 10-20% performance hit because every system call requires an address space switch (and TLB cache flush).

Perhaps a more useful way to think about it is this. In order to bend the spoon first you must know that there is no spoon.

Likewise, there is no RAM. When you call new, malloc(), or the allocator of your choice on a virtual memory OS, you are not allocating RAM. You are allocating space in a paging file on disk. Your "memory" allocation is actually a set of file system calls.

The virtual address space is nothing but a view of files. RAM in this sense is nothing more than a cache, to speed up all that disk access your program does every time it changes the value in a "memory location".

Remember that ‘virtual’ means ‘fake’. Windows – and any other VMOS – fakes an address space for a user process, conning the process (OK, the user-mode programmer) into believing that they have a huge flat area of memory that they can randomly address at any time.

In reality, it’s full of holes where the programmer hasn’t indicated a desire to use a particular address yet. Of the addresses that the programmer has asked to use, the OS may or may not have currently allocated physical memory to that address. If it hasn’t, the OS will allocate physical memory when it’s accessed, seamlessly loading the appropriate data if required, suspending (blocking) the thread until the data arrives. If the data isn’t available, you’ll get some kind of exception. All versions of Windows, including CE, are demand-paged – you only get memory when it’s required.

Virtual addresses can be backed by memory-mapped files or by the swap file.

If you have 4GB of RAM – or even more – the system will continue to try to optimise the memory allocation between the working sets of all running processes, the system working set, and the system caches. If you’re not actively using the RAM for something, it will get used as data caches.

An understanding of how virtual memory really works is essential to understanding how to improve your software’s performance. With the overhead of paging being as high as it is, you want to keep your core working set as close together as possible, and get your data access strategies right. A true B-tree or linked list may be algorithmically better than a B+tree or an array, in pure processor time, but an actual implementation may display the opposite characteristics due to lower paging and better processor cache use.

I recommend ‘Inside Windows 2000, Third Edition’ by Mark Russinovich and David Solomon for further reading. A new edition covering XP and 2003 is due out later in the year ( http://www.solsem.com/books.html ).

OK, Igor/Chris explained it better, although you can often see that new and malloc have no effect on the reported VM size. This is because these operators and functions, which end up as calls to HeapAlloc, etc., run on top of the virtual memory system, allocating smaller blocks from whole pages.

This is all very nice for user mode, but what about kernel mode? Either it didn't need 2GB of space, or it has now run out of space.

I knew what it meant. I just don't understand how the kernel copes with the memory-mapped cache and AGP apertures, not to mention system DLLs.

In 9x, virtual DOS machines (I'm not referring to NTVDM at all) required memory where the cache and AGP go. Was this required for each V86 process (as in 3.1, where all of Windows was in one V86 and each DOS box in its own V86)?

Or does it only use less than 1GB of its 2 (in which case, why isn't /3GB standard)?

I understand that each user-mode process has its own address space (so user-mode programs can use number_of_processes x user_mode_memory_size MB of memory), but am I right in thinking the kernel only ever has 1 or 2 GB available to it (in 32-bit Windows)?

If so, what are the effects of giving the kernel more or less memory, and why do the server and non-server OSs have different defaults (is it a trade-off between a slightly quicker check for an address being kernel/user mode and the space available to processes)?

Also, doesn’t the /3GB switch have knock-on effects on quite a few Microsoft programs (I thought Exchange, SQL Server and link.exe all behaved differently when the switch is set)?

Is it really virtual address space if the machine has enough real memory?

Of course it is. Just because you have 4GB of physical RAM, it doesn’t mean it’s mapped into any process’s virtual address space, just as having 4GB of virtual address space doesn’t mean any physical RAM is mapped to any portion of that space. You *could* map the entire 4GB of RAM to the address space of one process in a 1:1 mapping, but there’s no requirement to do so.

Add to that the fact that each process has its own address space, and only the upper 2GB (1GB with /3GB) is shared. Mapping from physical RAM to virtual memory is done as required by the process.

David/Jonathan: Depending on your needs, either 2GB is way more than needed, or not nearly enough. That’s why we decided to make it tunable.

The way memory mapped cache works has more to do with the memory manager and cache manager than the VA space, to be honest. Keeping track of the pages doesn’t equate to having them "live" in kernel mode VA space. That’s a huge topic though, check out Inside Windows 2000 for a good treatment.

Chris: Like I said, 1GB can be nothing if you have lots of I/O that needs to be done. DBAs often assume that the more RAM they can get, at any cost, the better. That’s only true if you’re not throttling your I/O. What good is getting the data quickly if you can’t send it anywhere because of long queues? I’ve seen many SQL implementations that take all the RAM they can get their hands on, and strangle the OS.

Just flipping /3GB isn’t enough for most programs. It has the effect on the kernel, true, but unless your process’s executable has the large address aware flag set, Windows won’t actually give you the full 3GB space. Link your executable with /LARGEADDRESSAWARE or use EDITBIN.

The reason you have to do this is that some programs were written to assume that bit 31 of a user-mode pointer would always be 0, and therefore that this bit could be reused for some other purpose. (Some applications do the same with bits 1 and 0, because for a 4-byte-aligned quantity, bits 1 and 0 are also 0.) When the program wants to use the pointer, the repurposed bits’ values are saved off somewhere, then the bits are set back to 0.

Of course, as soon as you start getting pointers in the 2-3GB region, bit 31 is set. The application sets the bit to 0 – whoops, you’re pointing at the wrong place.

So unless you’re working with an application you know is large address aware (Exchange Information Store and SQL Server spring to mind) don’t bother enabling /3GB, it’ll hurt kernel address space but do nothing for your processes.

As for what gets loaded into kernel VAs: memory sections for all loaded kernel mode modules (NTOSKRNL.EXE, HAL.DLL, WIN32K.SYS and a whole load of drivers), paged pool and non-paged pool (used for all the drivers’ and system components’ private memory heap allocations – non-paged pool for memory that may be accessed when paging is disabled, paged pool for everything else), system and process page tables, system and process working set lists, and the system caches.

Reducing the system space to 1GB with /3GB causes the maximum paged pool size to drop, according to "Inside Windows 2000", from 482MB to 160MB, and removes a whole area 448MB in size which is used for system page table entries and system cache. The file system cache in Windows is based on mapped files: the system maps blocks of files 256KB at a time into virtual memory; when a process reads from or writes to a file, the virtual addresses are touched which causes the memory manager to swap data in from the file. This includes filesystem metadata – my system is currently showing 15MB ‘valid’ (currently paged in), 6MB ‘standby/dirty’ (removed from the working set and will be reused at some point, after being written to disk if dirty) and 542 views for a total of about 135MB of VA space for $Mft, the NTFS Master File Table. This is on a 512MB RAM system.

So reducing the amount of space available to the kernel can seriously affect disk caching and the ability of various components to allocate memory – probably reducing the number of asynchronous I/Os that can be pending. For the two components I mentioned before, Exchange Information Store and SQL Server, it can be worthwhile because they do a lot of their own disk buffering and a lot of unbuffered I/O to their transaction logs (e.g. STORE writes the entire content of every message to the transaction log before storing it in the appropriate database).

You can tune things even more on Windows XP and Server 2003 with the /USERVA switch.

The following is a technical update to IBM’s explanation of how virtual memory works. It was written by Jeff Berryman of the University of British Columbia and distributed at a SHARE meeting shortly after IBM announced virtual memory for the 370 series.

RULES:

Each player gets several million "things".

"Things" are kept in "crates" that hold 4096 "things" apiece. "Things" in the same "crate" are called "crate-mates".

"Crates" are stored either in the "workshop" or the "warehouse". The workshop is almost always too small to hold all the crates.

There is only one workshop, but there may be many warehouses. Everybody shares these.

To identify things, each thing has its own "thing number".

What you do with a thing is to "zark" it. Everybody takes turns zarking.

You can only "zark" your things or shared things, not anyone else’s.

Things can only be "zarked" when they are in the workshop.

Only the "Thing King" knows whether a thing is in the workshop or the warehouse.

The longer the things in a crate go without being zarked, the grubbier the crate is said to become.

The way you get things is to ask the "Thing King". He only gives out things in multiples of 4096 (that is, "crates"). This is to keep the royal overhead down.

The way you zark a thing is to give its thing number. If you give the number of a thing that happens to be in the workshop, it gets zarked right away. If it is in a warehouse, the Thing King packs the crate containing your thing into the workshop. If there is no room in the workshop, he first finds the grubbiest crate in the workshop (regardless of whether it is yours or someone else’s) and packs it off (along with its crate-mates) to a warehouse. In its place he puts the crate containing your thing. Your thing then gets zarked, and you never knew that it wasn’t in the workshop all along.

Each player"s stock of things has the same thing numbers (to the players) as everyone else"s. The Thing King always knows who owns what thing, and whose turn it is to zark. Thus, one player can never accidentally zark another player"s things, even though they may have the same thing numbers.

NOTES:

Traditionally, the Thing King sits at a large, segmented table, and is attended by pages (the so-called "table pages") whose job it is to help the Thing King remember where all the things are and to whom they belong.

One consequence of rule #13 is that everyone’s thing numbers will be similar from game to game, regardless of the number of players.

The Thing King has a few things of his own, some of which get grubbier, just as players’ things do, and so move back and forth between the workshop and the warehouse. Other things are used too often to get grubby, or are just too heavy to move.

With the given set of rules, oft-zarked things tend to get kept mostly in the workshop, while little-zarked things stay mostly out in the warehouse. This is efficient stock control.

Sometimes even the warehouses get full. The Thing King then has to start piling crates on the dump out back. This makes the game slower because it takes a long time to get things off the dump when they are needed in the workshop. In this case, the Thing King selects the grubbiest crates he can find in the warehouses and sends them to the dump in his spare time, thus keeping the warehouses from getting too full. This also means that the least-often zarked things end up on the dump, so the Thing King won’t have to get things from the dump so often. This speeds up the game when there are a lot of players and the warehouses are getting full.

Anyone with any experience administering applications like SQL Server and Exchange Server knows how much they love memory. They’ll use every bit they can get their hands on. And then some more… The last thing you want to find out…

As Evan already mentioned on his blog, Raymond Chen has a great series on the /3GB switch on his blog. What is really cool is that Raymond takes on some myths about the /3GB switch and the fact that he…