4GB Memory Limitations?

Prior to SP2, XP was able to use physical address space above the 4GB boundary in PAE mode. With SP2, MS removed that ability because certain companies don't know how to write drivers which can sit in physical address space above the 4GB boundary (nVidia).

PCI devices can carve out space from the 4GB ceiling downward for their own use. Pop a 512MB video card into a PC, and it typically chews up 512MB of RAM in the 4GB-and-lower range. Those addresses are smack dab on top of any RAM that may be there. Someone has to move, so the memory controller alters the memory map to show a "hole" in the RAM address range.

It would be like counting a row of ten apples (RAM), but some bozo decides to stick two oranges (video card RAM) between the 8th and 9th apple. Now, if you're a baker making a pie, and you need 10 apples, do you use eight apples and two oranges, or do you recognize that the last two apples are now in positions 11 and 12, and skip from 8 to 11 when counting out apples?

The memory controller knows someone stuck a bunch of oranges ahead of the final few apples, so it says, "If you come to me for an Apple where an Orange is, I'm going to send you to where the last few Apples are." The memory controller just mapped your request for Item 9 to Apple 11. If it hadn't, you would've taken Orange 1, and who would want an apple pie with an orange in it? One, two, three, four, five, six, seven, eight, eleven, twelve. You still have ten apples, but they're now numbered 1-8 and 11-12.

If your baker can count over 10 but knows he can't have more than 10 items total in use at any given time (XP RTM, W2K), then he can access the 11th and 12th apples, but still has access to only 10 total apples.
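To put numbers on the apples and oranges, here is a minimal sketch of the remapping in Python. The function names and the single contiguous hole are assumptions made for illustration; a real chipset's memory map has several regions and the details vary by board.

```python
GIB = 1 << 30
MIB = 1 << 20


def remap_ram_offset(ram_offset, installed, hole_size, ceiling=4 * GIB):
    """Return the physical address where byte `ram_offset` of RAM shows up.

    Assumption: devices occupy one hole, [ceiling - hole_size, ceiling),
    and the RAM that would have landed there is relocated to start at
    `ceiling`.
    """
    if not 0 <= ram_offset < installed:
        raise ValueError("offset is outside installed RAM")
    hole_start = ceiling - hole_size
    if ram_offset < hole_start:
        return ram_offset                      # apples 1-8: unchanged
    return ceiling + (ram_offset - hole_start)  # apples 9-10 become 11-12


def visible_below_ceiling(installed, hole_size, ceiling=4 * GIB):
    """RAM still reachable by an OS that cannot address past `ceiling`."""
    return min(installed, ceiling - hole_size)


if __name__ == "__main__":
    # 4 GiB installed, 512 MiB claimed by a video card.
    print(hex(remap_ram_offset(3 * GIB + 512 * MIB, 4 * GIB, 512 * MIB)))
    print(visible_below_ceiling(4 * GIB, 512 * MIB) // MIB, "MiB visible")
```

An OS that can count past the ceiling gets all of its RAM back through `remap_ram_offset`; one that can't is stuck with `visible_below_ceiling`.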

Well, strictly speaking it's no longer a 32 bit OS, because by using PAE it has 36 bits of address space. 2**36 = 64GB

No, it's still a 32-bit OS, because pointers are still 32 bits. And physical addresses in NT have been 64-bit for as long as I have DDKs for, even on non-PAE platforms. There are no 36-bit addresses in anything other than the page tables, and even the addresses in the page tables are not made up of 36 bits (you don't need to encode the bottom twelve bits of an address in the PTE because the pages are at least 4 kiB).
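A rough sketch of that last point, in Python for readability. This is a simplified model of a page table entry (frame bits plus low flag bits only); the helper names are made up for illustration, and real PAE entries are 64 bits wide and carry more fields than shown here.

```python
PAGE_SHIFT = 12   # 4 KiB pages: the low 12 address bits are the page offset
PHYS_BITS = 36    # physical address width under classic PAE

OFFSET_MASK = (1 << PAGE_SHIFT) - 1
FRAME_MASK = ((1 << PHYS_BITS) - 1) & ~OFFSET_MASK


def make_pte(frame_number, flags=0x1):
    """Pack a page frame number and low flag bits into one entry."""
    return (frame_number << PAGE_SHIFT) | (flags & OFFSET_MASK)


def translate(pte, virtual_address):
    """Join the frame base held in the entry with the 12-bit page offset."""
    return (pte & FRAME_MASK) | (virtual_address & OFFSET_MASK)


if __name__ == "__main__":
    pte = make_pte(0x800000)   # frame whose base is at 32 GiB, above 4 GiB
    print(hex(translate(pte, 0x7FFE1234)))
    print((1 << PHYS_BITS) // (1 << 30), "GiB addressable")
```

The pointer handed to `translate` is still 32 bits. Only the entry knows about the wider physical address, and it stores 24 frame bits rather than a full 36-bit address.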

quote:

So AWE enables applications to be written to make use of a larger address space

No, AWE allows applications to carve out a chunk of non-pageable physical memory which they can then map portions of into their normal (4 GiB split 2/2, 3/1, or 4/0, depending on platform, for 32-bit apps, 2^64 bytes for 64-bit apps) virtual address space. AWE is orthogonal to PAE; you don't need a PAE platform to use AWE, and it's also bitness neutral (you can use AWE on 32-bit non-PAE, 32-bit PAE, 32-bit-on-64-bit x64, 64-bit x64, 64-bit IA64).

With SP2, MS removed that ability because certain companies don't know how to write drivers which can sit in physical address space above the 4GB boundary (nVidia).

Not exactly. The drivers don't care where their code is in physical address space. What matters are certain types of buffers allocated by the OS for the drivers' use. For some kinds of devices and transfers (bus-master DMA) the driver must manipulate physical addresses of these buffers. In cases where the device doesn't have a scatter/gather map of its own the operating system has to double-buffer the transfer through a physically contiguous buffer. Some drivers just don't do the right thing in handling these buffers.

Windows XP or XP SP1 NEVER let you see exactly all 4 GB of RAM, even with PAE enabled; there was usually some slight lossage. The lossage simply increased under SP2.

What is particularly galling about this (but, sadly, not surprising) is that Windows NT 3.1 (1993) implemented a PHYSICAL_ADDRESS typedef for exactly this purpose: Describing addresses in physical memory. This has always been a 64-bit structure. Similarly there has always been a family of DMA support routines that handle all the necessary buffer management behind the driver's back... if you use them correctly. So drivers should have, and could have, been DTRT since Day One of NT.

It reminds me of the multithreading issues we started to see en masse when hyperthreading showed up. Before HT, I lost count of the number of hardware vendors who told me their drivers would never support MP: "Our device is a consumer device! MP is for servers!" Then HT CPUs started showing up in consumer machines. Surprise!

What is particularly galling about this (but, sadly, not surprising) is that Windows NT 3.1 (1993) implemented a PHYSICAL_ADDRESS typedef for exactly this purpose: Describing addresses in physical memory. This has always been a 64-bit structure. Similarly there has always been a family of DMA support routines that handle all the necessary buffer management behind the driver's back... if you use them correctly. So drivers should have, and could have, been DTRT since Day One of NT.

What takes this annoyance to new heights is that since XPSP2 changed behaviour, drivers that used to work subsequently stopped working. I had 4 GiB with Server 2003 Enterprise (so all 4 GiB available in PAE mode). NVIDIA drivers prior to 8x.yy worked properly. But 8x.yy and newer suffered massive, fatal screen corruption when used in that same configuration. I presume that the XPSP2 changes meant that NVIDIA were no longer properly testing high memory configurations and so didn't notice when they broke something.

Interesting. I have 2 Linux servers with 4GB of RAM. One is a Dell 2950. /proc/meminfo shows 3365968 kB total (828,336 kB lost). The other is a homebuilt machine with Tyan Thunder K8W motherboard and /proc/meminfo shows 4087728 kB (106,576 kB lost).

Yeah, I know, but these are servers with video on the motherboard. The Dell, I believe, has 32MB on its built-in ATI ES1000. The Tyan has 8 or 16MB for video. So it's not video that's taking up hundreds of megs.

Oh wait, I just realized the Dell has hardware RAID whereas the Tyan machine is using software RAID, so it's probably the memory of the RAID controller that is overlapping. It might be a 512MB card, which could explain much of the 700+ meg difference in total memory between the two machines.
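For anyone who wants to repeat the arithmetic on their own box, here is a small sketch in Python. It assumes 4 GiB installed and reads the `MemTotal` line from `/proc/meminfo`; the helper names are made up. Keep in mind that `MemTotal` also excludes a little memory the kernel reserves for itself, so not every "lost" kB is down to device address space.

```python
INSTALLED_KB = 4 * 1024 * 1024   # 4 GiB in kB, the unit /proc/meminfo reports


def parse_mem_total(meminfo_text):
    """Pull the MemTotal value (in kB) out of /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal line not found")


def lost_kb(mem_total_kb, installed_kb=INSTALLED_KB):
    """Installed RAM minus what the OS reports as usable."""
    return installed_kb - mem_total_kb


if __name__ == "__main__":
    dell, tyan = 3365968, 4087728   # the two figures quoted in this thread
    print("Dell lost:", lost_kb(dell), "kB")
    print("Tyan lost:", lost_kb(tyan), "kB")
    print("Difference:", (tyan - dell) // 1024, "MB")
```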

Originally posted by Ðaedalus:I have 4GB and my 32bit XP only sees 2.75. I upgraded from 2GB because it was dirt cheap and I wanted to future proof this rig.

I had this same problem. I solved the problem for myself.

I also bought another gig (raising my memory to 4, the most my MB can handle) to future proof my box. I also upgraded my chip to the fastest it could handle. I only saw the 3.25 gig. I am running XP64 PRO. I also have Ubuntu 64. My chip is AMD 64.

For me, it was a BIOS issue. In the BIOS on my 9NPA+ultra EPOX board, (which I got to by holding down the "delete" key when the computer first booted), there was a setting "advanced bios settings" or some such dialog, and in there there was a dialog for DRAM settings then in THAT dialog there was something, not about mapping, but something else that I had to select enable on. I know how vague that sounds, but unless you have my MB, it's not going to be exact anyways.

The point is, I used to see 3.25 and fixed it by changing the DRAM settings in my BIOS, so there you go. Now I see 4 gigs and my computer is quite a bit more responsive.

By the way, since it appears more people than just me think about future proofing their computers before doing the MB/CPU/RAM swap required for moving to the next platform, here's a bit of advice straight from the mouth of my RAM tech help, Kingston, whom I do not work for but from whom I will always buy RAM, since they answer their phones in two rings, in English, are EXCEEDINGLY polite and helpful, and RMA'd my dead RAM 4 years after I bought it... wow... anyway. For RAM pricing, there comes a point in time when the new faster RAM is being phased in and the old slower RAM phased out. The new platform is what non-bargain-hunters getting new systems are buying, for the most part. At that time, which is now for the 939-->AM2 shift, the price of RAM for the 939 (older) MB starts to creep up, because the RAM makers are not making as much of it. So you want to buy before that happens, if you're going to.

I thought the price of RAM would go down as it became less popular, but that's not how market forces play out in this instance.

Just to clarify for the layman (me) this 4GB limit breaks down roughly to the following effect: System memory (any amount installed 4GB and above) - other hardware (vid card, possibly RAID card, maybe other?) = total physical system memory that the OS can use

In the above case are there any negative effects (performance or otherwise) from having 4GB rather than 2GB of RAM installed ... like in dual channel situations?

Originally posted by cjx:Just to clarify for the layman (me) this 4GB limit breaks down roughly to the following effect: System memory (any amount installed 4GB and above) - other hardware (vid card, possibly RAID card, maybe other?) = total physical system memory that the OS can use

Only for OSes that are limited to the first 4GB of physical addresses. For those that understand addresses over 32bits, it's not an issue. For example, the original Windows XP (pre-SP) can make use of RAM that is relocated above the 4GB boundary, but it is still limited to 4GB total space.

quote:

In the above case are there any negative effects (performance or otherwise) from having 4GB rather than 2GB of RAM installed ... like in dual channel situations?

Anyway, I've been concerned about this 4GB limitation for a while, and this thread pretty much covers the meat of the issues. My one question: Is it worth bothering with Vista32 with respect to a newly built system?

My take on it is that most hardware is now 64bit (or will be very soon). So in the interest of future proofing and such, since Vista32 has the same limitations as XPSP2, why bother with it on a newly built system assuming one is attempting to max out the capabilities (read: 4GB+ system memory)? I'm considering Vista64, but I have concerns with drivers (which is for another thread entirely).

So, 32-bit OSes can in fact see 64GB of RAM, assuming they aren't intentionally crippled, and the CPU supports PAE, right? (This is what the CPU feature is called that allows its ability to address 36-bits worth of physical memory, or is 36-bit addressing a given for CPUs, and needs to be enabled in the OS?) However, the apps are limited to 2GB, or 3GB if the switch is enabled, and the app is coded to take advantage of it?

What about 64-bit apps in a 64-bit OS? What amount of memory can they utilize?

Originally posted by eluder:So, 32-bit OSes can in fact see 64GB of RAM, assuming they aren't intentionally crippled, and the CPU support PAE, right?

And your chipset supports that much RAM. There are three pieces to the puzzle: the OS, the CPU, and the chipset.

quote:

However, the apps are limited to 2GB, or 3GB if the switch is enabled, and the app is coded to take advantage of it?

Right, of virtual memory. Which means you could have a thousand applications each using 2 GiB of virtual memory, as long as you had the hard drive space to back all the data and the patience for it to be swapped in and out.

Limits vary by OS:

Windows supports 2 GiB or 3 GiB based on a switch.

Linux supports 3 by default, but 2 and 4 are available via either a patch or, I believe, a configuration option in the latest versions.

OS X provides 4 all the time. This has negative performance ramifications.

quote:

What about 64-bit apps in a 64-bit OS? What amount of memory can they utilize?

Interesting. So, a 64-bit app on a 64-bit processor in Windows has the ability to use 43-bits of virtual address space, and isn't limited to the 2GB or 3GB (with prereqs met) of a 32-bit app on a 32-bit OS?

This same 64-bit machine would have the ability to recognize 48-bits worth of memory however?

Originally posted by eluder:Interesting. So, a 64-bit app on a 64-bit processor in Windows has the ability to use 43-bits of virtual address space, and isn't limited to the 2GB or 3GB (with prereqs met) of a 32-bit app on a 32-bit OS?

Correct. In the future, it will have the potential ability to use 64-bits of space (when the hardware supports it), assuming it was correctly coded (more than likely) and depending on the split make up.

quote:

This same 64-bit machine would have the ability to recognize 48-bits worth of memory however?

No. Current x64 processors support 40-bits (1 TiB) of physical addressing. Some older Intel ones only supported 36-bits. The upper limit is 52-bits.

It's still more RAM than you're likely to use.
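If you want to check the bit widths thrown around in this thread, the sizes are just powers of two. A quick sketch in Python (the helper names are made up):

```python
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def addressable_bytes(bits):
    """Number of distinct byte addresses an address of `bits` bits can name."""
    return 1 << bits


def human(n):
    """Render an exact power-of-two byte count with a binary unit."""
    unit = 0
    while n >= 1024 and unit < len(UNITS) - 1:
        n //= 1024
        unit += 1
    return f"{n} {UNITS[unit]}"


if __name__ == "__main__":
    for bits in (32, 36, 40, 43, 52, 64):
        print(f"{bits} bits -> {human(addressable_bytes(bits))}")
```

That lines up with the numbers above: 36 bits is the 64 GiB PAE limit, 40 bits is 1 TiB, and the 52-bit upper limit works out to 4 PiB.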

Remember applications see virtual addresses and hardware sees physical addresses. The two have totally disjoint limits. If they're the same, it's coincidence.

The long answer is that to speed up virtual memory, a cache called the Translation Lookaside Buffer (TLB) exists on your CPU. It caches the mappings between a virtual address and a physical address. Not caching these mappings would require several memory operations before the actual operation we wanted could be performed.

On x86, whenever the register holding the base of the page table hierarchy is reloaded, the entire TLB is flushed. This happens essentially on every task switch.

If each userspace application got 4GiB, this would also have to occur every time the kernel was accessed. And applications call to the kernel more than to anything else. As such, part of the virtual memory space is permanently given to the kernel to avoid this problem.
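A toy model of why that matters, in Python. This is not how a real TLB is organised (no capacity limit, no replacement policy, page numbers picked arbitrarily); it only counts how many lookups have to go back to the page tables when the cache is flushed on every user/kernel transition versus when it isn't.

```python
class ToyTLB:
    """Unbounded translation cache that counts misses."""

    def __init__(self):
        self.entries = {}
        self.misses = 0

    def lookup(self, page_table, vpn):
        if vpn not in self.entries:
            self.misses += 1                     # stands in for a page-table walk
            self.entries[vpn] = page_table[vpn]
        return self.entries[vpn]

    def flush(self):
        self.entries.clear()


def count_misses(trace, page_table, flush_on_mode_switch):
    """Replay (mode, virtual page number) accesses and return the miss count."""
    tlb = ToyTLB()
    previous_mode = None
    for mode, vpn in trace:
        switched = previous_mode is not None and mode != previous_mode
        if flush_on_mode_switch and switched:
            tlb.flush()                          # separate address spaces
        tlb.lookup(page_table, vpn)
        previous_mode = mode
    return tlb.misses


if __name__ == "__main__":
    page_table = {1: 7, 100: 9}
    trace = [("user", 1), ("kernel", 100)] * 4   # app keeps calling the kernel
    print("shared address space:", count_misses(trace, page_table, False))
    print("flush on every entry:", count_misses(trace, page_table, True))
```

With the kernel mapped into every process, the trace misses once per page. With a flush on each transition, every single access misses.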

We have a SQL 2005 server here running Win 2003 Ent with 16GB of RAM. It is 32-bit Windows. I don't think 64-bit apps can run on 32-bit, so why is only 1GB of RAM free? I'm positive SQL is caching, which is great, but how can it use so much memory if it's 32-bit?

Originally posted by eluder:We have a SQL 2005 server here running Win 2003 Ent with 16GB of RAM. It is 32-bit Windows. I don't think 64-bit apps can run on 32-bit, so why is only 1GB of RAM free?

You can't use the RAM free numbers to determine how much RAM is being used by an application.

RAM is used for things like caching (though 2003's reporting is funny), the kernel, and everything else on your box.

But, SQL Server supports a technology (which presumably you're using) called AWE, which allows the kernel to treat some of the RAM like a window. The best way to imagine it is a slide projector. You can only look at one image (portion of memory) at a time, but by pressing the next and back buttons you can see different slides. AWE provides that to applications, allowing a process to "use" more than 2GiB of RAM while only seeing that much at a time.

quote:

I'm positive SQL is caching, which is great, but how can it use so much memory if it's 32-bit?

But, SQL Server supports a technology (which presumably you're using) called AWE, which allows the kernel to treat some of the RAM like a window. The best way to imagine it is a slide projector. You can only look at one image (portion of memory) at a time, but by pressing the next and back buttons you can see different slides. AWE provides that to applications, allowing a process to "use" more than 2GiB of RAM while only seeing that much at a time.

But, SQL Server supports a technology (which presumably you're using) called AWE, which allows the kernel to treat some of your virtual address space (rather than "some of the RAM") like a window. The best way to imagine it is a slide projector. With each AWE window you can only look at one image (portion of memory) at a time, but by pressing the next and back buttons you can see different slides.

And you can set up multiple windows, so you can look at several portions of memory at once.
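Here is the slide projector as a toy model in Python. It is only a simulation: the class and method names are invented, and nothing here calls the real Win32 API (where, as far as I know, the allocation and remapping are done with `AllocateUserPhysicalPages` and `MapUserPhysicalPages`). The pool stands in for the physical pages, and each window is a small fixed range of address space that can be pointed at different parts of the pool.

```python
PAGE_SIZE = 4096


class ToyWindow:
    """A fixed-size view that can be re-pointed at different pool pages."""

    def __init__(self, pool, window_pages):
        self.pool = pool                  # list of bytearray pages ("slides")
        self.window_pages = window_pages
        self.base = None                  # first pool page currently mapped

    def map(self, first_page):
        """Point the window at pool pages [first_page, first_page + size)."""
        if first_page < 0 or first_page + self.window_pages > len(self.pool):
            raise IndexError("window would run past the pool")
        self.base = first_page

    def _locate(self, offset):
        if self.base is None:
            raise RuntimeError("window is not mapped")
        page, within = divmod(offset, PAGE_SIZE)
        if not 0 <= page < self.window_pages:
            raise IndexError("offset is outside the window")
        return self.pool[self.base + page], within

    def read(self, offset):
        page, within = self._locate(offset)
        return page[within]

    def write(self, offset, value):
        page, within = self._locate(offset)
        page[within] = value


if __name__ == "__main__":
    pool = [bytearray(PAGE_SIZE) for _ in range(8)]   # 8 "physical" pages
    window = ToyWindow(pool, window_pages=2)          # can see only 2 at once
    window.map(0)
    window.write(10, 0xAA)
    window.map(6)                                     # next slide
    window.write(10, 0xBB)
    window.map(0)                                     # back again
    print(hex(window.read(10)))
```

The same offset in the window reaches different memory depending on what is mapped, and a second `ToyWindow` over the same pool gives you two slides on screen at once.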

This looks like a great place to ask my questions about 4GB and /PAE. Most definitely server related though...

I'm at that stage in life where you wonder whether to consider running Citrix servers with W2003x64, or stay 32-bit for another generation. Sadly printer drivers are likely to be a total bloody mess still (W2008 with that universal printing thing will hopefully be the answer, but while we wait....), so I'm now wondering what the best way is to configure a Citrix environment.

Now I fully understand the use of /PAE and turning on AWE in things like SQL Servers, but obviously no apps in a Citrix server are compiled with LARGEADDRESSAWARE or AWE, and they don't need to be either.

In the past I'd have said "2 CPUs, 4GB RAM, don't use either /3GB or /PAE because both reduce your available system PTEs and will lower your scalability", but after extensive performance checking I'm not so sure. Based on http://members.shaw.ca/bsanders/WindowsGeneralWeb/RAMVi...emoryPageFileEtc.htm I added up the contents of the 5 memory counters and concluded that my kernel memory use is running at around 500MB on existing Citrix servers, and free system PTEs are still in the tens of thousands, so my current workload seems not to be kernel memory bound (instead it is definitely user memory and CPU bound). The implication is that adding more RAM and more users could well be possible if I were to use /PAE mode. However, I have no clue what kind of scalability I will reach.

I recall reading an article once stating that with /PAE mode on, your OS may be able to see up to max 64GB of RAM (apparently according to MS docs for W2003 SP1), but the RAM above 4GB isn't fully useable: It is only available for storing data pages, not executable programs. However, I can't find this article, and I'm wondering if I'm going mad. Is this true? If so, it would mean that the advantages to /PAE and 16GB RAM in a Citrix server would be minimal assuming mixed workload and a whole bunch of apps running.

I would greatly appreciate some help understanding how to determine the likely scalability if I were to go for a building block of 16GB RAM and /PAE in my 32-bit Citrix servers.

Originally posted by Skazz:I recall reading an article once stating that with /PAE mode on, your OS may be able to see up to max 64GB of RAM (apparently according to MS docs for W2003 SP1), but the RAM above 4GB isn't fully useable: It is only available for storing data pages, not executable programs. However, I can't find this article, and I'm wondering if I'm going mad. Is this true?

No, the RAM above the 32-bit address boundary is just as good as any other RAM, for applications.

I recall reading an article once stating that with /PAE mode on, your OS may be able to see up to max 64GB of RAM (apparently according to MS docs for W2003 SP1), but the RAM above 4GB isn't fully useable: It is only available for storing data pages, not executable programs.

You're remembering AWE.

"The AWE API does not permit executable code (.exe, .dll, .sys, and so on) to execute from within an AWE window region in the process’s virtual memory or in the mapped physical memory pool the process is utilizing."