[Editors' Note: Jonathan Morrison, a dev in Microsoft's Core Operating Systems group, has started a blog about Windows kernel-mode issues. Knowing Jonathan, it didn't surprise us after reading the first few installments that it was more esoteric and far more interesting than the average bloggo-junk. Heck, Jonathan's blog entries were even useful. When we saw his blog entry on capturing User Mode pointers, we asked him if he wouldn't mind expanding it into an article for The NT Insider. And he said yes! Of course, due to an editing snafu, we didn't get Jonathan's article in the March-April issue of The NT Insider. Sorry Jonathan - it'll be in the next issue! In the meantime, enjoy the article and check out Jonathan's blog at http://blogs.msdn.com/itgoestoeleven.]

Many moons ago I wrote some kernel code, self-reviewed it and sent it to my mentor and good friend Neill Clift for the obligatory code review. Not expecting any comments on the code (it was a simple kernel interface), I was puzzled by one of his comments. He said "You will want to make sure the compiler doesn't optimize away your captures there". I was like "Oh - yeah - of course - good catch". Having absolutely no idea what Neill was talking about, I went back to my office and started looking at my capture code more carefully. All of a sudden it dawned on me exactly what he was talking about. In my code I had in no way indicated to the compiler that my captures were a critical part of my code. Neill had suspected this was a problem area for quite some time. I wanted to find out for sure if it was or not. What ensued was a personal journey of pain, confusion and knowledge increase through the land of compiler architectural guarantees, trying to infer rules where there were apparently none and mapping environmental rules into an environmentally agnostic compiler. This article is a sampler platter of the fruit collected on that sojourn. Yeehaw!

User mode pointers passed to kernel mode code must point to data that is wholly contained in user mode address space. Checking this property of a user mode pointer is called probing. Contrary to popular belief, a probe does not touch the memory pointed to by the pointer; it just does an address range calculation on it. The calculation is basically "(pointer + LengthOfDataPointedTo) must be less than the highest legal user-mode (UM) address". The reason we need to probe user mode pointers is to make sure that a UM component can't write or read kernel space. When a user mode caller passes a pointer into a kernel mode component, the pointer is copied onto the kernel stack as part of the calling mechanism (sometimes called a "system call", sysenter, a "trap", etc.). The user has no way to change the value of that pointer once it is passed in to the kernel interface - so we can validate the pointer with confidence - in other words - we know that its value won't change underneath us. However, if that pointer is a pointer to a structure with embedded pointers, those embedded pointers can be changed asynchronously by other threads in the system. This is problematic because we need to validate all of the embedded pointers in a passed-in structure to make sure they don't point into kernel mode. This is where capturing comes into play. We capture the embedded pointers by storing their values in kernel mode space - usually the stack - by reading the embedded pointers through the already captured and validated pointer. Once an embedded pointer is captured, we probe it; lather - rinse - repeat for the entire depth of the embedded pointer tree.
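The address range calculation a probe performs can be sketched in portable C. This is only an illustration of the arithmetic, not the real ProbeForRead: the names sketch_probe and SKETCH_USER_LIMIT are made up, the limit value is an arbitrary stand-in for the highest legal user-mode address, and the sketch returns a boolean rather than raising an exception. It also assumes the alignment is a non-zero power of two.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the highest legal user-mode address. */
#define SKETCH_USER_LIMIT ((uintptr_t)0x7FFF0000u)

/*
 * Returns true when the range starting at address and spanning length
 * bytes passes the check. Nothing is dereferenced here: the whole test
 * is address arithmetic. alignment is assumed to be a non-zero power of two.
 */
static bool sketch_probe(uintptr_t address, size_t length, size_t alignment)
{
    uintptr_t end;

    if (length == 0) {
        return true;                /* nothing to check in this sketch */
    }
    if ((address & (alignment - 1)) != 0) {
        return false;               /* start address is misaligned */
    }
    end = address + (uintptr_t)length;
    if (end < address) {
        return false;               /* the addition wrapped around */
    }
    return end < SKETCH_USER_LIMIT; /* strict "less than", as in the text */
}
```

Note the wrap-around test: without it, a huge length could make the sum land back inside the user range and sneak past the comparison.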

    NTSTATUS
    Foo(
        USER_DATA* Data
        )
    {
        ULONG_PTR* CapturedData1;
        ULONG_PTR* CapturedData2;
        ULONG_PTR Data1Value;
        ULONG_PTR Data2Value;

        //
        // See if this is user mode - in a driver PreviousMode
        // would normally be read from a field in the IRP and the
        // pointer would come from the Type3InputBuffer field, but
        // for simplicity's sake we will just use parameters directly
        //

        if (ExGetPreviousMode() != KernelMode) {

            try {

                //
                // Probe the passed in structure
                //

                ProbeForRead(Data, sizeof(USER_DATA), __alignof(USER_DATA));

                //
                // Capture the embedded pointers
                //

                CapturedData1 = Data->Data1;
                CapturedData2 = Data->Data2;

                //
                // Probe the first captured pointer
                //

                ProbeForRead(CapturedData1, sizeof(ULONG_PTR), __alignof(ULONG_PTR));

                //
                // Probe the second captured pointer
                //

                ProbeForRead(CapturedData2, sizeof(ULONG_PTR), __alignof(ULONG_PTR));

                //
                // Read the first embedded pointer
                //

                Data1Value = *CapturedData1;

                //
                // Read the second embedded pointer
                //

                Data2Value = *CapturedData2;

                //
                // More of your code here that does really cool stuff...
                //

            } except (EXCEPTION_EXECUTE_HANDLER) {
                return GetExceptionCode();
            }
        }

        return STATUS_SUCCESS;
    }

At first glance everything seems to be OK with this code. We probe the structure pointer, capture the embedded pointers to local variables and then probe them. But wait - let's think about our ever important capture code a little deeper. The most important attribute of our capture code is that it stores the embedded pointer in a location where the user can't modify it. If it didn't, then we would be in really bad shape as the user could change the embedded pointers to point into the kernel address space. So the question is: Does our capture code in fact guarantee that the embedded pointers will be in a location that can't be modified from user mode? Unfortunately, the answer is NO! But how is this possible? Although we suggested to the compiler that we wanted to store the pointers locally by assigning them to local variables, we didn't do anything to tell the compiler that it was critical and mandatory that they were stored locally. Given this fact, the compiler can freely skip the local storage of the embedded pointers and re-fetch them from user mode through the original pointer upon each later reference. This is potentially disastrous for our kernel mode code and not at all what we expected or intended.
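To make the hazard concrete, here is a small user-mode model of it, not kernel code. Everything in it is made up for illustration: addresses are modelled as plain integers, SKETCH_USER_LIMIT and SKETCH_KERNEL_ADDR are arbitrary values, and a callback stands in for another thread that rewrites the embedded pointer between the check and the use. The first function is a hand-written picture of what the code behaves like if the local copy is elided and every reference goes back to the structure; the second is what we meant to write.

```c
#include <stdint.h>

#define SKETCH_USER_LIMIT  ((uintptr_t)0x7FFF0000u)  /* hypothetical limit */
#define SKETCH_KERNEL_ADDR ((uintptr_t)0x80001000u)  /* hypothetical kernel address */

/* The embedded "pointer" is modelled as an integer address. */
typedef struct {
    uintptr_t Data1;
} SKETCH_USER_DATA;

/* Stands in for another thread that gets to run at the worst moment. */
typedef void (*SKETCH_RACER)(SKETCH_USER_DATA *);

static void sketch_flip_to_kernel(SKETCH_USER_DATA *data)
{
    data->Data1 = SKETCH_KERNEL_ADDR;
}

/* Models the re-fetching behaviour: the check and the use each read the field. */
static uintptr_t sketch_refetching(SKETCH_USER_DATA *data, SKETCH_RACER racer)
{
    if (data->Data1 >= SKETCH_USER_LIMIT) {   /* fetch 1: the "probe" */
        return 0;
    }
    racer(data);                              /* the other thread strikes */
    return data->Data1;                       /* fetch 2: the "use" */
}

/* Models the intended behaviour: one fetch, then only the copy is used. */
static uintptr_t sketch_intended(SKETCH_USER_DATA *data, SKETCH_RACER racer)
{
    uintptr_t captured = data->Data1;         /* the only fetch */

    if (captured >= SKETCH_USER_LIMIT) {
        return 0;
    }
    racer(data);
    return captured;                          /* the validated value */
}
```

In the first function the value that was checked and the value that is used are two different reads, so the check proves nothing about the address that finally gets dereferenced.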

So what can we do to get the behavior we require? Easy - we have to tell the compiler the truth about the code that we are writing. That's right - the truth. We are lying in our code. Our code has implicitly told the compiler that our embedded pointers can't change asynchronously. This is a lie, as they can change because we are in a shared address space with multiple threads of execution. So in order for our code to be correct, we need to change it to a truthful representation of itself. But how do we tell the compiler that our pointed-to structure can change? A couple of ways. The most straightforward way is to mark the passed-in parameter with the keyword volatile. When applied to a memory location, volatile tells the compiler that its contents can change asynchronously. Volatile forces all reads and writes to a memory location to actually happen and in the order they are specified in the code. The volatile type modifier was put into the C language to deal with code that reads and writes memory that can change in a different scope (e.g., interrupt routines, hardware device registers, device memory, shared memory, etc.) and we can take advantage of its semantics for our user mode pointer captures. With hardware - a reordered, omitted or combined read or write could lead to real life disasters. Hardware reads as well as writes have side effects (for instance a read of a register can trigger an interrupt or change a subsequent register value); this is completely analogous to our code: a read can have the side effect of violating our driver's security mechanism.
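For readers who have not met volatile in its natural habitat, here is a minimal sketch of the hardware-style case mentioned above: polling a status register. The register layout, the SKETCH_STATUS_READY bit and the function name are all hypothetical. Because the pointer is a pointer to volatile, each trip around the loop performs a real read; without the qualifier the compiler could read once and spin on the stale value.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_STATUS_READY 0x1u   /* hypothetical "ready" bit */

/*
 * Polls a status location up to max_polls times. The number of reads
 * actually performed is reported through polls_done.
 */
static bool sketch_poll_ready(const volatile uint32_t *status,
                              unsigned max_polls,
                              unsigned *polls_done)
{
    unsigned i;

    for (i = 0; i < max_polls; i++) {
        /* A genuine read on every iteration, because *status is volatile. */
        if ((*status & SKETCH_STATUS_READY) != 0) {
            *polls_done = i + 1;
            return true;
        }
    }
    *polls_done = max_polls;
    return false;
}
```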

So how can we fix our code? One solution is to change our routine like so:

    NTSTATUS Foo( volatile USER_DATA* Data );

By changing the pointer Data to be a pointer to a volatile structure we will force all reads and writes to the structure to really happen in our code (bonus points for explaining why we can't use "volatile PUSER_DATA" as our parameter type instead of "volatile USER_DATA*" - aren't they the same thing? :D ). However, if we have existing interfaces that we must maintain - we can't do this. Hmmm - this doesn't seem too good. There is another way to get the behavior we want. We can cast at the capture site (the place in the code where we dereference the user mode pointer). This technique is called using "volatile glasses". Here is an example:
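A minimal sketch of such a capture-site cast, written as portable C so it builds outside the kernel. SKETCH_USER_DATA and sketch_capture are stand-in names, uintptr_t stands in for ULONG_PTR, and the structure layout is assumed from the earlier listing rather than taken from a real header:

```c
#include <stdint.h>

/* Assumed shape of the user structure: two embedded pointers. */
typedef struct {
    uintptr_t *Data1;
    uintptr_t *Data2;
} SKETCH_USER_DATA;

/*
 * Data keeps its ordinary (non-volatile) type in the interface.
 * Only the two reads at the capture site look at it through
 * "volatile glasses".
 */
static void sketch_capture(SKETCH_USER_DATA *Data,
                           uintptr_t **CapturedData1,
                           uintptr_t **CapturedData2)
{
    *CapturedData1 = ((volatile SKETCH_USER_DATA *)Data)->Data1;
    *CapturedData2 = ((volatile SKETCH_USER_DATA *)Data)->Data2;
}
```

The function signature is untouched, which is the whole point: existing callers don't have to change.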

This will cause the compiler to perform the capture as if the variable Data had been declared volatile. Using this technique prevents the compiler from re-fetching from the passed-in pointer because we told the compiler the truth. We said "Hey compiler - this thing that Data points to can change asynchronously, so you'd better not be playing any funny games with it". And the compiler will honor that. It has to if it honors the volatile keyword. We would then have to do the same thing for the internal reads as well:

    //
    // Read the first embedded pointer
    //

    Data1Value = *(volatile ULONG_PTR*)CapturedData1;

    //
    // Read the second embedded pointer
    //

    Data2Value = *(volatile ULONG_PTR*)CapturedData2;

Again, we are telling the compiler the truth here - that the ULONG_PTR value can change asynchronously and it needs to really capture it locally.

What if you use memcpy() to capture user mode structures? Does that mean you are also golden? Well, only as long as the memcpy() doesn't get inlined and the resulting loop unrolled. If that happens, you are effectively back to assignments to a non-volatile local - in other words - square one.
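If depending on memcpy() staying out of line feels fragile, one possible alternative is a copy loop that reads the source through a pointer to volatile, so the "volatile glasses" idea is applied to the whole structure at once. This is only a sketch with a made-up name (sketch_capture_bytes); it copies a byte at a time, so it will be slower than memcpy(), and a multi-byte field can still be caught mid-change by another thread. What it does give you is that every source byte is read exactly once and the local copy is what you work with afterwards.

```c
#include <stddef.h>

/*
 * Copies length bytes from source to destination. The source is read
 * through a pointer to volatile, so each byte is fetched once and the
 * reads cannot be folded away or repeated later.
 */
static void sketch_capture_bytes(void *destination,
                                 const volatile void *source,
                                 size_t length)
{
    unsigned char *dst = (unsigned char *)destination;
    const volatile unsigned char *src = (const volatile unsigned char *)source;
    size_t i;

    for (i = 0; i < length; i++) {
        dst[i] = src[i];
    }
}
```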

So what is the correct way to address this issue in your code? Well, it depends (don't you love that answer!). At the end of the day you need to tell the compiler the truth so that it can generate code that represents the intent of your source files. If you have the luxury of not needing to maintain backwards compatibility with anything, then you can just convert your "pointers" to "pointers to volatile structures". If not, then the "volatile glasses" approach may work for you. If you can guarantee that memcpy() won't inline and unroll, then that may be the way to go. The main point here is awareness. Also, in today's world it is unlikely that a compiler will screw you in such a horrible way. But in a not too distant future world, this could become a very common compiler optimization and ruin what we software guys call "the good life"!

Jonathan Morrison is a Senior Software Engineer on the "Windows Fundamentals and Reliability" team. When he is not busy making Windows more fundamental or reliable, you can find him building robots with his wife and kids, jamming out on his guitar to old school heavy metal or writing preemptive, multi-tasking OSes (from scratch!) for seriously memory constrained 8 bit AVR microcontrollers (think 4K of memory on the high end!) - just to see if it can be done! You can reach him at jonmorri@microsoft.com.