When you crash, make sure you crash in the right place

Last time, I recommended that functions should just crash when given invalid pointers. There's a subtlety to this advice, however, and that's making sure you crash in the right place.

If your function and your function's caller both reside on the same side of a security boundary, then go ahead and crash inside your function. If the caller is a bad guy who is trying to get your function to crash, then the caller has accomplished nothing if your function runs in the same security context as the caller. After all, if the caller wanted to make your program do something bad, it could've just done that bad thing itself! If it gave you a pointer to invalid memory and you crashed trying to access it, well, the caller could have accomplished the same thing by just accessing the invalid memory directly.

If your function resides on the other side of a security boundary, however, then having your function crash or behave erratically gives the malicious caller a power which he did not already have. For example, your function may reside in a service or local server, where the call arrives from another process. A malicious caller can pass intentionally malformed data to you via some form of IPC, causing your service or local server to crash. Or your function might reside in the same process as the caller but under a different security context. For example, it might be impersonating, or it may be operating on untrusted data.

Another example of a security boundary is the boundary between user mode and kernel mode. Kernel mode cannot crash on parameters passed from user mode, because kernel mode runs at a higher protection level from user mode.

In these cases, you want to make sure you crash in the correct context. In the IPC case, there typically will be a stub on the client side that does the hard work of taking the parameters and packaging them up for IPC. If the stub is given an invalid pointer, it should crash in the stub, so that the crash occurs in the same security context as the caller. A caller who passes an invalid pointer by mistake can then debug the crash in a context that is meaningful to the caller. (Of course, a malicious caller won't use your stub but will instead package the data manually and IPC it directly to the server. Your server can't crash on malicious inbound data, since that data came from a different security context.)

If you're feeling really ambitious (and few people do), you can have the server react to malformed data by returning a special error code, which the stub detects and converts to an exception. Again, this doesn't do anything to crash the malicious caller, because the malicious caller is bypassing your stub. But it may help the caller who thought it was passing a valid pointer.
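Here is a minimal sketch of that arrangement, written in Python for brevity. Everything named here (`server_handle`, `stub_call`, `STATUS_MALFORMED`, `MalformedRequestError`, and the length-prefixed wire format) is a hypothetical stand-in for whatever IPC mechanism you actually use; the "wire" is just a bytes object passed to a function.

```python
import struct

STATUS_OK = 0
STATUS_MALFORMED = 1  # hypothetical "special error code"


class MalformedRequestError(Exception):
    """Raised by the stub, in the caller's own context."""


def server_handle(request: bytes) -> bytes:
    """Server side: trusts nothing in the request and never raises on bad input."""
    # Assumed layout for this sketch: 4-byte little-endian length,
    # followed by exactly that many bytes of UTF-8 text.
    if len(request) < 4:
        return struct.pack("<I", STATUS_MALFORMED)
    (length,) = struct.unpack_from("<I", request)
    body = request[4:]
    if length != len(body):
        return struct.pack("<I", STATUS_MALFORMED)
    try:
        text = body.decode("utf-8")
    except UnicodeDecodeError:
        return struct.pack("<I", STATUS_MALFORMED)
    return struct.pack("<I", STATUS_OK) + text.upper().encode("utf-8")


def stub_call(text, send=server_handle):
    """Client-side stub: packages the argument and converts error codes to exceptions."""
    # A bad argument (say, None instead of a string) fails right here,
    # in the caller's context, before anything reaches the server.
    body = text.encode("utf-8")
    reply = send(struct.pack("<I", len(body)) + body)
    (status,) = struct.unpack_from("<I", reply)
    if status != STATUS_OK:
        raise MalformedRequestError(f"server rejected request (status {status})")
    return reply[4:].decode("utf-8")
```

A caller who goes through `stub_call` gets an exception it can debug. A caller who hand-crafts bytes and hands them to `server_handle` directly gets nothing more than an error status back, and the server keeps running.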

Well, the type of errors that you would choke on in an IPC scenario is different from the type of errors that would really mess you up locally.

Most IPC calls cannot actually refer to memory positions / pointers at the physical server hardware level – rather they will refer to a special object created and managed by your own service – that is definitely something that your service should be able to validate without bringing down the whole service.
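As a rough illustration of that idea, a service can hand out opaque handles and look them up in its own table, so a bogus value from a client is just a failed lookup. This is only a sketch in Python; the `HandleTable` class and its method names are made up for the example and are not taken from any real service.

```python
import secrets


class HandleTable:
    """Maps opaque integer handles to objects owned by the service."""

    def __init__(self):
        self._objects = {}

    def create(self, obj) -> int:
        """Register an object and return a fresh, nonzero handle for it."""
        handle = secrets.randbits(32)
        while handle == 0 or handle in self._objects:
            handle = secrets.randbits(32)
        self._objects[handle] = obj
        return handle

    def lookup(self, handle):
        """Return the object for a client-supplied handle, or None if it is invalid."""
        # Anything that is not a plain int cannot be one of our handles.
        if type(handle) is not int:
            return None
        return self._objects.get(handle)

    def close(self, handle) -> bool:
        """Release a handle; report False for a handle we never issued."""
        if type(handle) is not int:
            return False
        return self._objects.pop(handle, None) is not None
```

The client never sees an address in the server, so there is nothing for it to point at. The worst a garbage handle can do is make `lookup` return `None`, which the service turns into an error reply.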

When you say “Kernel mode cannot crash on parameters passed from user mode, because kernel mode runs at a higher protection level from user mode” and “Your server can’t crash on malicious inbound data, since that data came from a different security context”, do you mean “shouldn’t” instead of “can’t”?

Kernel mode and server code CAN crash wherever they are poorly coded, or rely on invalid incoming pointers without checking them. I don’t understand the “can’t” here.

[Yes, that was a prescriptive “can’t”, not a descriptive one. In the same way that you can’t drive a car without a license. -Raymond]

I would also suggest changing ‘cannot’ and ‘can’t’ to ‘must not’ and ‘mustn’t’.

Chris: kernel mode simply reads directly from the user mode addresses. User mode has the top 2GB (3GB if booted /3GB) (variable between 2GB and 3GB if booted /USERVA) (8TB on x64) of the address space, kernel mode the bottom 2GB (1GB, 4GB minus whatever user-mode got, 8TB). When calling into kernel mode directly, or taking an interrupt, the user-mode addresses are still valid for the process that called into kernel mode, or that was running on the processor that handled the interrupt.

Kernel mode routines that have to do something asynchronously – most of them – have to copy or map the user’s buffer into kernel address space so that when an interrupt occurs, signalling the end of the operation, the addresses are correct. This is because the processor could have been switched to a thread in another process in the meantime (or the interrupt could happen on a different processor).

From the perspective of the user-mode code, the thread could be making a blocking call, but the kernel does not wait for the request to complete. Instead it notes that the thread needs to be woken up when the I/O completes, marks the thread as blocked (so it won’t be scheduled), and finds a different thread to run.

Alternatively the thread might be making an asynchronous call, in which case the kernel might want to make a copy of the user mode buffer so that the program doesn’t change the data while the asynchronous action is occurring, at least for any part of the buffer that influences a security decision.

Mike: thanks; I’d forgotten about the 2/2 split. What’s confusing me here is Raymond’s assertion of the need to validate the pointer in the userspace glue code rather than in the kernel. Or do I misunderstand?

I guess my question is: (a) can you do anything useful with a pointer direct from userspace and if so, then (b) if you try to use such a pointer and it’s a bad one, you’ve just invoked a BSOD, right? Iff that’s all true, it seems rather dangerous, because somebody could craft malicious glue code.

I’m sure I’m misunderstanding something. :) My experience with kernel development is with OS X, where you have a bit of extra overhead with kernel<->user IPC, but it is completely safe — as long as the kernel-side bits validate all of the parameters.

Re: —- In the IPC case, there typically will be a stub on the client side that does the hard work of taking the parameters and packaging them up for IPC. If the stub is given an invalid pointer, it should crash in the stub, so that the crash occurs in the same security context as the caller. —-

This approach might be useful for debuggability (so we crash ‘early’) but it is insufficient for security.

If the parameter-validation functions are in the same security context as the code that conjured up the parameter values, then a bad guy can let the check proceed, and then (maybe via a debugger) get in and poison the checked values.

In other words, for security, the check MUST be made in a context that is inaccessible to the caller. Which is to say, on the ‘secure’ side of the barrier.

For user/kernel interfaces, this means that the check has to be made in kernel mode. Not only that, but you need to copy the arguments out of user space before you validate them. Otherwise, something else in user space (another thread, say) could poison the args after they had been validated by the thread on which the kernel service was called.

I’m not saying all data has to be copied in — only that which has a security significance. So you might need to copy a pointer, but not necessarily the thing that is pointed to. Though you’ll need to take care in case the pointed-to thing gets invalidated (removed from the address space) after the pointer was copied and checked.
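The copy-then-validate order can be sketched like this. It is a Python simulation only, not kernel code: `kernel_service`, `MAX_LEN`, and the `between_check_and_use` hook (which stands in for another user-mode thread scribbling on the buffer at the worst possible moment) are all invented for the illustration.

```python
MAX_LEN = 16  # hypothetical limit enforced by the service


def kernel_service(user_buffer: bytearray, between_check_and_use=lambda: None) -> bytes:
    """Capture the caller's buffer, validate the copy, and use only the copy."""
    # 1. Capture: take a private, immutable snapshot before looking at anything.
    captured = bytes(user_buffer)

    # 2. Validate the snapshot, never the caller's live buffer.
    if len(captured) > MAX_LEN or not captured.isascii():
        # The failure propagates back to the caller rather than taking
        # down the service.
        raise ValueError("invalid argument")

    # Simulated race: the caller changes its buffer after the check.
    between_check_and_use()

    # 3. Use: only the validated snapshot is touched from here on.
    return captured.upper()
```

If step 3 read from `user_buffer` instead of `captured`, the poisoning in the simulated race would slip past the check. Because the service works from its own copy, whatever the caller does afterwards has no effect.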

So, the check has to be made in secure context.

The crash, of course, must occur back in the caller’s insecure context. For the user/kernel case, that means reflecting the access-violation exception back to user mode.

Back in the IPC case, the server has to perform any necessary validation even if the client stub already did it. (“Never crash because of something you see in a network message”). The job might be rather easier, since it’s probable that the args have been serialized for transport on the wire in such a way that there are no pointers that need to be checked.

Mike: Why do you think try/except can’t be used in kernel mode? The following MSDN article (reposted from a comment made yesterday) says that exceptions can be handled (with some limitations, perhaps):

It’s interesting (well to me anyway) that the kernel-mode discussion comes around full circle. The kernel needs to protect itself from bad data coming from user mode, usually via copying the data while under a try/except. However, once in the kernel everything is considered trusted and if you touch an invalid address you bluescreen, and try/except *cannot* stop it.

I agree with Raymond here, user mode code should crash and burn (and lousy programmers should be flogged with shielded SCSI cable in public) if it passes invalid arguments to the API. I am sure that having to check those arguments both in kernel and in user mode slows things down a lot without doing any good.

So the only place where you should check for argument validity is in kernel mode because bad guys have already learned to bypass the user mode stubs and are "ringing in" directly.

"I must admit, it would have been nice if some of the WinAPI functions would have ‘crashed’/thrown an exception if given garbage, rather than simply returning error codes that nobody remembers to check :) But I can see why they didn’t, since the structured exception mechanism is a pain in the butt to work with and C++ exceptions aren’t portable (and didn’t really exist back then anyway)."

Firstly, the Win32 API code, as I understand it, is written by Microsoft in "C". Someone from Microsoft correct me if I am wrong about the language part. Which API, according to you, should crash? Of course there are some APIs which do crash on being passed invalid arguments. That is not because they intentionally crash, but because they crash when using the invalid parameters. Also, exception handling makes the process slow; even if you use it within your application, there is always a rule to use it with caution.

There is a lot of C++ in Windows. I couldn’t tell you where the C++ stops and the C starts, but when I have to step into windows to try to debug strange issues, I often find myself in C++ named routines. OLE is full of C++.

I’m assuming that the "security boundaries" you’re speaking of refer to any IPC, not just cross-user IPC? Because otherwise you can run afoul of cases where some third-party system is assigning additional privileges on a per-process basis without actually running at different official privilege levels. (Application layer firewalls are a good example of this — only some processes are permitted to open ports or connect to the Internet.)

So essentially this whole thing boils down to "in internal functions, crash rather than returning error codes. For any function that’s exposed to the outside world, return error codes instead of crashing — and if you want to be helpful, *also* have a client-side wrapper that makes it crash when it receives that error code."

I must admit, it would have been nice if some of the WinAPI functions would have ‘crashed’/thrown an exception if given garbage, rather than simply returning error codes that nobody remembers to check :) But I can see why they didn’t, since the structured exception mechanism is a pain in the butt to work with and C++ exceptions aren’t portable (and didn’t really exist back then anyway).

Mike, you are right and wrong at the same time. In kernel mode you *can* use try/except to protect against bad memory, but there are cases in which it won’t help. For example, if the memory wasn’t mapped to a physical address, it will be a BSOD (you are supposed to use probe&lock in that case). Or if the code was executing in a hardware interrupt (IRQL > DISPATCH_LEVEL), any SEH exception is automatically fatal, because virtually no part of the kernel is designed to be used in a more critical context than a software interrupt (except of course the DPC queue API to schedule a software interrupt that handles a hardware interrupt…)

Raymond, you state in your previous post that one should never use IsBadXXXPtr, while in this post you state that code that takes input from another security context should not crash on said input but should return a “special” error code. So, assuming I want to make sure that an incoming pointer argument does not cause me to inadvertently trip some other thread’s stack guard page, how do I validate the incoming data? Do I use VirtualQuery? Expensive! What are my alternatives?

[Don’t pass pointers across security boundaries. (As always, I’m providing ground rules not absolutes; there may be specific cases that come with their own guidance, such as the user mode/kernel mode boundary.) -Raymond]

Thanks JeffCurless. Mike nearly had me go off and reread the Kernel documentation, in fear of needing to change my meme(*) about the 2GB/2GB split.

(*) I’m not sure "meme" is the correct word here. Perhaps knowledge tag/phrase is more appropriate. Basically, I remember that to live above the 2GB boundary, code needs to be 32-bit-clean. In other words, all pointer arithmetic has to be done using unsigned 32-bit, and NOT Integer. User code is usually not 32-bit-clean, requiring effort to validate that it is, all of it!