Why hooking system services is more difficult (and dangerous) than it looks

System service hooking (and kernel mode hooking in general) is one of those near-and-dear things to me which I’ve got mixed feelings about. (System service hooking refers to intercepting system calls, like NtCreateFile, in kernel mode using a custom driver.)

On one hand, hooking things like system services can be extremely tricky at best, and outright dangerous at worst. There are a number of ways to write code that is almost right, but will fail in rare edge cases. Furthermore, there are even more ways to write code that will behave correctly as long as system service callers “play nicely” and do the right thing, but sneaks in security holes that only become visible once somebody in user mode tries to “bend the rules” a bit.

On the other hand, though, there are certainly some interesting things that one can do in kernel mode, for which there really aren’t any supported or well-defined extensibility interfaces without “getting one’s hands dirty” with a little hooking, here or there.

Microsoft’s policy on hooking in kernel mode (whether patching system services or otherwise) is pretty clear: They would very much rather that nobody even thought about doing anything remotely like code patching or hooking. Having seen virtually every anti-virus product under the sun employ hooking in one way or another at some point during their lifetime (and subsequently fail to do it correctly, usually with catastrophic consequences), I can certainly understand where they are coming from. Then again, I have also been on the other side of the fence once or twice, having been blocked from implementing a particular useful new capability due to a certain anti-patching system on recent Windows versions that shall remain nameless, at least for this posting.

Additionally, in further defense of Microsoft’s position, the vast majority of code I have seen in the field that has performed some sort of kernel mode hooking has done some out of a lack of understanding of the available and supported kernel mode extensibility interfaces. There are, furthermore, a number of things where hooking system services won’t even get the desired result (even if it weren’t otherwise fraught with problems); for instance, attempting to catch file I/O by hooking NtReadFile/NtReadFileScatter/NtWriteFile/NtWriteFileGather will cause one to completely miss any I/O that is performed using a mapped section view.

Nonetheless, system service hooking seems to be an ever-increasing trend, one that doesn’t seem to be likely to go away any time soon (at least for 32-bit Windows). There are also, of course, bits of malicious code out there which attempt to do system service hooking as well, though in my experience, once again, the worst (and most common) offenders have been off-the-shelf software that was likely written by someone who didn’t really understand the consequences of what they were doing.

Rather than weigh in on the side of “Microsoft should allow all kernel mode hooking”, or “All code hooking is bad”, however, I thought that a different viewpoint might be worth considering on this subject. At the risk of incurring Don Burn’s wrath by even mentioning the subject (an occurrence that is quite common on the NTDEV list (if one happens to follow that), I figured that it might be instructive to provide an example of just how easy it is to slip up in a seemingly innocent way while writing a system service hook. Of course, “innocent slip-ups” in kernel mode often equate to bugchecks at their best, or security holes at their worst.

Thus, the following, an example of how a perfectly honest attempt at hooking a system service can go horribly wrong. This is not intended to be a tutorial in how to hook system services, but rather an illustration of how one can easily create all sorts of nasty bugs even while trying to be careful.

The following code assumes that the system service in question (NtCreateSection) has had a hook “safely” installed (BuggyDriverHookNtCreateSection). This posting doesn’t even touch on the many and varied problems of safely trying to hook (or even worse, unhook) a system service, which are also typically done wrong in my experience. Even discounting the diciness of those problems, there’s plenty that can go wrong here.

I’ll post a discussion of some of the worst bugs in this code later on, after people have had a chance to mull it over for a little while. This routine is a fairly standard attempt to post-process a system service by doing some additional work after it completes. Feel free to post a comment if you think you have found one of the problems (there are several). Bonus points if you can identify why something is broken and not just that it is broken (e.g. provide a scenario where the existing code breaks). Even more bonus points if you can explain how to fix the problem you have found without introducing yet another problem or otherwise making things worse – by asking that, I am more wanting to show that there are a great deal of subleties at play here, instead of simply showing how to operate an NtCreateSection hook correctly. Oh, and there’s certainly more than one bug here to be found, as well.

N.B. I haven’t run any of this through a compiler, so excuse any syntax errors that I missed.

//
// Okay, we've got a successful call made, let's do our work.
// First, capture the returned section handle. Note that we do not need to
// do a probe as that was already done by NtCreateSection, but we still do
// need to use SEH.
//

//
// If we are an image section, then let's save away a pointer to to the
// section object for our own use later.
//

if (NT_SUCCESS(Status))
{
//
// Save pointer away for something that we might do with it later. We might
// want to care about the section image information for some unspecified
// reason, so we will copy that and save it in our tracking list. For
// example, maybe we want to map a view of the section into the initial
// system process from a worker thread.
//

For one thing, the lines to capture the file and section handles from user memory aren’t reliable. Another thread in the calling process could have changed the value at SectionHandle and FileHandle between the calls to RealNtCreateSection and where it’s read by this function.

What the function really needs is a private channel to pass the handles between BuggyDriver and Real NtCreateSection, not involving user memory.

The only way I know of to do that is to create local copies of those parameters and pass them to ZwCreateSection directly, but then BuggyDriver would have to duplicate all the security and validity checks that NtCreateSection does. This would also preclude chaining BuggyDriver’s hook with service hooks by other drivers (because ZwCreateSection would have to be called directly instead of whatever was previously installed in the service table), which could lead to bad things… Maybe that could be fixed by setting PreviousMode to KernelMode instead of calling ZwCreateSection directly?

I don’t think that ObDereferenceObject is called consistently for cleanup for SectionObject.

FileHandle in NtCreateSection is optional, but this function will always fail if the caller passes NULL, due to the unconditonal call to ObReferenceObjectByHandle.

Since this is a highest-level interface to user mode, the code should really be wrapped in KeEnterCriticalSection to prevent it from being suspended in the middle (possibly while holding locks in SaveImageSectionObjectInList).

Sections can be extended, but this function assumes the size can’t change.

Personally, I think that hooking system services is so fraught with peril that the potential benefits don’t outweight the risks. It’d be nice if the OS provided more official methods of hooking, but these APIs are hard to implement correctly, let alone having to support more customization.
Besides, there already are official ways to filter file IO, and since 5.2, registry IO.

I really enjoyed this post, and look forward to what things I missed and other posts in the future! :)

The code paths that return errors do not close the section that was created by RealNtCreateSection(). If any of the additional work fails, the section will be leaked (unless the user application closes the section even after NtCreateSection() returns an error).

On a system with more than one logical processor, another thread of a malicious user application could modify the returned section handle before BuggyDriverNtCreateSection() captures it, causing the additional work to be performed on the wrong section. Since this code passes KernelMode to ObReferenceObjectByHandle(), this would also bypass access checks on that section.

Similarly, the section size could be modified before BuggyDriverNtCreateSection() captures it, defeating any future length-checking.

Yeah, this should be a fun one – the code is just full of all sorts of nasty, subtle bugs that a malicious user mode caller can trigger to do all number of evil things. Definitely interested to see if they all get found out.

Good catches so far, though!

The caller could indeed modify the returned *SectionHandle value between when NtCreateSection returns and the rest of the system service hook runs. The value of FileHandle is safe as that is passed by value and not by pointer. However, not to give too much away, but there’s at least several more serious bugs with how the hook uses object handles remaining, even if you call ZwCreateSection and save the output handle on the kernel stack.

As for breaking other person’s hooks, that depends on how they were hooked in. If they perform an opcode patch on nt!NtCreateSection, then I think that (all issues with hooking “safely” and all of that aside), things should still work. If someone decides to modify the function pointer in the SSDT, then things get a bit more interesting. (And, actually, this hook will always break if it is installed with an SSDT function pointer substitution (as opposed to an opcode patch on nt!NtCreateSection), as ZwCreateSection initiates a system call by ordinal, so without a pass through for ExGetPreviousMode() == KernelMode, there’s a loop that will run out of kernel stack. I suppose I have to be careful not to give away *all* of the bugs, though!)

I would be interested to see how you would write the call to ZwCreateSection, though. You caught some of the pitfalls with switching from NtCreateSection to ZwCreateSection, but there’s still some additional problems remaining depending on how you do that.

RE: ObDereferenceObject( SectionObject ), I think that those are okay in terms of all reference/dereference operations being matched. The assumption for SaveSectionFileInfoInList / SaveImageSectionObjectInList / DeleteImageSectionObjectInList is that these routines take their own explicit references on the PFILE_OBJECT/PSECTION_OBJECT passed in. (Double checked, and I don’t think that I missed any – apologies if I did! That being said, there may or may not be other bugs in existance here relating to object references.)

RE: the case of FileHandle == NULL: Yup, the hook doesn’t understand the semantics of pagefile-backed sections and will break the system service for anyone using one of those.

RE: KeEnterCriticalRegion: I could go either way on this one. Any locks acquired by Save*InList would typically disable APCs – you have to go out of your way to acquire a lock and *not* turn off APCs in kernel mode, generally. Without KeEnterCriticalRegion, the driver has to tolerate an indefinite delay between the image section list insertion and the other insertion path that always runs. However, it would be bad to call KeEnterCriticalRegion across a call to any system service in the hook, and simply dropping the critical region “in the middle” is really no different than making sure that Save*InList disable APCs when they take their locks properly. Tricky.

RE: Section size extension: The driver could have attempted to catch this with an NtExtendSection hook. However, that may introduce additional problems (requiring synchronization, and furthermore, an NtExtendSection hook wouldn’t catch callers of MmExtendSection. In general, though, it’s not particularly safe the way that the hook handles the section size – though there are other bugs in the size handling aside from just the section extensions case being broken.

Brad: Yep, all of those are true. (The races can happen even on a single processor system as this all transpires at PASSIVE_LEVEL and thus could be interrupted by a thread time slice expiration, but it’s a lot less likely. Still a very real bug, nonetheless.)

How would you fix the handle leak? (Be warned: doing so safely is not as obvious as it looks.)

I thought a lot about this kinda of issues lately. As you said, making a good hook on SSDT function is really tricky. In the case you presented its even worse because race condition allows multiple scenarios.

As it was said, you can replace SectionHandle pointer value by anything you want once it has been set by RealNtCreateSection. You can also modified it between the two call to ObReferenceObjectByHandle function. The ZwQuerySection function can deal with another object. Then the SaveImageSectionObjectInList could store an object pointer which does not correspond to linked image section.

Also the ObReferenceObjectByHandle function calls are not safe:

1) It is called with KernelMode argument which means that it can reference a kernel handle. An attack could be to reference a kernel handle and later control action done by the driver on it. This is not safe behavior and it is done a lot by drivers because they think it is related to argument checking (which is not). The consequences are difficult to say, it depends on what is done with the object.

2) It does not specified handle type. Some third vendors driver modified
or read data, pointers and different elements directly in the kernel object.
If the kernel object is not a section but something else, it could crash or create local privilege escalation condition. (For example if it follows a pointer without checking it and this part of memory is NULL on another object type …, Or go after object real size …).

As section handle pointer, SectionSize pointer value can be changed in the userland (race condition). It will corrupt saved section size.

For this function (NtCreateFile), the return value should be either STATUS_SUCCESS or something bad. It is not a good idea to return STATUS_SUCCESS if NT_SUCCESS match because on other functions you can have other success values. On this particular case it is okay as doc said:

“ZwCreateFile returns STATUS_SUCCESS on success or an appropriate NTSTATUS error code on failure.”

So yep, there is a lot of mistakes but it is nothing compared to what A.V drivers done on the SSDT. It is hard to blame them because I don’t have much solution against this type of mistakes.

Some drivers wanna filter handles and on section it can be really difficult. For exemple, you can create symbolic link (NtCreateSymbolicLinkObject) and then filtering named object become hard. The real function call must be done after any block attempt. If you call the real function by anyways on UserMode (THREAD PreviousMode) you are screwed. But it is hard because some pointer can be changed after you check them (UNICODE_STRING Buffer field for example). There is the solution of handling the call yourself by calling Zw* and do something to not fall on infinite loop. It can be possible but really difficult. By doing that you will create handle for the kernel, now the question is, how make it available for userland ? Honestly I don’t know, I got some idea but nothing sure and most can create horrible bugs (:D).

On this specific case, I would hook ObCreateObject and ObInsertObject. They are exported by ntorkrnl and allow you to be call once object is created and added in the handle table. Then you look for MmSectionObjectType (variable exported by ntorksnl). This variable type is OBJECT_TYPE. The object type header also allow you to replace some event pointer specific for this type.

Beware of modification across operating system version. Developer should also check when it is called and really know what he is doing because this could leads to very strange bugs. At least you’re dealing with object components, it can fit your needs or not depending of what you wanna do.

Another solution would be to hook exported MmCreateSection function. Because it is called by NtCreateSection just after arguments validation. This way, you deal only with validated input cached on kernels tack, but as you are dealing with undocumented stuff … It can be dangerous too.

One of the main issue of Windows driver development is that most of the time you must choose between reliability and security. Or you make something speed and secure (ACL, object manipulation, smart hooks) or something that would work on most versions even those you don’t know yet (and could lead to privilege escalation).

mxatone: The ObReferenceObjectByHandle calls are definitely not safe, for both of the reasons that you specify. You’re right in that the handle value could be changed after NtCreateSection returns. Let me ask you this, however: Are you safe if you abort out of the rest of the system service hook if ZwQuerySection returns STATUS_INVALID_HANDLE? (This is after the handle is captured from the user pointer.)

The hook here is NtCreateSection and not NtCreateFile. I am not certain that it’s safe to always force all return statuses to STATUS_SUCCESS for non-error returns on NtCreateFile (I’d be willing to bet that you’ll introduce a bug somewhere if you do that, documentation nonwithstanding), but there is definitely a specific lurking bug related to return status codes with the NtCreateSection hook as it’s written above.

Hum I made some issue on my description, as you said NtCreateSection and not NtCreateFile.

Skywing: Using ZwQuerySection function in this way is definitely not safe as it tolerates returning an error. It should check against STATUS_SECTION_NOT_IMAGE, if another error is reported just return an error.

NtCreateSection can also return all sorts of nice statuses like STATUS_IMAGE_NOT_AT_BASE for SEC_IMAGE calls that result in relocations.

These would (being of informational type) pass the NT_SUCCESS check but then immediately be lost as Status is overwritten by GetExceptionCode(), not to mention that the “OK” exit path always returns STATUS_SUCCESS.

What I am seeing
-you are overwriting status with about everything that’s not relevant to the call. That will report failure to the system and leaves you with a valid created section which will never get closed.
-you call ObReferenceObjectByHandle on the section handle twice, the first time assuming you receive the section object, the second time assuming you will receive a related file object
-When calling ObReferenceObjectByHandle you may be accessing a kernel handle table from a user process
-you call QueryAllocationLengthFromFileObject assuming that the section object is a file object
etc.

Even though FileHandle is passed by value, its use is still unsafe because the calling process can close the handle from another thread (and potentially allocate another handle with the same value) after RealNtCreateSection returns but before the hook references FileHandle.

I am sorry I did not see this earlier. Most bugs here seem related to not properly capturing all required information before doing the work (and doing the work *only* with the captured data). For handles you can duplicate them, same for tokens and variables must be deep copied.

That is just for starters, in most real world cases the time of check vs time of use (ToCToU) issues are very hard, specially in the case of files.

I had the same problems implementing hooks for a security solution.
There is also some overhead calling ObReferenceObjectByHandle and its similars almost every 100 miliseconds (when hooking all file requests). I removed all my hooks and made an inline hook at ObReferenceObjectByHandle which when the function starts i call KeRaiseIrqlToDpcLevel() do my code and then return to KeLowerIrql(oldIrql) at the function end or before calling functions which must be ran at lower IRQL. at first glance it may sound crazy to hook this function, as many of the kernel functions call it, but with a good switch it can be a good place to work with files, processes and other objects :)

“I would be interested to see how you would write the call to ZwCreateSection, though”
When you hook, get the previous address of NtCreateSection (this themselves is tricky, you have to use InterlockedCompareExchange to do it properly) and call that.