Introduction

I wrote this little hook-engine for a much bigger article. Sometimes it seems such a waste to write valuable code for large articles whose topic isn't directly related to the code. This often leads to the problem that the code won't be found by the people who are looking for it.

Personally, I would've used Microsoft's Detours hook engine, but the free license only applies to x86 applications, and that seemed a little bit too restrictive to me. So, I decided to write my own engine in order to support x64 as well. I've never downloaded Detours nor have I ever seen its APIs, but from the general overview given by Microsoft, it's easy to guess how it works.

As I said, this is only a part of something bigger. It's not perfect, but it can easily become such. Since this is not a beginner's guide about hooking, I assume that the reader already possesses the necessary knowledge to understand the material. If you've never heard about this subject, you'd better start with another article. There are plenty of guides out there, so there's no need to repeat the same things here.

As everybody knows, there's only one easy and secure way to hook a Win32 API: to put an unconditional jump at the beginning of the code to redirect it to the hooked function. And by secure I just mean that our hook can't be bypassed. Of course, there are some other ways, but they're either complicated or insane or both. A proxy DLL, for instance, might work in some cases, but it's rather insane for system DLLs. Overwriting the IAT is insecure for two reasons:

The program might use GetProcAddress to retrieve the address of an API (and in that case we should handle this API as well).

It's not always possible: in many cases, such as packed programs, the IAT gets built by the protection code and not by the Windows loader.

Ok, I guess you're convinced. Let's just say that there's a reason why Microsoft uses the method presented in this article.

How It Works

A common technique used in combination with the unconditional jump is:

This approach may seem unsafe in a multi-threaded environment, and it is. It might work, but our technique is much more powerful. Well, nothing new: we just put our unconditional jump at the beginning of the code we want to hook and we put the original instructions of the API elsewhere in memory. When the hooked function jumps to our code, we can call the bridge we created, which, after executing those first instructions, will jump to the API code which follows our unconditional jump:

Let's make a real world example. If the first instructions of the function/API we want to hook are:

mov edi, edi
push ebp
mov ebp, esp
xor ecx, ecx

They will be replaced by our:

00400000 jmp our_code
00400005 xor ecx, ecx

Our bridge will look like this:

mov edi, edi
push ebp
mov ebp, esp
jmp 00400005
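The example above can be reproduced on plain byte buffers. The following is only a minimal sketch, not the engine's actual code: it encodes the 5-byte relative jump that overwrites the function and builds a bridge holding the displaced instructions. The names make_rel_jmp and build_bridge, and every address other than 00400000, are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Encode "jmp rel32" (opcode E9) placed at address `from`, landing on `to`.
   The displacement is counted from the end of the 5-byte instruction. */
static void make_rel_jmp(uint8_t out[5], uint32_t from, uint32_t to)
{
    uint32_t rel = to - (from + 5);      /* wraps modulo 2^32 */
    out[0] = 0xE9;
    out[1] = (uint8_t)(rel);             /* little-endian operand */
    out[2] = (uint8_t)(rel >> 8);
    out[3] = (uint8_t)(rel >> 16);
    out[4] = (uint8_t)(rel >> 24);
}

/* Build the bridge: the `stolen` bytes displaced from the function, followed
   by a jump back to the first instruction that was left untouched.
   Returns the size of the bridge in bytes. */
static size_t build_bridge(uint8_t *bridge, uint32_t bridge_addr,
                           const uint8_t *func, uint32_t func_addr,
                           size_t stolen)
{
    assert(stolen >= 5);                 /* must cover the 5-byte jump */
    memcpy(bridge, func, stolen);
    make_rel_jmp(bridge + stolen,
                 bridge_addr + (uint32_t)stolen,
                 func_addr + (uint32_t)stolen);
    return stolen + 5;
}
```

With the function at 00400000 and 5 stolen bytes (mov edi, edi; push ebp; mov ebp, esp), the jump at the end of the bridge lands on 00400005, which is the xor ecx, ecx from the listing.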

Of course, to know the size of the instructions we're going to replace, we need a disassembler both for x86 and x64. I searched on Google for an x64 disassembler and found the diStorm64 disassembler. I quote from its homepage:

diStorm64 is a professional quality open source disassembler library for AMD64, licensed under the BSD license.

diStorm is a binary stream disassembler. It's capable of disassembling 80x86 instructions in 64 bits (AMD64, X86-64) and both in 16 and 32 bits. In addition, it disassembles FPU, MMX, SSE, SSE2, SSE3, SSSE3, SSE4, 3DNow! (w/ extensions), new x86-64 instruction sets, VMX, and AMD's SVM! diStorm was written to decode quickly every instruction as accurately as possible. Robust decoding, while taking special care for valid or unused prefixes, is what makes this disassembler powerful, especially for research. Another benefit that might come in handy is that the module was written as multi-threaded, which means you could disassemble several streams or more simultaneously. For rapidly use, diStorm is compiled for Python and is easily used in C as well. diStorm was originally written under Windows and ported later to Linux and Mac. The source code is portable and platform independent (supports both little and big endianity). It also can be used as a ring0 disassembler (tested as a kernel driver using the DDK under Windows)!

This sounded pretty good to me. Now that we have our disassembler, we can start!
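Why the instruction sizes matter can be shown with a short sketch. It assumes the length of each instruction at the start of the function is already known (in practice the disassembler would supply them; here they are passed in as a plain array) and adds up whole instructions until the jump fits. The function name bytes_to_relocate is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Return how many bytes must be moved to the bridge so that a jump of
   `jump_size` bytes overwrites only whole instructions.
   `lengths` holds the size of each instruction, in order (assumption:
   a disassembler produced these values).
   Returns 0 if the listed instructions do not cover the jump. */
static size_t bytes_to_relocate(const unsigned char *lengths, size_t count,
                                size_t jump_size)
{
    size_t total = 0;
    assert(jump_size > 0);
    for (size_t i = 0; i < count && total < jump_size; i++)
        total += lengths[i];             /* never split an instruction */
    return total >= jump_size ? total : 0;
}
```

For the prologue shown earlier (instruction lengths 2, 1, 2, 2), a 5-byte jump needs exactly 5 bytes, while a 6-byte jump would already require moving 7.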

The first thing I wanted to know was whether it was possible to create bridges without having to relocate jumps. As the reader knows, jumps usually take a relative address as operand, not an absolute one. This means I can't move a jump to another location without recalculating its relative address. I also wanted to test whether this disassembler really worked. So, I wrote a little program which creates a log file of all the instructions, in every exported function of a DLL, that are going to be overwritten by an unconditional jump. Here's the code:

The command line syntax is: pefile logfile (e.g. disasmtest ntdll.dll ntdll.log). As you can see, I took 10 bytes for x86 hooks. It's possible to use 5-byte jumps on both x86 and x64, but then it's necessary to check that there's less than 2 GB between the original function and our code, and between the bridge and the original function. Well, we have to check that on x86 as well, but there the condition is very likely to hold. The worst case scenario for both x86 and x64 is this absolute jump:

jmp [xxxxx]
xxxxx: absolute address (DWORD on x86 and QWORD on x64)
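The two jump forms can be sketched in a few lines. This is an illustration, not the engine's code: it assumes the address slot sits directly after the jmp instruction (which is how the 10 and 14 byte totals come about), and the names fits_rel32, put_le, emit_abs_jmp_x86 and emit_abs_jmp_x64 are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 5-byte "jmp rel32" at `from` can reach `to` only if the displacement,
   measured from the end of the instruction, fits in a signed 32-bit value. */
static int fits_rel32(uint64_t from, uint64_t to)
{
    int64_t diff = (int64_t)(to - (from + 5));
    return diff >= INT32_MIN && diff <= INT32_MAX;
}

/* Store `value` as `n` little-endian bytes. */
static void put_le(uint8_t *out, uint64_t value, size_t n)
{
    assert(n <= 8);
    for (size_t i = 0; i < n; i++)
        out[i] = (uint8_t)(value >> (8 * i));
}

/* x86: FF 25 <absolute address of the slot>, then the 4-byte target.
   `at` is the address where the jump will be written. 6 + 4 = 10 bytes. */
static size_t emit_abs_jmp_x86(uint8_t out[10], uint32_t at, uint32_t target)
{
    out[0] = 0xFF;
    out[1] = 0x25;
    put_le(out + 2, at + 6, 4);          /* slot follows the instruction */
    put_le(out + 6, target, 4);
    return 10;
}

/* x64: FF 25 <rel32 = 0>, i.e. jmp [rip+0], then the 8-byte target.
   6 + 8 = 14 bytes. */
static size_t emit_abs_jmp_x64(uint8_t out[14], uint64_t target)
{
    out[0] = 0xFF;
    out[1] = 0x25;
    put_le(out + 2, 0, 4);               /* slot follows the instruction */
    put_le(out + 6, target, 8);
    return 14;
}
```

An engine that wanted the shorter form could call something like fits_rel32 first and fall back to the absolute jump only when the check fails.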

This means we'd have a worst case scenario of 10 bytes on x86 and of 14 bytes on x64. In this hook engine, I'm using only worst case scenarios (no 5 byte relative addresses), simply because if the space between the original function and the hooked one is > 2 GB or the space between the original function and the bridge is > 2 GB, then I would have to recreate the bridge from scratch every time I hook/unhook the function. A professional engine should do this (and it's not much work), but I'll keep it simple (for me) and use only absolute jumps. As for the results of the little program above, I created logs for the ntdll.dll and advapi32.dll both for x86 and x64. Here, for instance, is a small part of the ntdll.dll x86 log:

But what about the functions which just issue a syscall after moving a number into a register, like NtCreateProcess, NtOpenKey etc.? These functions have very few instructions, and our 14-byte jump will overwrite more code than the function itself contains. But that doesn't seem to be a problem since, as we can see from the disassembler, these functions have a 16-byte alignment. So, we won't overwrite another function's code anyway.
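The alignment argument boils down to a small calculation, sketched here. The helper names are hypothetical, and the 11-byte stub size used in the note below is an illustrative figure rather than a value measured from a particular ntdll.dll.

```c
#include <assert.h>
#include <stddef.h>

/* Round `size` up to the next multiple of `align` (a power of two). */
static size_t align_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* The jump stays inside the function's own slot if it does not extend past
   the padding that precedes the next aligned function. */
static int jump_fits_in_slot(size_t func_size, size_t align, size_t jump_size)
{
    return jump_size <= align_up(func_size, align);
}
```

A stub of 11 bytes aligned to 16 occupies a 16-byte slot, so a 14-byte jump spills only into padding. With 4-byte alignment the same jump would run into the next function.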

Here's the main code of the hook engine (the whole engine is about 300 lines of code):

I implemented it as a DLL (but you can include it in your code as well).

Using the Code

Using the code is very simple. Basically, the DLL only exports three functions: one to hook, another to unhook and the last to get the address of the bridge of the hooked function. Of course, we need to retrieve the address of the bridge, otherwise we can't call the original code of the hooked function.

In this sample I'm hooking the API MessageBoxTimeoutW. I tried to hook MessageBoxW and that worked fine on x86, then I tried on x64 and the code generated an exception. So, I disassembled the MessageBoxW function on x64:

Unfortunately, as you can see, the first instructions of this API include a jz which is going to be overwritten by our unconditional jump. And since we don't relocate jumps in our bridge, we can't hook this function. So, I had to hook the function MessageBoxTimeoutW, which is called inside MessageBoxW and has no jumps at the beginning.
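A hook engine could at least detect this situation before patching. The sketch below is not part of the engine described here: it is a simplified check, under the assumption that the caller already knows where each instruction starts, that flags the common branch and call opcodes with a relative operand. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if the instruction at `insn` is a branch or call with a relative
   operand, 0 otherwise. Simplification: prefixes (including REX) are not
   skipped, and other position-dependent forms are not detected. */
static int is_relative_branch(const uint8_t *insn, size_t len)
{
    assert(len > 0);
    uint8_t op = insn[0];

    if (op >= 0x70 && op <= 0x7F)                 /* Jcc rel8 (jz is 74) */
        return 1;
    if (op >= 0xE0 && op <= 0xE3)                 /* LOOPcc / JCXZ rel8 */
        return 1;
    if (op == 0xE8 || op == 0xE9 || op == 0xEB)   /* CALL rel32, JMP rel32/rel8 */
        return 1;
    if (op == 0x0F && len >= 2 &&
        insn[1] >= 0x80 && insn[1] <= 0x8F)       /* Jcc rel32 */
        return 1;
    return 0;
}
```

If any instruction that would be moved to the bridge makes this function return 1, the engine could refuse to install the hook instead of crashing later.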

In the code example I first hook the function and call it, then I unhook it and call it again. So, the output will be:

That's all. Of course, this code works only if MessageBoxTimeoutW is available. I'm not completely sure about when it was first introduced, since it's an undocumented API. I guess it was introduced with XP, so chances are that this particular hook won't work on Windows 2000.

Conclusions

As the previous example shows, the hook engine isn't perfect, but it can easily be improved. I don't develop it further because I don't need a more powerful one (right now, I mean). I just needed an x86/x64 hook engine with no license restrictions. I wrote this engine and the article in just one day; it really wasn't much work. Most of the work in such a hook engine is writing the disassembler, which I didn't do. So, in my opinion, it doesn't make much sense to pay for a hook engine. The only thing which I really can't provide in this engine is support for Itanium. That's because I don't have a disassembler for this platform. But I would rather write one myself than buy a hook engine. I might actually add an Itanium disassembler in the future, who knows...

About the Author

The languages I know best are: C, C++, C#, Assembly (x86, x64, ARM), MSIL, Python, Lua. The environments I frequently use are: Qt, Win32, MFC, .NET, WDK. I'm a developer and a reverse engineer and I like playing around with internals.