Background

It's often useful to modify the behavior of an application at runtime when access to the source code is not available. To do this, people have traditionally relied on libraries such as Microsoft Detours, MinHook, and a few others. Each of these libraries has significant drawbacks, however. Detours is x86 only unless a 'Professional' license is used, but that costs USD $10,000, and even then the pro version maps a DLL into a new code section, making the application very bloated. MinHook is pretty good, but it relies on pre-crafted trampoline routines, sometimes fails to hook, and the source code is again bloated. To me there was only one real solution: write my own library, on my own terms, with the goal of being the smallest, cleanest, easiest hooking library in existence!

Features

PolyHook exposes 6 separate ways to hook a function (all of them are x86/x64 compatible). Every method exposes the same interface: Setup(), Hook(), and Unhook() methods. In the following sections I'll describe what each hooking method does, how it works, and when to use it, and provide a code example. It *should* be thread-safe, although I may have missed something.

1) Standard Detour

This is the pseudo-standard way to hook a function; it is what both Microsoft Detours and MinHook implement. It works by writing a JMP assembly instruction to the prologue of a function that redirects the code flow to a custom handler. In x86 mode the instruction used is:

0x00000000 0xE9DEADBEEF JMP 0xDEADBEF4 (EIP+DEADBEEF)

This is a 32-bit relative instruction, meaning where it jumps to depends on where the instruction itself is located in memory. In my example the instruction is located at 0x00000000 and the offset is 0xDEADBEEF; you then add the size of the instruction, which is 5 bytes (E9 + DE + AD + BE + EF), to calculate its final destination, which is 0xDEADBEF4.
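The arithmetic above can be sketched in a few lines. This is only an illustration, not PolyHook's actual code; the names `jmpTarget`, `jmpDisplacement`, and `encodeJmp` are made up for the example. One detail worth noting is that the displacement is stored little-endian, so in memory the bytes appear as E9 EF BE AD DE.

```cpp
#include <array>
#include <cstdint>

constexpr std::uint32_t kJmpRel32Size = 5;  // E9 + 4 displacement bytes

// Destination of an E9 jump located at `instruction` with the given displacement.
// The arithmetic wraps modulo 2^32, as addresses do in x86 mode.
std::uint32_t jmpTarget(std::uint32_t instruction, std::uint32_t displacement) {
    return instruction + kJmpRel32Size + displacement;
}

// Displacement needed for an E9 jump at `instruction` to land on `target`.
std::uint32_t jmpDisplacement(std::uint32_t instruction, std::uint32_t target) {
    return target - (instruction + kJmpRel32Size);
}

// The five bytes as they would be written into the prologue (little-endian displacement).
std::array<std::uint8_t, 5> encodeJmp(std::uint32_t instruction, std::uint32_t target) {
    const std::uint32_t disp = jmpDisplacement(instruction, target);
    return {0xE9,
            static_cast<std::uint8_t>(disp),
            static_cast<std::uint8_t>(disp >> 8),
            static_cast<std::uint8_t>(disp >> 16),
            static_cast<std::uint8_t>(disp >> 24)};
}
```

With the numbers from the example, `jmpTarget(0x00000000, 0xDEADBEEF)` gives 0xDEADBEF4, and `jmpDisplacement` is simply the inverse used when installing a hook.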

In x64 it's a bit more complicated because there is no single instruction that can jump the entire x64 address range. So instead I use two different assembly snippets, and choose which to use based on the size of the prologue:

0xFF25DEADBEEF JMP [DEADBEF5] ([RIP+DEADBEEF]) //6 bytes total, instruction at 0x00000000

or when the prologue is greater than 6 bytes in size:

push rax
mov rax, 0xDEADBEEFDEADBEEF
xchg qword ptr ss:[rsp], rax
ret

The first snippet is special because it actually jumps to the location pointed TO by (RIP+DEADBEEF); it DOES NOT go to (RIP+DEADBEEF). In my implementation I write this jump to point to the end of a trampoline that I allocate within +-2GB of the prologue, and at that location I write the memory address of the handler, which is where the jump actually goes. You might be wondering why the trampoline is allocated within +-2GB: the instruction can only encode a signed 32-bit offset. The instruction is 6 bytes in size, and subtracting the 2 opcode bytes (0xFF, 0x25) leaves 4 bytes for the displacement value.
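As a rough sketch of the +-2GB constraint, the helper below computes the displacement for the 6-byte indirect jump and reports failure when the pointer slot is out of reach. The function name and parameters are assumptions made for this example, not part of PolyHook.

```cpp
#include <cstdint>
#include <limits>

constexpr std::uint64_t kJmpIndirectSize = 6;  // FF 25 + 4 displacement bytes

// Computes the displacement for `jmp qword ptr [rip+disp32]` placed at `instruction`
// so that it reads its destination from the 8-byte slot at `pointerSlot`.
// Returns false when the slot is outside the signed 32-bit (+-2GB) reach.
bool ripRelativeDisp(std::uint64_t instruction, std::uint64_t pointerSlot,
                     std::int32_t& dispOut) {
    // RIP already points at the next instruction when the displacement is applied.
    const std::uint64_t nextInstruction = instruction + kJmpIndirectSize;
    const auto delta = static_cast<std::int64_t>(pointerSlot - nextInstruction);
    if (delta < std::numeric_limits<std::int32_t>::min() ||
        delta > std::numeric_limits<std::int32_t>::max()) {
        return false;
    }
    dispOut = static_cast<std::int32_t>(delta);
    return true;
}
```

A caller would first allocate the trampoline, then use a check like this to confirm the slot holding the handler address is reachable from the prologue before writing the jump.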

The second jump type is preferred, as the trampoline can be allocated anywhere in the entire x64 address range. It works by saving the value of the rax register on the stack, moving a full 64-bit address into rax, then exchanging rax with the value on top of the stack (which puts the address on the stack and restores rax to its original value), and finally executing ret, which just jumps to the first value on the stack, effectively doing a jump!
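For reference, here is a minimal sketch of how that snippet could be emitted as raw bytes. The function name `encodeAbsJmp` is hypothetical; the opcode bytes are the standard encodings of the four instructions, which come to 16 bytes in total.

```cpp
#include <array>
#include <cstdint>

// Emits: push rax; mov rax, imm64; xchg qword ptr [rsp], rax; ret
std::array<std::uint8_t, 16> encodeAbsJmp(std::uint64_t target) {
    std::array<std::uint8_t, 16> code = {
        0x50,                                   // push rax
        0x48, 0xB8, 0, 0, 0, 0, 0, 0, 0, 0,     // mov rax, imm64 (immediate filled in below)
        0x48, 0x87, 0x04, 0x24,                 // xchg qword ptr [rsp], rax
        0xC3                                    // ret
    };
    // The 64-bit immediate starts at offset 3 and is stored little-endian.
    for (int i = 0; i < 8; ++i) {
        code[3 + i] = static_cast<std::uint8_t>(target >> (8 * i));
    }
    return code;
}
```

Because the destination is a full 64-bit immediate, no distance check is needed here, unlike the 6-byte indirect jump.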

There are a lot of other VERY important nitty-gritty details to correctly implementing detours, but for brevity I'll explain those later in the section titled "Detours Tricky Bits". The rest of the logic is very simple: the jump we wrote takes us to our handler and we do what we want there. We then return to the original function by first executing what's known as a trampoline, which simply executes the bytes we overwrote with our jump; the trampoline then jumps back to the memory location directly after the jmp we placed in the prologue. Complete code sample:

4) Virtual Function Pointer swap

This is a simplification of the above. Instead of copying the vtable, we just change the entry for the virtual function inside the original vtable to point to our handler. This method is easier to detect than a VTABLE swap, but it's also much simpler.
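The idea can be modeled with a hand-rolled table of function pointers. This sketch deliberately avoids poking at a real compiler-generated vtable (its layout is ABI-specific, and it normally sits in read-only memory, so a real hook would have to change the page protection first); all names here are invented for the illustration.

```cpp
#include <cstddef>

// Hand-rolled stand-in for a compiler-generated vtable (illustrative only).
struct Object;
using VFunc = int (*)(Object*);

struct Object {
    VFunc* vtable;  // pointer to the table of virtual function pointers
};

// Overwrites one slot in the table and returns the previous pointer so the
// handler can still call the original.
VFunc swapVFuncPointer(VFunc* vtable, std::size_t index, VFunc handler) {
    VFunc original = vtable[index];
    vtable[index] = handler;
    return original;
}

int originalGetValue(Object*) { return 1; }

VFunc g_original = nullptr;  // saved when the hook is installed

int hookedGetValue(Object* self) {
    return g_original(self) + 100;  // do extra work, then defer to the original
}
```

Since the table itself is modified, every object that shares it sees the hook, which is part of why this approach is easier to detect than swapping the vtable pointer of a single object.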

5) Import Address Table Hook

When any API is called in a C or C++ program on Windows, the location of that API is placed into a table called the IMPORT_ADDRESS_TABLE. At compile time this is simply a table of API names. At runtime the Windows loader finds the memory locations of the APIs and writes them into an identical table next to the name table. All further calls to any API are first looked up in this table, and then the function pointer in the table is called. This hook method swaps the pointer value in this table to point to our own handler, so that when the target calls the API our handler is called instead. The IAT is an advanced topic, and better writeups than my own exist: http://sandsprite.com/CodeStuff/Understanding_imports.html
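To make the pointer swap concrete, here is a toy model of the idea. It does not parse real PE structures (a real implementation would walk the module's import descriptors and change the page protection before writing); `ImportEntry`, `hookImport`, and the sample functions are all hypothetical.

```cpp
#include <cstddef>
#include <cstring>

using ApiFn = int (*)(int);

// Toy import entry: the name the module asked for and the slot the loader filled in.
struct ImportEntry {
    const char* name;
    ApiFn address;
};

// Finds the entry for `name`, points it at `handler`, and returns the old pointer
// (nullptr if the module does not import that name).
ApiFn hookImport(ImportEntry* table, std::size_t count, const char* name, ApiFn handler) {
    for (std::size_t i = 0; i < count; ++i) {
        if (std::strcmp(table[i].name, name) == 0) {
            ApiFn original = table[i].address;
            table[i].address = handler;
            return original;
        }
    }
    return nullptr;
}

int realApi(int x) { return x * 2; }
int handlerApi(int x) { return -x; }
```

Only callers that go through the patched table are redirected; code that obtained the API's address some other way still reaches the original function.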

6) Vectored Exception Handler Hooks

The final hooking method is a neat one, and one of the stealthiest. By generating an exception we can trap into an exception handler. Inside that exception handler we can then change the value of RIP/EIP (the instruction pointer) to the location of our hook handler. Once we remove our exception-generating mechanism and return EXCEPTION_CONTINUE_EXECUTION from the exception handler, our hook handler will begin executing and we have effectively performed a jump to our handler. The exception-generating mechanism can be a hardware breakpoint, a software breakpoint, or a guard page.

There is an interesting quirk, however. In order to execute our original function we have to remove the exception-generating mechanism to avoid calling our exception handler again. This leaves us stuck figuring out how to restore the exception-generating mechanism after we are done executing the original, as we want to be able to intercept the function more than once!

The solution is simple, as it turns out: C++ destructors! Since the destructor of an object is guaranteed to run when the object goes out of scope, we can leverage it to have the compiler automatically place a stub that restores the protection after we are done executing the original in our handler! This code example from stackoverflow will execute a std::function on object destruction:
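A minimal sketch of such a guard, and of how a handler might use it, could look like the following. The class name `FinalAction`, the `g_hookArmed` flag (a stand-in for the real breakpoint or guard page state), and the sample functions are assumptions for illustration, not PolyHook's actual code.

```cpp
#include <functional>
#include <utility>

// Runs the stored callable when the object goes out of scope.
class FinalAction {
public:
    explicit FinalAction(std::function<void()> action) : action_(std::move(action)) {}
    ~FinalAction() { if (action_) action_(); }
    FinalAction(const FinalAction&) = delete;
    FinalAction& operator=(const FinalAction&) = delete;
private:
    std::function<void()> action_;
};

bool g_hookArmed = true;  // stand-in for the breakpoint / guard page state
int g_calls = 0;

int originalFunction(int x) { ++g_calls; return x + 1; }

int hookHandler(int x) {
    // Assumed: the exception handler disarmed the hook before redirecting here.
    g_hookArmed = false;
    FinalAction rearm([] { g_hookArmed = true; });
    return originalFunction(x);  // rearm's destructor runs after this call returns
}
```

The guard re-arms the hook on every exit path from the handler, including early returns, which is what makes the destructor approach less error-prone than restoring the protection by hand.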

Detours Tricky Bits

Instruction Splitting:

When writing our jump instruction into the prologue we have to make sure we do not split any instructions. x86/x64 instructions vary in length, while the jmp we write has a fixed size (5 bytes for the x86 relative jmp), so the end of our jump will not always line up with an instruction boundary. If, for example, we were to hook the following function we would generate an exception (this is just an example prologue; it would never be found in real compiler-generated code):

As you can see, the 0xD1 byte would be left over, creating junk code which will most likely cause undefined behavior down the line, leading to hard-to-find crashes.

The solution is to use a disassembly engine to make sure we never split instructions. I use Capstone for the PolyHook project as it's well maintained and very powerful. Internally I measure the size of each assembly instruction I am forced to overwrite and then NOP out any extra bytes of that instruction, so we would be left with:

0xE9DEADBEEF JMP 0xDEADBEF4
0x90 NOP

The 0xD1 byte is overwritten with a NOP and we have fixed our problem.
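The bookkeeping behind this can be sketched as below, assuming the instruction lengths have already been obtained from a disassembler. `PatchPlan` and `planPatch` are hypothetical names, and the lengths in the usage note are made up rather than taken from a real prologue.

```cpp
#include <cstddef>
#include <vector>

struct PatchPlan {
    std::size_t bytesOverwritten;  // whole instructions covered by the patch
    std::size_t nopCount;          // padding written after the jump
};

// `lengths` holds the sizes of the prologue's instructions in order, as a
// disassembler would report them. Returns {0, 0} if the function is too short.
PatchPlan planPatch(const std::vector<std::size_t>& lengths, std::size_t jumpSize) {
    std::size_t covered = 0;
    for (std::size_t len : lengths) {
        if (covered >= jumpSize) break;  // enough whole instructions collected
        covered += len;
    }
    if (covered < jumpSize) return {0, 0};
    return {covered, covered - jumpSize};
}
```

For a prologue made of three 2-byte instructions and a 5-byte jump, this yields 6 overwritten bytes and 1 NOP, which matches the layout shown above. The `bytesOverwritten` count is also how many bytes have to be copied into the trampoline.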

Code relocation:

As mentioned before, some instructions are relative to their location in memory. When we write the jump instruction into the function prologue we are actually overwriting the code that was there before. So what we do is first copy the original code into our trampoline so that it gets executed when it's time to call the original again. But this trampoline is in a different memory location than the prologue, and it's possible some instructions are of the relative type, so moving them would change their meaning! A real-world example of this is the MessageBox prologue:

The first relocation is really easy to fix, we simply re-calculate the displacement required to get it to point to the original location. It originally points to 0x7FFC0EC2E198+0x2B77A = 0x7FFC0EC59912. The trampoline with the first relocation fixed is:

0x7FFC0EB60008+0xF990A = 0x7FFC0EC59912. As you can see, we changed the instruction's address and displacement while preserving what it meant!
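The recalculation is just a subtraction. Here is a small sketch using the numbers above; `resolve` and `relocateDisp` are hypothetical helper names, and `base` means the address the displacement is added to.

```cpp
#include <cstdint>

// Address a relative operand refers to: base plus signed displacement.
std::uint64_t resolve(std::uint64_t base, std::int32_t disp) {
    return base + static_cast<std::int64_t>(disp);
}

// New displacement so that the copied instruction (whose base is now `newBase`)
// still refers to the address the original referred to. The result is returned
// as 64 bits so the caller can check that it still fits in 32 bits.
std::int64_t relocateDisp(std::uint64_t oldBase, std::int32_t oldDisp,
                          std::uint64_t newBase) {
    return static_cast<std::int64_t>(resolve(oldBase, oldDisp) - newBase);
}
```

Calling `relocateDisp(0x7FFC0EC2E198, 0x2B77A, 0x7FFC0EB60008)` gives 0xF990A, the displacement used in the fixed trampoline.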

The second relocation, however, is hard. The je instruction only has a single signed byte to encode its displacement, meaning its target can be at most 127 bytes ahead of or 128 bytes behind the end of the instruction (0x74 means je, and the next byte is the displacement). The difference between our function prologue and our trampoline is way bigger than that, however: 7FFC0EC2E190 - 7FFC0EB60000 = CE190. So instead we have to redirect the je to an absolute x64 jmp (credit to MinHook for this). The fixed trampoline then becomes:

From this we see that when the je path is taken, it is redirected to the fancy jmp I showed earlier.
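Deciding whether a short jump needs this treatment comes down to a range check on the recomputed displacement. A sketch follows, with hypothetical function names and `base` again meaning the address of the byte following the 2-byte je:

```cpp
#include <cstdint>

// A rel8 displacement is a signed byte.
bool fitsInRel8(std::int64_t disp) {
    return disp >= -128 && disp <= 127;
}

// True if a short jump copied from `oldBase` to `newBase` can keep pointing at
// its original target by only adjusting its displacement byte.
bool shortJumpSurvivesMove(std::uint64_t oldBase, std::int8_t oldDisp,
                           std::uint64_t newBase) {
    const std::uint64_t target = oldBase + static_cast<std::int64_t>(oldDisp);
    return fitsInRel8(static_cast<std::int64_t>(target - newBase));
}
```

With the prologue and trampoline addresses above the move is about 0xCE190 bytes, so the check fails and the je has to be pointed at an absolute jmp placed inside the trampoline instead.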

After Thoughts

All of the hooking methods require you to create a typedef for the function. If you get a crash, it's more likely that you have an incorrect typedef than a bug in the library. Please post your issue in the comments and I'll look at it. If you have any suggestions, just message me and I'll take a look!

As a bonus, you can use decltype to define the typedef for APIs and such. An example for MessageBox:
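A portable sketch of the pattern is below. `ShowMessage` is a made-up stand-in so the snippet builds anywhere; the typedef and variable names are likewise illustrative.

```cpp
#include <type_traits>

// Stand-in for an API declaration (hypothetical, not a real Windows function).
int ShowMessage(void* window, const char* text, const char* caption, unsigned type) {
    (void)window; (void)text; (void)caption; (void)type;
    return 1;
}

// decltype derives the exact function pointer type from the declaration,
// so the typedef cannot drift out of sync with the real signature.
typedef decltype(&ShowMessage) tShowMessage;

static_assert(std::is_same<tShowMessage,
                           int (*)(void*, const char*, const char*, unsigned)>::value,
              "decltype should yield the matching function pointer type");

// Typical use: a typed pointer through which the original can be called.
tShowMessage oShowMessage = &ShowMessage;
```

On Windows, with windows.h included, the same idea would read `typedef decltype(&MessageBoxA) tMessageBoxA;`, which also picks up the calling convention from the header's declaration.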

Comments and Discussions

In 50% of cases I get a crash if I try to hook a function already hooked by other code (another program).
Please add multi-hook support. This is very easy, and it is not necessary to ban users on GitHub; this can be done without it.

For example If the Multihook support flag is set

Read first instruction
if instruction is one of JMP method (unconditional)
then jump at address and install hook there

Which interface are you trying to hook? COM in C++ is implemented using virtual functions, so you could use any of my virtual-type hooking methods. Here's a link to how COM is laid out under the hood: here. If you post your COM code I can write some demo code for you.