Thursday, July 26, 2012

The idea started as an attempt to implement system call hooking from user mode in Wow64 processes (32-bit processes running on 64-bit versions of Windows). I later extended it to include native 32-bit processes. The whole thing ended up as an OllyDbg plugin, which you may find useful for many purposes, e.g. malware analysis and unpacking.

Now let's quickly see how 32-bit code issues system calls in Wow64 processes and in native 32-bit processes.

1) In Wow64 Processes:
If we take the "ZwOpenProcess" function of the 64-bit version of Windows 7, we can see that EAX holds 0x23 (the system call ordinal) and EDX points at the stack arguments. The function then calls the address stored at FS:[0xC0], which is 0x74DE2320 here.

If we follow this address, we see a FAR jump. It jumps to 0x74DE271E and sets the code segment to 0x33, and this is where 32-bit debuggers can't go any further.

So we can conclude that changing CS (the code segment) to 0x33 triggers the transition from 32-bit mode to 64-bit mode, and this is where system calls are taken care of.
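
To make the FAR jump concrete, here is a minimal C sketch that decodes a direct far jump (opcode 0xEA followed by a 32-bit offset and a 16-bit selector). It assumes the instruction at 0x74DE2320 uses this direct encoding; the type and function names are my own and do not come from the plugin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded form of an x86 direct far jump ("JMP ptr16:32", opcode 0xEA). */
typedef struct {
    uint32_t offset;   /* 32-bit target offset */
    uint16_t selector; /* new code segment selector, e.g. 0x33 */
} FarJump;

/* Decodes the 7-byte instruction at `code`.
   Returns 1 on success, 0 if the buffer is too short or the opcode is not 0xEA. */
static int decode_far_jump(const uint8_t *code, size_t len, FarJump *out)
{
    assert(code != NULL && out != NULL);
    if (len < 7 || code[0] != 0xEA)
        return 0;
    /* Both operands are stored little-endian, offset first. */
    out->offset = (uint32_t)code[1] | ((uint32_t)code[2] << 8) |
                  ((uint32_t)code[3] << 16) | ((uint32_t)code[4] << 24);
    out->selector = (uint16_t)(code[5] | (code[6] << 8));
    return 1;
}
```

Fed the bytes EA 1E 27 DE 74 33 00, the decoder yields offset 0x74DE271E and selector 0x33, the values discussed above.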

0x74DE271E does not exist in the "Executable modules" list that you see if you press ALT+E in OllyDbg. But if we try to dump the memory that this address belongs to, we can see that the module name is wow64cpu.dll.

N.B. wow64cpu.dll is a 64-bit dynamic link library that resides in the "system32" directory and is always loaded into the address space of Wow64 processes along with other 64-bit libraries. The other 64-bit libraries are wow64.dll, wow64win.dll, and the 64-bit version of ntdll.dll. They are hidden because they have no entries in the doubly linked lists of PEB.LoaderData (I refer here to the 32-bit PEB; we will see this later).

N.B. If we apply symbols to wow64cpu.dll, we find that 0x74DE2320 is the address of the non-exported symbol "X86SwitchTo64BitMode" and 0x74DE271E is the address of the non-exported symbol "CpupReturnFromSimulatedCode".

2) In Native 32-bit Processes:

The image above is the "ZwOpenProcess" function of Windows XP SP3. EAX is 0x7A (the system call ordinal) and EDX points at 0x7FFE0300 in the _KUSER_SHARED_DATA page. Then comes a CALL instruction which jumps to the "KiFastSystemCall" function, whose address is stored at 0x7FFE0300 (_KUSER_SHARED_DATA::SystemCall).

Actually, depending on the underlying processor architecture, that CALL instruction may jump to the "KiIntSystemCall" function instead. In this post, I will just focus on the "KiFastSystemCall" function.

Looking at the "KiFastSystemCall" function, we can see it is as simple as pointing EDX at the stack and issuing SYSENTER to enter kernel-mode. Then comes a RET instruction which represents the "KiFastSystemCallRet" function.

Now, let's see how we can implement a user-mode system call hook in Windows 7 64-bit (Wow64 processes).

1) Wow64 processes

To implement a hook, the first method one may think of is replacing the address stored at FS:[0xC0] (0x74DE2320, as seen above) with the address of our own hooking code. While this seems very easy, it has one drawback: the field is per-thread, i.e. we have to keep track of all new threads and, for each new thread, replace the address at FS:[0xC0] with the address of our own hooking code.

Imagine the scenario where we call "CreateProcess" to start our target process in a suspended state, overwrite the address stored at FS:[0xC0], and finally call "ResumeThread". In this scenario, we can't keep track of any new thread created after we call the "ResumeThread" function, and hence all its system calls will be lost.

Imagine the second scenario where we call the "CreateProcess" function on our target process with the "dwCreationFlags" parameter set to DEBUG_ONLY_THIS_PROCESS, i.e. we are debugging our target process. In this scenario we can see all new threads as we intercept the "CREATE_THREAD_DEBUG_EVENT" events. Once we receive a "CREATE_THREAD_DEBUG_EVENT" event, FS:[0xC0] should contain the address of the FAR jump, but this is not always the case. To explore this, let's use the 64-bit version of Debugging Tools for Windows to debug a demo 32-bit executable that does nothing but create a new thread.

We instruct WinDbg to break on new threads and then place a software breakpoint on the "Wow64cpu!CpuThreadInit" function, the function responsible for storing the address of the FAR jump into FS:[0xC0].

After repeating the abovementioned step a few times, you can see that the "Wow64cpu!CpuThreadInit" function is not always hit before execution reaches the thread entry point.

Now we have seen that overwriting the pointer at FS:[0xC0] is not the best way to implement the user-mode system call hook.

Let's try the second method. Actually, it is the one I prefer. By overwriting the FAR jump instruction itself in wow64cpu.dll, we get rid of the annoyance of tracking new threads. All we have to do in this method is set the proper memory protection on the wow64cpu.dll page that contains the FAR jump, overwrite the FAR jump with a near JMP instruction to our hook code, and finally restore the original memory protection. This method has been implemented in my open source OllyDbg plugin. A link to the source code is found at the end of this blog post.
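
The bytes written over the FAR jump can be sketched as follows. This is only an illustration under stated assumptions, not the plugin's actual code: it assumes the FAR jump is the 7-byte direct form, so a 5-byte near JMP (opcode 0xE9 plus a 32-bit relative displacement) is padded with two NOPs. The function name is hypothetical, and writing the bytes into the debuggee is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FAR_JMP_SIZE  7  /* EA + 4-byte offset + 2-byte selector */
#define NEAR_JMP_SIZE 5  /* E9 + 4-byte relative displacement */

/* Fills `patch` with a near JMP from `patch_addr` to `hook_addr`, padded with
   NOPs so that the whole far jump is replaced. Both addresses are 32-bit
   virtual addresses in the target process. */
static void build_near_jmp_patch(uint32_t patch_addr, uint32_t hook_addr,
                                 uint8_t patch[FAR_JMP_SIZE])
{
    assert(patch != NULL);
    /* The displacement is relative to the end of the 5-byte JMP. Unsigned
       wraparound produces the correct two's-complement value for backward jumps. */
    uint32_t rel = hook_addr - (patch_addr + NEAR_JMP_SIZE);

    patch[0] = 0xE9;
    patch[1] = (uint8_t)(rel & 0xFF);
    patch[2] = (uint8_t)((rel >> 8) & 0xFF);
    patch[3] = (uint8_t)((rel >> 16) & 0xFF);
    patch[4] = (uint8_t)((rel >> 24) & 0xFF);
    memset(patch + NEAR_JMP_SIZE, 0x90, FAR_JMP_SIZE - NEAR_JMP_SIZE);
}
```

Note that once the FAR jump is overwritten, the hook code itself has to perform the original FAR jump when it is done, otherwise the system call never reaches 64-bit mode.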

One more method would be manipulating the "Wow64cpu!CpuThreadInit" function to force it to store the address of our own code at offset 0xC0 instead of storing the address of the FAR jump.

Side notes:
1) As you can see in the "Wow64cpu!CpuThreadInit" function code, each Wow64 thread has two TEBs, a 32-bit TEB and a 64-bit TEB. The 64-bit TEB always precedes the 32-bit TEB by two pages.

2) I have also noticed that the 32-bit PEB always precedes the 64-bit PEB by one page. So, in a single-threaded application, the sequence is 64-bit TEB --> 32-bit TEB --> 32-bit PEB --> 64-bit PEB.

3) Wow64 processes, at their startup, always raise a special exception called "STATUS_WX86_BREAKPOINT" with exception code 0x4000001f. This is something that 64bit debuggers are supposed to be aware of.

4) I have also noticed that Wow64 threads seem to have two stacks, 64bit stack and 32bit one.
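
The address relationships from side notes 1 and 2 can be written down as simple arithmetic. The sketch below assumes 4 KB pages and merely restates the observations above; the helper names are hypothetical, and the offsets are what I observed, not a documented guarantee.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 0x1000u  /* assumed page size: 4 KB */

/* Side note 1: the 64-bit TEB precedes the 32-bit TEB by two pages. */
static uint64_t teb32_from_teb64(uint64_t teb64)
{
    assert((teb64 & (PAGE_SIZE_BYTES - 1)) == 0);  /* TEBs are page-aligned */
    return teb64 + 2 * PAGE_SIZE_BYTES;
}

/* Side note 2: the 32-bit PEB precedes the 64-bit PEB by one page. */
static uint64_t peb64_from_peb32(uint64_t peb32)
{
    assert((peb32 & (PAGE_SIZE_BYTES - 1)) == 0);  /* PEBs are page-aligned */
    return peb32 + PAGE_SIZE_BYTES;
}
```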

In later posts, I will show how we implement the user-mode system call hook in Windows XP SP3. Don't worry, it is even easier.

Let's see how we can implement the system call hook for OllyDbg v1.10.

First, I split the hook into two DLLs. The first is the OllyDbg plugin, or the injector DLL, which I named InjectHookLib.dll. The second is the injected DLL, which has your own code for logging or manipulating system calls. I will show you the steps I have taken to write InjectHookLib.dll. I will also show you how to write a simple library to inject.

1) Injector DLL
Once you choose to inject a library, a common dialog box opens for you to select the library. One memory page is allocated in the address space of the target process (the debuggee) and a few x86 instructions are copied into it.

In the image above you can see the code cave copied into the target process address space.

If it is the first system call issued by the target process, this code cave injects the library you chose into the address space. After the library has successfully been injected, the code then jumps to its "DllMain" function where you can manipulate intercepted system calls.

One difficulty that I met was filtering out the system calls originating from the "LoadLibraryA" function and from inside the "DllMain" function. That was overcome by having global variables which are checked upon any call to the "DllMain" function.

When the code cave calls the "DllMain" function of the injected library, "fdwReason" is always set to 0x4 to tell the "DllMain" function that a system call is being passed, and "lpvReserved" is made to point to the stack where the registers are saved (those registers are the ones pushed by PUSHAD and PUSHFD).

Now let's see how we write the "DllMain" function of the library to be injected. I will take my dumpSysCalls.dll to be my first example. More examples will be released soon.

I will rename the third parameter from void* lpvReserved to MyContext* pContext so that the "DllMain" function prototype looks like below.

If the "fdwReason" parameter is DLL_PROCESS_ATTACH, I recommend calling the "DisableThreadLibraryCalls" function.

The "MyContext" structure, as its name implies, has all the registers passed via the code cave mentioned above, and its definition looks like below.
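
For readers without the image, here is a hypothetical layout for such a structure. It assumes the code cave executes PUSHAD first and PUSHFD second, so EFLAGS sits at the lowest address; the real "MyContext" definition in the plugin source may order the fields differently. The name "MyContextSketch" and the helper function are mine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register block, lowest stack address first.
   PUSHAD pushes EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI in that order, so EDI
   ends up lowest; a following PUSHFD places EFLAGS just below it. */
typedef struct {
    uint32_t EFlags;
    uint32_t Edi;
    uint32_t Esi;
    uint32_t Ebp;
    uint32_t Esp;   /* ESP value before PUSHAD executed */
    uint32_t Ebx;
    uint32_t Edx;
    uint32_t Ecx;
    uint32_t Eax;
} MyContextSketch;

/* Copies nine saved dwords (as found at the stack pointer) into the struct. */
static MyContextSketch context_from_stack(const uint32_t saved[9])
{
    MyContextSketch ctx;
    assert(saved != NULL);
    assert(sizeof ctx == 9 * sizeof(uint32_t));  /* no padding expected */
    memcpy(&ctx, saved, sizeof ctx);
    return ctx;
}
```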

If the "fdwReason" parameter is 0x4, this means that a system call is being passed and we can start playing with it. Given the "pContext" pointer and the platform-specific information discussed earlier in the post, this is easy. For example, in Windows 7 64-bit, Eax (pContext->Eax) always holds the system call ordinal. We can look up this ordinal to determine the system call name. Depending on the system call ordinal, we can use pContext->Esp to get the return address and the system call arguments. See the image below.
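
The ordinal lookup can be sketched as a small table search. The table below contains only the one ordinal mentioned in this post (0x23 for "ZwOpenProcess" on Windows 7 64-bit); a real table would have to be filled in per Windows version, since ordinals differ between versions (0x7A on Windows XP SP3 for the same function). The type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t ordinal;
    const char *name;
} SyscallName;

/* Windows 7 64-bit ordinals; only the entry taken from this post is listed. */
static const SyscallName k_win7_x64_syscalls[] = {
    { 0x23, "ZwOpenProcess" },
};
static const size_t k_win7_x64_count =
    sizeof k_win7_x64_syscalls / sizeof k_win7_x64_syscalls[0];

/* Returns the name for `ordinal`, or NULL if the table has no entry for it. */
static const char *syscall_name(const SyscallName *table, size_t count,
                                uint32_t ordinal)
{
    assert(table != NULL);
    for (size_t i = 0; i < count; i++) {
        if (table[i].ordinal == ordinal)
            return table[i].name;
    }
    return NULL;
}
```

Inside "DllMain", the ordinal passed in would be pContext->Eax.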

N.B. The library entry point must be "DllMain" (not "DllMainCRTStartup"). This is accomplished by ignoring all default libraries and setting the "/entry" to "DllMain".