Pavel's Blog

September 5, 2017

There are several techniques available for injecting a DLL into a process. Each has its own strengths and drawbacks.
The simplest one uses the CreateRemoteThread function to create a new thread in the target process and point the thread’s start function to the LoadLibrary function, as LoadLibrary and a thread’s starting function have the same prototype from a binary perspective (both accept a single pointer).

This method is the easiest to use but also the most visible. Creating a new thread can be "noticed" in several ways, such as with an ETW event. If a driver is present and is hooking thread creation with PsSetCreateThreadNotifyRoutine, it will naturally be notified.

A stealthier technique is using an existing thread to do the deed. One way to go about it is to queue an APC to a victim thread in the target process by calling QueueUserAPC, again pointing the APC at LoadLibrary. The potential issue with APCs is that the thread must enter an alertable state to actually "process" the APC and execute our LoadLibrary call. Unfortunately, there is no guarantee that a thread will ever put itself in an alertable state. The caller can try all other threads in the process, but in some cases that will not work. A canonical example is cmd.exe, where its single thread never enters an alertable state, as far as I can tell.

This post is about yet another way to make a target process call LoadLibrary, but this time by manipulating the context of an existing thread, without it "knowing" about it. The thread’s instruction pointer is diverted to a custom piece of code and then redirected back. This method is very difficult to detect, since it’s just a thread doing work.
Let’s see how to accomplish something like this in both x86 and x64.

The first thing we need is to locate the target process and select a thread within that process. Technically, it can be any thread from the target process, but a thread that is blocked in a wait will not run our code until its wait is satisfied and it is scheduled again, so it’s better to select a thread that is running or likely to run soon to get the DLL loaded as early as possible.

Once we set our sights on a target process and one of its threads, we need to open them with appropriate access:

We need the PROCESS_VM_OPERATION and PROCESS_VM_WRITE access rights for the process because we’re going to allocate memory and write the target code inside the process. For the thread, we must be able to change its context, and to do that we must be able to suspend it while the change is made.
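To make the access requirements concrete, here is a small sketch of the two access masks. The numeric values are the ones defined in the Windows SDK headers, redeclared under hypothetical names so the snippet compiles without windows.h. Including THREAD_GET_CONTEXT is my assumption, based on the fact that the thread's context is captured before it is changed. On a real system these masks would be passed to OpenProcess and OpenThread.

```cpp
#include <cassert>
#include <cstdint>

// Values as defined in winnt.h; the k-prefixed names are hypothetical
// stand-ins so this sketch does not depend on <windows.h>.
namespace rights {
constexpr std::uint32_t kProcessVmOperation  = 0x0008; // PROCESS_VM_OPERATION
constexpr std::uint32_t kProcessVmWrite      = 0x0020; // PROCESS_VM_WRITE
constexpr std::uint32_t kThreadSuspendResume = 0x0002; // THREAD_SUSPEND_RESUME
constexpr std::uint32_t kThreadGetContext    = 0x0008; // THREAD_GET_CONTEXT
constexpr std::uint32_t kThreadSetContext    = 0x0010; // THREAD_SET_CONTEXT
}

// Rights needed on the process: allocate memory and write to it.
constexpr std::uint32_t process_access_mask() {
    return rights::kProcessVmOperation | rights::kProcessVmWrite;
}

// Rights needed on the thread: suspend/resume it and read/modify its context.
// THREAD_GET_CONTEXT is an assumption (needed to capture the original context).
constexpr std::uint32_t thread_access_mask() {
    return rights::kThreadSuspendResume | rights::kThreadGetContext |
           rights::kThreadSetContext;
}
```

Requesting only these rights, rather than PROCESS_ALL_ACCESS or THREAD_ALL_ACCESS, keeps the open calls as narrow as the technique allows.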

The injection itself requires several steps. We start by allocating memory in the target process with the execute protection included, since our code would live there:

We allocate one page of RWX memory. We don’t actually need that much, but the memory manager works in pages anyway, so we might as well explicitly allocate a complete page.
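The point about page granularity can be shown with a little arithmetic. The sketch below is only an illustration: the 4096-byte page size is an assumption (it is the usual small-page size on x86 and x64), and round_up_to_page is a hypothetical helper, not part of any Windows API.

```cpp
#include <cassert>
#include <cstddef>

// Rounds a requested size up to a whole number of pages.
// page_size must be a power of two; 4096 is assumed here.
std::size_t round_up_to_page(std::size_t size, std::size_t page_size = 4096) {
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (size + page_size - 1) & ~(page_size - 1);
}
```

Under that assumption, a request for a few dozen bytes of code plus a path string still consumes a full 4096-byte page, so asking for the whole page costs nothing extra.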

What kind of code should we place in the target process? Clearly, we want to call LoadLibrary, but it’s much trickier than that. We need to call LoadLibrary and then resume execution where the thread left off. So first we suspend the thread and then capture its execution context:

Next, we need some code to copy to the target process. This code must be crafted in assembly, and must match the "bitness" of the target process (in any case the to-be-injected DLL must match the target process bitness). For x86, we can write the following in Visual Studio and copy the resulting machine language bytes:

The function is decorated with the __declspec(naked) attribute, which tells the compiler not to emit the usual prologue/epilogue instructions, because we want the pure code. The weird numbers in the code are placeholders we need to fix before we copy the code to the target process.

In the source code for this demo I packaged the resulting machine code into a byte array like so:

First, we get the address of LoadLibraryA, since that’s the function we’ll use to load the DLL in the target process. LoadLibraryW would work just as well, but the ANSI version is a bit simpler to work with. The address of the DLL path is set to be 2KB into the buffer, which is quite arbitrary.

Next we write the modified code and the DLL path to the target process:
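The placeholder fix-up and the buffer layout can be sketched on a purely local buffer. This is an illustration under stated assumptions, not the demo's actual code: the marker values 0x11111111 and 0x22222222, the helper names, and the 0x90 filler bytes (which merely stand in for the real instruction bytes) are all hypothetical. Only the 2KB path offset and the one-page size come from the text above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

constexpr std::size_t kPageSize   = 4096; // one page, as allocated in the target
constexpr std::size_t kPathOffset = 2048; // the DLL path lives 2KB into the buffer

// Hypothetical template: 0x90 bytes stand in for real opcodes; the two
// 4-byte runs are markers for the path address and the function address.
std::vector<std::uint8_t> make_demo_stub() {
    return {0x90, 0x11, 0x11, 0x11, 0x11, 0x90, 0x22, 0x22, 0x22, 0x22, 0x90};
}

// Overwrites the first occurrence of a 32-bit marker with `value`, stored
// little-endian as x86 expects. Returns false if the marker is not found.
bool patch_marker(std::vector<std::uint8_t>& buf, std::uint32_t marker,
                  std::uint32_t value) {
    for (std::size_t i = 0; i + 4 <= buf.size(); ++i) {
        std::uint32_t found = static_cast<std::uint32_t>(buf[i]) |
                              (static_cast<std::uint32_t>(buf[i + 1]) << 8) |
                              (static_cast<std::uint32_t>(buf[i + 2]) << 16) |
                              (static_cast<std::uint32_t>(buf[i + 3]) << 24);
        if (found == marker) {
            for (int b = 0; b < 4; ++b)
                buf[i + b] = static_cast<std::uint8_t>(value >> (8 * b));
            return true;
        }
    }
    return false;
}

// Builds a local image of the page: stub at offset 0, NUL-terminated ANSI
// path at offset 2KB, everything else zero-filled.
std::vector<std::uint8_t> build_page(const std::vector<std::uint8_t>& stub,
                                     const std::string& dll_path) {
    assert(stub.size() <= kPathOffset);
    assert(dll_path.size() + 1 <= kPageSize - kPathOffset);
    std::vector<std::uint8_t> page(kPageSize, 0);
    std::copy(stub.begin(), stub.end(), page.begin());
    std::copy(dll_path.begin(), dll_path.end(), page.begin() + kPathOffset);
    return page;
}
```

If the remote allocation were at 0x04A00000, for instance, the path marker would be patched with 0x04A00800 (base plus 2KB). Keeping the code and the string in the same page means a single allocation and a single write cover both.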

Debugging this kind of scenario is non-trivial, since we need to attach to the target process and follow the code from there. In the following example, I launched the 32-bit version of notepad from the \Windows\SysWOW64 directory (on a 64-bit system). The command line of the demo project allows setting the target process ID and the path to the DLL to inject. I’ve set that up with Visual Studio and placed a breakpoint just before the call to SetThreadContext. The console window shows the virtual address to which the code was copied:

Now we can attach WinDbg to the notepad process and look at the code at that address:

We can clearly see our modified code, where LoadLibraryA is called and then the code resumes somewhere inside NtUserGetMessage, which is quite expected for a message pump. We can even set a breakpoint right there:

bp 04A00000

Now we can let notepad go and then the injecting process. And sure enough, we hit the breakpoint. Here’s the breakpoint and call stack:

I won’t go into the details of the x64 code, but it looks different from the x86 version because the x64 calling convention is different from x86 __stdcall. For example, the first four integer arguments are passed in RCX, RDX, R8 and R9 rather than on the stack. In our case, RCX is enough as LoadLibraryA only takes a single argument.

So there you have it – a DLL injected with an existing thread by changing its context. This method is difficult to detect, as loading a DLL is not an unusual event. One possible way would be to locate executable pages and compare their addresses to known modules in the process. The injecting process can, however, after injection is complete (which could be signaled by some event object, for instance) deallocate the injected function’s memory, so there is only a small window of opportunity to "notice" the executable page.
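The detection idea mentioned above, comparing executable pages against known modules, can be sketched as plain range checking. The Region and Module structs and both function names are hypothetical. On a live system the region list would come from something like VirtualQueryEx and the module list from a module enumeration API, neither of which is shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified views of a memory region and a loaded module.
struct Region { std::uint64_t base; std::uint64_t size; bool executable; };
struct Module { std::uint64_t base; std::uint64_t size; };

// True if the region lies entirely within the address range of some module.
bool inside_module(const Region& r, const std::vector<Module>& mods) {
    for (const Module& m : mods) {
        if (r.base >= m.base && r.base + r.size <= m.base + m.size)
            return true;
    }
    return false;
}

// Returns executable regions that are not backed by any known module.
std::vector<Region> unbacked_executable(const std::vector<Region>& regions,
                                        const std::vector<Module>& mods) {
    std::vector<Region> out;
    for (const Region& r : regions) {
        if (r.executable && !inside_module(r, mods))
            out.push_back(r);
    }
    return out;
}
```

A hit from a check like this is a lead rather than proof, since JIT compilers and similar components legitimately allocate executable memory outside of any module.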