Introduction

After publishing my last article ([2]) explaining how to emulate some missing Windows functions
used for remote code execution, the next logical step was to use these functions as a framework for implementing a library that allows easy remote code injection.
Remote code injection is a technique for executing code within the address space of a process other than the current one. Because the architecture of Windows
isolates each process to protect them against memory overwrites and other bugs in applications, injecting code into a remote process is not straightforward.
This library implements functions that allow direct remote code injection, DLL remote injection and remote subclassing for Win32 processes (GUI and CUI)
and NT native processes. Don't expect to find any innovative code as this library is mainly based on the techniques described by Robert Kuster in his article
"Three ways to inject your code into another process" ([1]). Nevertheless I hope that you'll find the library
useful and use it in your projects.

Remote SEH (Structured Exception Handling)

All remote code execution is protected by SEH so that an exception cannot crash the remote process. The SEH code you normally find in a C/C++ application looks like the following:

__try
{
    // try code
}
__except(filter-expression)
{
    // except code
}

You cannot use this construct in remote code because it is the compiler's implementation of SEH, and internally it calls standard library
functions (__except_handler3) that reside in the current process. You need to use system-level SEH ([6]).
System-level SEH is implemented as a per-thread linked list of callback exception handler functions. A pointer to the beginning of this list can be retrieved
from the first DWORD of the TIB (Thread Information Block). The FS segment register always points to the current TIB.
To implement SEH all that is needed is to add an exception handler to the linked list. In the simplest form this can be accomplished with the following code:

Every time an exception in the try code block occurs, the operating system calls the _exception_handler routine.
In the simplest form, only two DWORDs (which make up an EXCEPTION_REGISTRATION structure) must be pushed on the stack.
Of course nothing prevents us from adding additional data fields to this structure (VC, for example, pushes an extended EXCEPTION_REGISTRATION structure
containing five fields). In my implementation, I'm adding two fields to the standard SEH frame: the value of the EBP register
and the address where the execution should resume after the exception occurs. The final code will look like this (You'll notice that the code is written in assembly.
I used assembly for two reasons: assembly permits a greater control of the generated code and only in assembly is it possible to access the FS register):
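For readers who want to see the mechanism without assembly, here is a minimal portable C model of the handler chain described above. It is only a sketch and not the library's stub: the variable chain_head stands in for FS:[0], and the names Registration, seh_push, seh_pop, seh_dispatch, ignore_all and catch_all are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Portable model of a system-level SEH chain. On x86 Windows the list head
   is the first DWORD of the TIB (FS:[0]); a plain variable stands in for it. */
typedef enum { CONTINUE_SEARCH, HANDLED } Disposition;

typedef struct Registration {
    struct Registration *prev;          /* next outer handler record */
    Disposition (*handler)(int code);   /* callback invoked on an exception */
} Registration;

static Registration *chain_head = NULL; /* stand-in for FS:[0] */

/* Models the classic x86 sequence:
   push handler / push fs:[0] / mov fs:[0], esp */
void seh_push(Registration *reg, Disposition (*handler)(int))
{
    assert(reg != NULL && handler != NULL);
    reg->handler = handler;
    reg->prev = chain_head;
    chain_head = reg;
}

/* Models: pop fs:[0] / add esp, 4 */
void seh_pop(void)
{
    assert(chain_head != NULL);
    chain_head = chain_head->prev;
}

/* Walk from the innermost record outwards until a handler accepts the
   exception. Returns 1 if it was handled, 0 if the chain was exhausted. */
int seh_dispatch(int code)
{
    Registration *r;
    for (r = chain_head; r != NULL; r = r->prev)
        if (r->handler(code) == HANDLED)
            return 1;
    return 0;
}

/* Two trivial handlers for experimenting with the chain. */
Disposition ignore_all(int code) { (void)code; return CONTINUE_SEARCH; }
Disposition catch_all(int code)  { (void)code; return HANDLED; }
```

The two-field Registration record corresponds to the minimal EXCEPTION_REGISTRATION structure; extra fields such as a saved EBP or a resume address would simply be appended after it.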

OS family

This information is necessary because the injection algorithms are different for the Windows 9x (95, 98, Me) and NT (3, 4, 2000, XP, Vista, 7) families.
The information is returned directly by a call to GetVersionEx():
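As an illustration only, the family can be derived from the dwPlatformId member that GetVersionEx() fills in. The sketch below is portable C that mirrors the documented platform ID values locally instead of including windows.h; the OsFamily type and the function name are hypothetical and not taken from the library.

```c
/* Values of OSVERSIONINFO.dwPlatformId as documented for GetVersionEx(). */
#define PLATFORM_WIN32S 0u   /* VER_PLATFORM_WIN32s        */
#define PLATFORM_WIN9X  1u   /* VER_PLATFORM_WIN32_WINDOWS */
#define PLATFORM_NT     2u   /* VER_PLATFORM_WIN32_NT      */

typedef enum { OS_UNKNOWN, OS_WIN9X, OS_WINNT } OsFamily;  /* hypothetical */

/* Map a platform ID to the OS family used to pick an injection algorithm. */
OsFamily os_family_from_platform_id(unsigned platform_id)
{
    switch (platform_id) {
    case PLATFORM_WIN9X: return OS_WIN9X;
    case PLATFORM_NT:    return OS_WINNT;
    default:             return OS_UNKNOWN;  /* Win32s or anything else */
    }
}
```

On a real system the argument would come from the dwPlatformId field of an OSVERSIONINFO structure after a successful GetVersionEx() call.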

Process is not initialized

NT

If the LdrData or LoaderLock fields of the PEB are NULL then the NT process is not initialized.
Both fields are set by the NT loader user-mode APC routine LdrpInitialize() while initializing the process.

fNOTINITIALIZED = (PEB.LdrData == NULL || PEB.LoaderLock == NULL);

9x

The Win9x process is initialized only if the last DWORD of the main thread stack is below 2 GB (0x80000000) ([3]).

Protected process

Windows Vista introduced a new type of process, called a protected process.
In a protected process the following operations cannot be performed: injecting a thread, accessing the virtual memory,
debugging the process, duplicating a handle, or changing the quota or working set. Therefore remote injection is not
possible in protected processes. Use the following code to detect a protected process:

Subsystem

This is the type of subsystem the process uses for its user interface. It's the same as the Subsystem field found in the PE Header of the file on disk (and of the Module Header
in memory).

NT

In NT the subsystem type can be directly retrieved from the PEB ImageSubsystem field:

Subsystem = PEB.ImageSubsystem;

9x

The subsystem type can be retrieved from the Subsystem field of the module header. To locate the module header in memory we can use the Kernel32
GetModuleHandle() function or the MTEModTable. The pointer to the NT header is obtained from the pNTHdr field
of the IMTE (Internal Module Table Entry). The IMTE address is obtained from the MTEModTable using the PDB MTEIndex field as an index ([4] chapter 3 details all these structures and explains the hack needed
to obtain the address of the MTEModTable from the Kernel32 GDIReallyCares() function).
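Once the start of the module header is known, reading the Subsystem field is just a matter of following the documented PE layout. The following portable C sketch does this on a raw byte buffer (for example, a copy of the first page of the module); it is an illustration under that assumption, not the library's code, and the function name pe_subsystem is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read little-endian values without alignment assumptions. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the Subsystem value (1 = native, 2 = Windows GUI, 3 = Windows CUI)
   or -1 if the buffer does not look like a PE image. */
int pe_subsystem(const uint8_t *image, size_t size)
{
    uint32_t nt;

    if (size < 0x40 || rd16(image) != 0x5A4D)    /* "MZ" DOS signature */
        return -1;
    nt = rd32(image + 0x3C);                     /* e_lfanew: NT header offset */
    /* PE signature (4) + file header (20) + Subsystem offset inside the
       optional header (68, same for PE32 and PE32+) + the field itself (2) */
    if (nt > size || size - nt < 4 + 20 + 68 + 2)
        return -1;
    if (memcmp(image + nt, "PE\0\0", 4) != 0)
        return -1;
    return rd16(image + nt + 4 + 20 + 68);
}
```

The returned values correspond to IMAGE_SUBSYSTEM_NATIVE, IMAGE_SUBSYSTEM_WINDOWS_GUI and IMAGE_SUBSYSTEM_WINDOWS_CUI.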

Check if the function code is safe to be relocated (no calls or absolute addressing) and calculate its length. Note that this check is not 100% reliable!
You should still write relocatable code and inspect the generated code yourself.

Allocate a remote memory block and copy the function code to it.

If a data block is specified, allocate a remote memory block and copy the data to it.

Allocate a remote memory block and copy the stub code to it (see file "Stub.asm"). The stub code will set an SEH frame and call the user thread function.
The special native process exit is also handled by this code.

Depending on ProcessFlags, run the remote code using one of the available methods: CreateRemoteThread(),
RtlCreateUserThread() or NtQueueApcThread().

Wait for remote code to finish using WaitForSingleObject(hThread) or check the Finished flag set by the stub code.

If a data block was specified, read back the data from the remote memory block.

Cleanup and return error code.

Depending on the ProcessFlags a different remote code execution method must be used:

Win32 initialized process

Use the CreateRemoteThread() function to execute the remote code (because this function doesn't exist in Win9x it must
be emulated (see [2])). Starting with Windows Vista CreateRemoteThread() will fail
if the target process is in a different session than the calling process. The solution to this limitation is to use the undocumented NtCreateThreadEx() function
on Windows Vista and 7 ([8]). Wait for the remote code to finish by calling WaitForSingleObject()
on the returned thread handle, and get the remote exit code by calling GetExitCodeThread().

Win32 non-initialized process

What you can do in a non-initialized process is very limited (you cannot assume that the system internal structures are initialized, that the DLLs are loaded, ...),
so you should be extremely careful when injecting code into this type of process. It's advisable to wait until the process finishes its initialization.
For GUI processes, this can be accomplished by using the WaitForInputIdle() function, but unfortunately there's no equivalent function for the other
types of processes. Another possible technique is to set a breakpoint at the process entry point (this makes it possible to detect when the system part of the process
initialization has finished).

9x

Just set a bit in the dwCreationFlags parameter of CreateRemoteThread() that makes the function internally prevent the THREAD_ATTACH
message from being sent before PROCESS_ATTACH (see [3]).

NT

The NtQueueApcThread() function is used to queue an APC routine (our remote code) on an existing remote thread. The APC routine will run as soon
as the thread enters an alertable state. We cannot use wait functions on a thread for which the APC was queued, so to find out when the remote code has finished we poll
the Finished flag set by the remote stub code. We also cannot use GetExitCodeThread() to get the remote exit code (it would return
the exit status of the "hijacked" thread), so we always set the exit code to zero (of course, we could save the exit status in a variable and read it later, as we do with
the Finished flag).

NT native process

To run the remote code in an NT native process, the RtlCreateUserThread() function is used. WaitForSingleObject() and
GetExitCodeThread() can be used on the returned thread handle. Note that native remote code requires a different thread exit sequence.
This is handled by the remote stub code. The native exit sequence is the equivalent of the Kernel32 ExitThread() function for native processes:

Call LdrShutdownThread() to notify all DLLs on thread exit.

Release the thread stack by calling NtFreeVirtualMemory(). Note that before releasing the stack we must switch to a temporary stack.
The UserReserved area within the TEB is used for this purpose.

Terminate the thread by calling NtTerminateThread().
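The choice between CreateRemoteThread(), RtlCreateUserThread() and NtQueueApcThread() described in this section can be summarized as a small decision function. This is a sketch only: the flag names and bit values below are hypothetical (the library defines its own ProcessFlags), and treating a protected process as "no method available" and checking the native flag first are assumptions of this example.

```c
typedef enum {
    METHOD_NONE,                    /* injection not possible             */
    METHOD_CREATE_REMOTE_THREAD,    /* CreateRemoteThread(), emulated on 9x */
    METHOD_RTL_CREATE_USER_THREAD,  /* RtlCreateUserThread()              */
    METHOD_NT_QUEUE_APC_THREAD      /* NtQueueApcThread()                 */
} Method;

enum {                              /* hypothetical bit values */
    F_WINNT          = 0x01,        /* NT family (otherwise Win9x)  */
    F_NATIVE         = 0x02,        /* NT native subsystem process  */
    F_NOTINITIALIZED = 0x04,        /* process not yet initialized  */
    F_PROTECTED      = 0x08         /* Vista+ protected process     */
};

/* Pick the remote execution method for a given set of process flags. */
Method choose_method(unsigned flags)
{
    if (flags & F_PROTECTED)
        return METHOD_NONE;
    if (flags & F_NATIVE)
        return METHOD_RTL_CREATE_USER_THREAD;
    if ((flags & F_WINNT) && (flags & F_NOTINITIALIZED))
        return METHOD_NT_QUEUE_APC_THREAD;
    /* Initialized Win32 process, or a non-initialized Win9x process
       (which uses CreateRemoteThread() with a special creation flag). */
    return METHOD_CREATE_REMOTE_THREAD;
}
```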

InjectDll()

The InjectDll() function loads a DLL into the address space of a remote process. It accepts 5 parameters:

hProcess: Handle of the remote process.

ProcessFlags: Returned by GetProcessInfo(). Can be zero.

szDllPath: Path of the DLL to load. ANSI/Unicode strings can be passed to InjectDllA()/InjectDllW().

dwTimeout: Timeout in milliseconds used in wait functions. Can be INFINITE.

hRemoteDll: Pointer to an HINSTANCE variable that will receive the loaded DLL handle.

InjectDll() just initializes the data block needed by the remote code and uses RemoteExecute() to remotely execute
the function RemoteInjectDll().

RemoteInjectDll() runs in the address space of the remote process and calls LoadLibrary() to load the specified
DLL into that address space. The handle of the loaded DLL is returned.

EjectDll()

The EjectDll() function unloads a DLL from the address space of a remote process. It accepts 5 parameters:

hProcess: Handle of the remote process.

ProcessFlags: Returned by GetProcessInfo(). Can be zero.

szDllPath: Path of the DLL to unload. ANSI/Unicode strings can be passed to EjectDllA()/EjectDllW(). Can be NULL.

hRemoteDll: If szDllPath is NULL the hRemoteDll parameter is used as the DLL handle.

dwTimeout: Timeout in milliseconds used in wait functions. Can be INFINITE.

EjectDll() initializes the data block needed by the remote function and uses RemoteExecute() to remotely execute
the function RemoteEjectDll().

RemoteEjectDll() runs in the address space of the remote process and calls FreeLibrary() to unload the specified DLL.
FreeLibrary() is called as many times as necessary to decrease the reference count to zero. If the DLL name is specified,
GetModuleHandle() is used to retrieve the handle of the DLL needed by FreeLibrary().
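The "call until the reference count reaches zero" idea can be modeled in portable C as shown below. This is a sketch, not the library's RemoteEjectDll(): FreeFn stands in for FreeLibrary(), FakeModule and fake_free exist only to exercise the loop, and the max_calls safety cap is an addition of this example.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for FreeLibrary(): returns nonzero when a reference was released,
   zero when the call fails (for example because the module is already gone). */
typedef int (*FreeFn)(void *module);

/* Call free_fn until it reports failure or max_calls is reached.
   Returns the number of successful calls. */
unsigned unload_completely(FreeFn free_fn, void *module, unsigned max_calls)
{
    unsigned released = 0;

    assert(free_fn != NULL);
    while (released < max_calls && free_fn(module))
        released++;
    return released;
}

/* Fake module with a reference count, used only to try out the loop. */
typedef struct { unsigned refs; } FakeModule;

int fake_free(void *module)
{
    FakeModule *m = (FakeModule *)module;

    if (m->refs == 0)
        return 0;       /* already unloaded: the call fails */
    m->refs--;
    return 1;
}
```

A cap such as max_calls is worth considering in real code too, since some modules cannot be unloaded and an unbounded loop would never terminate.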

If you need to pass extra data to the new window procedure handler, it must be appended to the existing RDATA. Before calling StartRemoteSubclass(),
the following fields of the RDATA structure must be initialized: Size must contain the size of the RDATA structure plus any appended data,
hProcess must contain the handle of the remote process, and hWnd must contain the handle of the window to be subclassed. The extra fields
of the appended data should also be initialized at this point. All the remaining fields should be considered private and not used.

Except for the first parameter (a pointer to the RDATA structure) the remaining parameters are the normal window handle, message type,
and wParam and lParam found in any window procedure handler. The new window procedure handler will be called by Windows every time
a message to the window must be processed, so the function should be coded as a "normal" window procedure handler (with the usual switch(Msg) statement).
Please note that because this function is executed in a remote process, it must follow the same rules as any remotely executed code. Any unhandled message
should be processed by the default window procedure handler; for this, the function must return FALSE. If you want to process some messages yourself,
store the return value in the Result field of the RDATA structure and return TRUE from the function. This function
is protected from exceptions by a remote SEH frame.

RemoteStartSubclass() runs in the address space of the remote process and calls SetWindowLong() with the parameter
GWL_WNDPROC to replace the window procedure handler with a new one. This handler will be called by Windows every time a message
to the window must be processed. The new window procedure handler (StubWndProc() in file "Stub.asm") sets an SEH frame and calls
UserWndProc(). If UserWndProc() returns FALSE, a call to CallWindowProc() lets the original window procedure handle the message.
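The TRUE/FALSE contract between the stub and the user handler can be sketched in portable C as follows. This is a simplified model and not the code in "Stub.asm": the RData struct shows only a hypothetical subset of RDATA, plain C types replace HWND/WPARAM/LPARAM/LRESULT, the original procedure is called directly instead of through CallWindowProc(), and the SEH frame is omitted.

```c
#include <assert.h>
#include <stddef.h>

typedef long Result;                        /* stands in for LRESULT */
typedef struct RData RData;
typedef int    (*UserProc)(RData *rd, void *hwnd, unsigned msg,
                           unsigned long wp, long lp);
typedef Result (*OrigProc)(void *hwnd, unsigned msg,
                           unsigned long wp, long lp);

struct RData {                              /* hypothetical subset of RDATA */
    Result   result;                        /* set by the user handler      */
    UserProc user_proc;                     /* new window procedure handler */
    OrigProc orig_proc;                     /* saved original procedure     */
};

/* Model of the stub: ask the user handler first, fall back to the original. */
Result stub_wnd_proc(RData *rd, void *hwnd, unsigned msg,
                     unsigned long wp, long lp)
{
    assert(rd != NULL && rd->user_proc != NULL && rd->orig_proc != NULL);
    if (rd->user_proc(rd, hwnd, msg, wp, lp))
        return rd->result;                  /* handled: use rd->result      */
    return rd->orig_proc(hwnd, msg, wp, lp);/* unhandled: original handler  */
}

enum { MSG_CLOSE = 0x0010 };                /* numeric value of WM_CLOSE */

/* Example user handler: swallows the close message, ignores the rest. */
int demo_user_proc(RData *rd, void *hwnd, unsigned msg,
                   unsigned long wp, long lp)
{
    (void)hwnd; (void)wp; (void)lp;
    switch (msg) {
    case MSG_CLOSE:
        rd->result = 0;
        return 1;                           /* TRUE: message processed */
    default:
        return 0;                           /* FALSE: let the original run */
    }
}

/* Example "original" procedure returning a recognizable value. */
Result demo_orig_proc(void *hwnd, unsigned msg, unsigned long wp, long lp)
{
    (void)hwnd; (void)wp; (void)lp;
    return (Result)msg + 1000;
}
```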

StopRemoteSubclass()

The StopRemoteSubclass() function restores the original window handler of the remote process. It accepts one parameter:

rd: This is the same RDATA structure passed to StartRemoteSubclass() and contains the needed data initialized by this function.

RemoteStopSubclass() runs in the address space of the remote process and calls SetWindowLong() with the parameter GWL_WNDPROC
to restore the original window procedure handler.

Demo

Finally to demonstrate how to use the Injection Library exported functions, I wrote an application that lets you use all the injection methods on any
running process (if applicable!). The application just fills a listview control with all running processes, and according to the user choices, injects code,
a DLL, or subclasses a process window. From my tests, only the following processes couldn't be injected:

"I found many people talking about RtlCreateUserThread(), well it can be implemented easily (it is in ntdll.dll), but has a flaw, if you inject a dll with this function you cannot use CreateThread() inside it but you need to implement RtlCreateUserThread() in the dll too; i don't know why but it is so."

Thanks very much for the well explained article and library; it was really of great help to me.

One thing that I ran into and thought others may want to be wary of when using the library. I was working on an application where I called RemoteExecute with a thread function not in my current application, but rather coming from a DLL loaded by my application. That DLL was produced by another project I had also created, so I had full control and access to its code. However, the thread function never returned the value I expected.

After a couple of sleepless nights and looking into the assembly produced for my thread function inside the DLL, I found that VC replaced a call to a function pointer (referencing a function in User32.dll) at the end of my thread function with a JMP instruction instead of the expected CALL (my thread function was simply a call to a function pointer passed inside the thread function parameters). I knew from inspecting the InjLib code that the library scans the assembly code of the thread function I provided to RemoteExecute to detect the function code end address (and thus the code size) by inspecting JMP instructions and their target addresses.

I then began to think about tail call optimization performed by the compiler. A few searches on Google revealed that this optimization might be disabled by changing the optimization settings of the compiler to /O1 instead of a higher setting like /O2 or /Ox. Once I changed that setting to /O1, the call was translated into a normal CALL instruction, allowing for correct calculation of the function code size.

Note that I haven't yet been successful at making NTCREATETHREADEX work under 64-bit Win 7. Also, I have read in two unrelated places claims that under 64-bit Win 7 the CreateRemoteThread function should work across sessions again too. I have also seen claims that deny this... I will update this post when I get additional info.

It seems to me that enabling SeDebugPrivilege makes the NTCREATETHREADEX work. And that NTCREATETHREADEX is indeed necessary sometimes as a fallback for ordinary CreateRemoteThread. There are cases where the latter fails and the NTCREATETHREADEX succeeds (given that SeDebugPrivilege is enabled for the program).

I may be mistaken about SeDebugPrivilege, but that's how it looks to me right now.

PROCESSOR_NUMBER is a structure defined in WinNT.h. You need to include a copy of this file. To compile the assembly code (file stub.asm) in Visual Studio you need to select a custom build. I don't have VS2008 so I cannot help you.

I've tried to use the library under Visual Studio 2003 (both the binary version shipped as part of the demo download, and the one built by me), but it kept crashing inside IsCodeSafe. This is because the address of the function I passed in turned out to be pointing at a "JMP <real function address>" instruction.

I fixed the crash by adding the following block into RemoteExecute function, right after the call to GetOffsets:
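The block itself is not reproduced here. As a rough, hypothetical sketch of the general idea (following a "JMP rel32" thunk to the real function before analyzing the code), and not the commenter's actual fix, the logic could look like this in portable C. It works on offsets into a byte buffer rather than on live addresses, and the name resolve_jmp_thunk is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* If the code at `offset` is an x86 near jump (opcode E9 followed by a
   signed 32-bit little-endian displacement), return the offset it jumps to;
   otherwise return `offset` unchanged. Jumps that leave the buffer are
   not followed. */
size_t resolve_jmp_thunk(const uint8_t *code, size_t size, size_t offset)
{
    uint32_t raw;
    int64_t target;

    assert(code != NULL);
    if (offset >= size || size - offset < 5 || code[offset] != 0xE9)
        return offset;

    raw = (uint32_t)code[offset + 1] |
          ((uint32_t)code[offset + 2] << 8) |
          ((uint32_t)code[offset + 3] << 16) |
          ((uint32_t)code[offset + 4] << 24);

    /* Target = address of the next instruction + signed displacement. */
    target = (int64_t)offset + 5 + (int32_t)raw;
    if (target < 0 || (uint64_t)target >= size)
        return offset;
    return (size_t)target;
}
```

With real function pointers the same arithmetic applies to addresses: the real function starts at the thunk address plus 5 plus the displacement.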

The published code is totally free, so you can use it any way you like (you don't even have to credit me for it). I'm glad that someone finds it useful. I know that the code is now a bit outdated, but unfortunately I don't have time to update it. If you want you can, of course, do it yourself.

OK. I got it. For those of you facing the same problem:
1. You first specify the DLL. (Before clicking on process name)
2. Then select the appropriate radio buttons (Before clicking on process name)
3. Finally select the process by clicking on it in the list box. The injection happens on selecting the process.

I have an application provided by a 3rd party, it is made up of 3 components

1) A hardware device connected to my serial port
2) An ActiveX acting as a “Client” that is interacting with that hardware device
3) An ActiveX that is acting as a “Server” component, listens to the Client’s ActiveX messages and sends the relevant commands to some IP address.

I want to trap the “Conversation” between the Client’s ActiveX and the hardware, in an attempt to replace that ActiveX (I don’t have the source code for that ActiveX) and be able to provide that “Bridge” functionality between the hardware device and the “Server” ActiveX myself, using MFC / C++ only

I'm attempting to use INJLIB to inject a .dll into a 3rd party application (I want to intercept button presses on a USB remote).

I'm using INJLIB so I can hook into API the program uses.

My plan was to replace this program's "Run" registry entry with my own program and then make my program launch the 3rd party app in a suspended state..

I was thinking that while suspended I could inject my own dll, let it run, then resume the application's progress..

I want to find out as much as possible about the workings of this application so it's important that I hook things before WinMain() executes.

Unfortunately, InjectDll() fails w/ERROR_READPROCESSMEMORY so this is obviously not possible. Note this is after I explicitly tried OpenProcess() w/ PROCESS_VM_READ|PROCESS_VM_OPERATION|PROCESS_VM_WRITE|PROCESS_CREATE_THREAD, even though CreateProcess is supposed to create with PROCESS_ALL_ACCESS

To solve this I need to redirect where the thread starts executing upon Resuming..

Basically I want to hook WinMain(); there are 2 options on how to do this and only one of them will probably work.

Either I need to rewrite the PE Image of the program so that the starting address points to my own WinMain function or I need to modify the program's thread using undocumented kernel functions.

Since I simply have to use ResumeThread on the application, my thinking is that I'm going to have to redirect where the thread executes, and changing the PE Image starting address will have no effect.

Anyways, that's where I've been going with this, do you guys have any suggestions? Is there an easier way?

Applause for a very nice looking article. Do the injection techniques work on a machine with DEP (Data Execution Protection) enabled hardware?

Our project uses an injection scheme based on code from Jeffrey Richter's Advanced Windows Programming book. It's starting to trip up on DEP enabled systems. We have not yet scheduled the work to fix it but we will be forced to do something soon.

Yes, it always contains the latest code from RemoteLib. If the remote code changes I'll update the two projects. I advise you to use InjLib because it contains more advanced code. Only use RemoteLib if you need some functions that aren't exported by InjLib.

By method do you mean the functions RemoteExecute, InjectDll and StartRemoteSubclass? They are used for different things:
- RemoteExecute injects a block of code directly into a remote process (and doesn't use any external DLL). The code you are injecting must be coded as relocatable code and therefore must be written using a "low" level language (C, ASM).
- InjectDll injects a DLL into a remote process. The DLL can be coded using any language.
- StartRemoteSubclass changes the remote process window procedure (i.e. subclassing). Note that the new window procedure must be relocatable code! You can accomplish the same thing by injecting a DLL that does the subclassing.

I've been working on something similar to this for a while now. A few differences:

1. Mine doesn't even attempt to handle Win9x. Bravo for teasing out enough of the differences to make this work.
2. My library is in Managed C++ so that I can call it from my .Net tools. I've actually managed to inject the .Net runtime into a remote process and get a remoting server up and running, allowing me to fully run .Net code in a remote process.
3. Most relevant (in my opinion), I don't do any SEH; I have to trust that my user (me, in this case) knows what he's doing enough to not crash the remote app -- not always the best assumption, even for myself; for instance, when I copy a function pointer that turns out to be an ILT thunk and try to execute that in a remote process. I got around that last problem by hacking in a little disasm check for a jump at the call site, but really I should implement a mini-disassembler and verify that the thunk is actually in the address space of the target DLL.

From what I can see, your implementation is far cleaner than mine at the injection level, and I may just steal your SEH code!