Wednesday, 27 November 2013

It is often useful to modify the behavior of an application without making extensive changes to its source code. Specifically, one might want to intercept calls to certain functions in order to execute custom code before or after the original code, or to retrieve or modify the parameters passed to a function. For example, it might be necessary to instrument the application for performance analysis or to add features to a program. Even when the source code of the program is not available, it is still possible to modify its behavior in this way.
Here I will present the techniques I use on the different operating systems. Please note that I don't claim that these techniques are the best solutions for all cases.

Appendix A: Windows DLL Injection
Appendix B: Import Address Table Hooking (IAT)
Appendix C: MS-Detours 1.5 (Direct3D)
Appendix D: Virtual table patching
Appendix E: Example : hiding process(es) under Windows

§1 Shared Libraries & Injection/Loading

Shared libraries are code objects that may be loaded during execution into the memory space associated with a process. Library code may be shared in memory by multiple processes as well as on disk. If virtual memory is used, processes execute the same physical page of RAM, mapped into the different address spaces of each process. This has advantages. For instance on some systems, applications were often only a few hundred kilobytes in size and loaded quickly; the majority of their code was located in libraries that had already been loaded for other purposes by the operating system.

To change code in another process we must load our own shared library into the address space of that process. On Linux this can be achieved using the LD_PRELOAD environment variable, which instructs the loader to load the specified shared libraries before all others; MAC-OSX provides the equivalent DYLD_INSERT_LIBRARIES. Function and other symbol definitions in the preloaded libraries will be used instead of the original ones.
However, on Windows systems there is no such thing as LD_PRELOAD; to achieve the same result we must use a technique called DLL injection (shared libraries are .dll files on Windows, .so files on Linux and .dylib files on MAC-OSX). See Appendix A below for more information.

§2 Hooking/Detouring function calls

§2.1 UNIX/Linux

UNIX offers a simple way to override functions in a shared library with the LD_PRELOAD environment variable. When you write a twin of a function that is defined in an existing shared library, put it in your own shared library, and register your library's name in LD_PRELOAD, your function is used instead of the original one. This works exactly as on MAC-OSX (see below), except that the variable is LD_PRELOAD instead of DYLD_INSERT_LIBRARIES.

§2.2 MAC-OSX

Since MAC-OSX is also UNIX based, it works almost exactly the same as on Linux, only LD_PRELOAD is called DYLD_INSERT_LIBRARIES and the library extension is .dylib instead of .so. In this example I've detoured fopen from a test program. In 2003 Jonathan Rentzsch showed ways of detouring on MAC-OSX and released mach_star, but this method is way easier.
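A minimal sketch of what such an fopen detour can look like, assuming a dynamically linked target and a platform that provides dlsym with RTLD_NEXT (glibc and MAC-OSX both do). The counter g_fopen_calls and the log format are made up for illustration and are not taken from the original test program.

```cpp
#include <cassert>
#include <cstdio>
#include <dlfcn.h>

// Hypothetical counter, only here so the effect of the hook can be observed.
static int g_fopen_calls = 0;

// Our "twin" of fopen. extern "C" keeps the symbol name unmangled so the
// dynamic loader resolves calls to fopen to this definition first.
extern "C" FILE* fopen(const char* path, const char* mode) {
    typedef FILE* (*fopen_fn)(const char*, const char*);
    // RTLD_NEXT asks for the next definition of "fopen" in library search
    // order, which is the original one from the C library.
    static fopen_fn real_fopen =
        reinterpret_cast<fopen_fn>(dlsym(RTLD_NEXT, "fopen"));

    ++g_fopen_calls;
    std::fprintf(stderr, "fopen(\"%s\", \"%s\")\n", path, mode);

    // Hand over to the original implementation.
    return real_fopen ? real_fopen(path, mode) : NULL;
}
```

Built as a shared library (for example g++ -shared -fPIC hook.cpp -o libhook.so, adding -ldl on older glibc versions), it would be activated with LD_PRELOAD=./libhook.so ./program on Linux, or as a .dylib with DYLD_INSERT_LIBRARIES and DYLD_FORCE_FLAT_NAMESPACE set on MAC-OSX.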

You also need to define DYLD_FORCE_FLAT_NAMESPACE (it doesn't matter what value it has).

You can use the same technique to override a method in a class. Say there's a method named "fff" in a class AAA.

class AAA
{
public:
    int m;
    AAA(){m = 1234;}
    void fff(int a);
};

To override it, you first need to know the mangled symbol name of the method.

$ nm somelibrary.dylib | grep "T "
00000ed6 T __ZN3AAA3fffEi

Then what you need to define is _ZN3AAA3fffEi. Don't forget to remove the first '_'. If you see multiple symbols in the shared library and are not sure which one to override, you can check by demangling a symbol, for example with c++filt.
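Demangling can also be done from code. The sketch below assumes a GCC or Clang toolchain, whose runtime exposes abi::__cxa_demangle in <cxxabi.h>; the wrapper name demangle is my own.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Returns the human-readable form of a mangled C++ symbol, or the input
// unchanged when it cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // With a NULL output buffer, __cxa_demangle allocates the result with
    // malloc, so it must be released with free.
    char* text = abi::__cxa_demangle(mangled.c_str(), NULL, NULL, &status);
    if (status != 0 || text == NULL) return mangled;
    std::string result(text);
    std::free(text);
    return result;
}
```

Note that the name is passed with the leading underscore already removed, so the nm output above is looked up as demangle("_ZN3AAA3fffEi").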

§2.3 Windows

This is the framework of a standard API hook. All of this resides in a DLL that will be injected into a process. For this example, I chose to hook the MessageBoxW function. Once this DLL is injected, it gets the address of the MessageBoxW function from user32.dll, and then the hooking begins. In the BeginRedirect function, the first bytes of MessageBoxW are overwritten with an unconditional relative jump (JMP, opcode 0xE9) followed by the 32-bit distance to jump, measured from the end of the jump instruction to our replacement function. The source is fully commented.
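The jump distance is the part that is easiest to get wrong, so here is a small sketch of just that calculation for 32-bit x86. The function name make_jmp_rel32 is hypothetical and the addresses are plain numbers; nothing is written into a process here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Builds the 5 bytes of "JMP rel32" that redirect execution from address
// `from` (start of the hooked function) to address `to` (our replacement).
std::array<uint8_t, 5> make_jmp_rel32(uint32_t from, uint32_t to) {
    // The displacement is relative to the instruction AFTER the jump, which
    // is 5 bytes long. Unsigned wrap-around yields the two's complement
    // encoding the CPU expects for backward jumps.
    const uint32_t rel = to - (from + 5);
    std::array<uint8_t, 5> patch = {{
        0xE9,                  // opcode: JMP rel32
        uint8_t(rel),          // displacement, little-endian
        uint8_t(rel >> 8),
        uint8_t(rel >> 16),
        uint8_t(rel >> 24)
    }};
    return patch;
}
```

In a real hook these five bytes are written over the start of the target function after saving the original five as the backup.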

We restore the backup before calling the original function to get the return value because otherwise we would end up in an infinite loop: we would call a function that jumps to our function, which calls the function again, and so on. If you change the parameters of the call to MessageBoxW inside MyMessageBoxW, every message box shown by a process the DLL is injected into will have those parameters. See Appendix C for the MS-Detours method, which is way easier and recommended.
See the diagram:

Appendix A: Windows DLL injection

NOTE: the easy way is at the end of this appendix; I will start with the hardcore method first.
Welcome to Appendix A. Here I will explain how to make another process load our DLL. What we do is allocate a chunk of memory in the target process for our assembly function, which calls LoadLibrary; we also need to allocate space for our DLL path name. Next we suspend the main thread of our target and modify the register that holds the address of the next instruction to be executed. Then we patch our allocated function so that it calls and returns to the right addresses. When we are done we resume the main thread.

This is a prototype for the function we are going to allocate in the target process, which will call LoadLibrary. The addresses are left blank because we patch them in later, once we have the right values.
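The original listing is not reproduced here. As an illustration of the idea, the sketch below builds one common form of such a stub for 32-bit x86 as raw bytes and patches the three blank addresses in. The exact instruction sequence and the name build_loader_stub are my assumptions, not necessarily what the original code used.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Writes a 32-bit value in little-endian order at the given offset.
static void put_stub_u32(std::array<uint8_t, 22>& code, size_t offset, uint32_t value) {
    for (int i = 0; i < 4; ++i) code[offset + i] = uint8_t(value >> (8 * i));
}

// Builds the stub with the three addresses filled in. All addresses are
// addresses inside the TARGET process.
std::array<uint8_t, 22> build_loader_stub(uint32_t return_address,
                                          uint32_t dll_path_address,
                                          uint32_t load_library_address) {
    std::array<uint8_t, 22> code = {{
        0x68, 0, 0, 0, 0,  // push <return address>        (offset 1)
        0x9C,              // pushfd  - save flags
        0x60,              // pushad  - save general registers
        0x68, 0, 0, 0, 0,  // push <address of DLL path>   (offset 8)
        0xB8, 0, 0, 0, 0,  // mov eax, <LoadLibraryA>      (offset 13)
        0xFF, 0xD0,        // call eax
        0x61,              // popad   - restore registers
        0x9D,              // popfd   - restore flags
        0xC3               // ret     - jump back to the saved EIP
    }};
    put_stub_u32(code, 1, return_address);
    put_stub_u32(code, 8, dll_path_address);
    put_stub_u32(code, 13, load_library_address);
    return code;
}
```

The return address pushed first is the thread's original EIP, so the final ret resumes the thread exactly where it was suspended.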

Now we need to pause the thread in order to get its "context". The context of a thread is the current state of all of its registers, as well as other peripheral information. However, we're mostly concerned with the EIP register, which points to the next instruction to be executed. If we don't suspend the thread before retrieving its context information, it will continue executing, and by the time we get the information it will be invalid. Once we've paused the thread, we retrieve its context information using the GetThreadContext() function. We grab the address of the next instruction to be executed, so that we know where our function should return to. Then it's just a matter of patching up the function to have all of the proper pointers, and forcing the thread to execute it. (A-3)

There is another way using the CreateRemoteThread call. It is extremely easy, and relatively efficient. Before starting though, it is important to actually find the process to inject into. The Windows API provides a great function for doing this – CreateToolhelp32Snapshot.

I didn’t bother storing the value after I called Process32First because that will always be “[System Process]”, so there’s really no need. Process32Next returns TRUE on success, so just simply putting it in a loop and pushing the name of the process it received in a vector is what is needed. Once the loop is finished, every single process should be stored in processNames. This is great and all, but where does the DLL injection come in? Well, the PROCESSENTRY32 structure also has a member that holds the Process ID. Inside that loop, while we’re pushing the process names in our vector, we’re also going to inject the DLL.

The code above is pretty straightforward: we first get the current directory and append our DLL name to it, so that we can later write the full path into the memory of the target process. Then we create a new thread in the target, which calls LoadLibrary with our DLL path as its parameter.
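Since the listing itself is missing here, the following is a rough sketch of those two steps, not the original code. The helper names build_dll_path and inject_dll are hypothetical; the Windows part uses only documented Win32 calls and is compiled only when building for Windows.

```cpp
#include <cassert>
#include <string>

// Joins the current directory and the DLL file name into one path.
std::string build_dll_path(std::string dir, const std::string& dll_name) {
    if (!dir.empty() && dir.back() != '\\' && dir.back() != '/') dir += '\\';
    return dir + dll_name;
}

#ifdef _WIN32
#include <windows.h>

// Makes process `pid` load the DLL at `dll_path` by starting a remote thread
// whose entry point is LoadLibraryA and whose argument is the path string.
bool inject_dll(DWORD pid, const std::string& dll_path) {
    HANDLE process = OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);
    if (!process) return false;

    // Copy the path into the target, since it must be readable over there.
    const SIZE_T size = dll_path.size() + 1;
    LPVOID remote = VirtualAllocEx(process, NULL, size,
                                   MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    bool ok = false;
    if (remote && WriteProcessMemory(process, remote, dll_path.c_str(), size, NULL)) {
        // Assumption: kernel32.dll is mapped at the same address in both
        // processes, so our local address of LoadLibraryA is valid remotely.
        LPTHREAD_START_ROUTINE load_library = reinterpret_cast<LPTHREAD_START_ROUTINE>(
            GetProcAddress(GetModuleHandleA("kernel32.dll"), "LoadLibraryA"));
        HANDLE thread = load_library
            ? CreateRemoteThread(process, NULL, 0, load_library, remote, 0, NULL)
            : NULL;
        if (thread) {
            WaitForSingleObject(thread, INFINITE);  // wait until the DLL is loaded
            CloseHandle(thread);
            ok = true;
        }
    }
    if (remote) VirtualFreeEx(process, remote, 0, MEM_RELEASE);
    CloseHandle(process);
    return ok;
}
#endif
```

The injector and the target generally need to have the same bitness for this to work, and the call can fail for processes the current user has no rights to open.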

Appendix B: Import Address Table (IAT) Hooking

Before we jump into the Import Address Table you first need a bit of background information, so I'll start with the PE format. The Portable Executable (PE) format is a file format for executables, object code, DLLs, FON font files, and others used in 32-bit and 64-bit versions of Windows operating systems. The PE format is a data structure that encapsulates the information necessary for the Windows OS loader to manage the wrapped executable code. This includes dynamic library references for linking, API export and import tables, resource management data and thread-local storage (TLS) data.

One section of note is the import address table (IAT), which is used as a lookup table when the application is calling a function in a different module. It can be in the form of both import by ordinal and import by name. Because a compiled program cannot know the memory location of the libraries it depends upon, an indirect jump is required whenever an API call is made. As the dynamic linker loads modules and joins them together, it writes actual addresses into the IAT slots, so that they point to the memory locations of the corresponding library functions. Though this adds an extra jump over the cost of an intra-module call resulting in a performance penalty, it provides a key benefit: The number of memory pages that need to be copy-on-write changed by the loader is minimized, saving memory and disk I/O time. If the compiler knows ahead of time that a call will be inter-module (via a dllimport attribute) it can produce more optimized code that simply results in an indirect call opcode.

IAT hooking has pros and cons:
Cons:
- The function you are hooking must be imported from another module; you can't just hook an arbitrary address in memory. This is not optimal for DirectX hooks, since you will only find CreateDevice (you can use that to get the device, though), but for OpenGL and the like this is handy.
Pros:
- Less detectable: you can turn this into a fully external hook, which should be hard for antivirus or anti-cheat software to detect because it does not use any suspicious calls.

This is the procedure for internal hooking (the DLL must be injected into the target process):
- Retrieve the DOS/NT headers
- Loop through the import descriptors
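To make the first step concrete without WINNT.H, here is a portable sketch that walks the same headers in a raw byte buffer and returns the RVA of the import directory, where the import descriptors start. The function name import_directory_rva is my own; the offsets used are the documented PE32 and PE32+ layouts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

static uint16_t pe_read_u16(const std::vector<uint8_t>& d, size_t off) {
    return uint16_t(d[off] | (d[off + 1] << 8));
}

static uint32_t pe_read_u32(const std::vector<uint8_t>& d, size_t off) {
    return uint32_t(d[off]) | (uint32_t(d[off + 1]) << 8) |
           (uint32_t(d[off + 2]) << 16) | (uint32_t(d[off + 3]) << 24);
}

// Finds the RVA of the import directory (data directory entry 1).
// Returns false when the buffer does not look like a PE image.
bool import_directory_rva(const std::vector<uint8_t>& image, uint32_t& rva_out) {
    // DOS header: "MZ" magic, e_lfanew (offset of NT headers) at 0x3C.
    if (image.size() < 0x40 || image[0] != 'M' || image[1] != 'Z') return false;
    const size_t nt = pe_read_u32(image, 0x3C);
    if (nt >= image.size() || image.size() - nt < 26) return false;
    if (pe_read_u32(image, nt) != 0x00004550) return false;  // "PE\0\0"

    // Optional header follows the 4-byte signature and 20-byte file header.
    const size_t opt = nt + 24;
    const uint16_t magic = pe_read_u16(image, opt);
    size_t dirs;
    if (magic == 0x10B) dirs = opt + 96;        // PE32
    else if (magic == 0x20B) dirs = opt + 112;  // PE32+ (64-bit)
    else return false;

    // Each data directory entry is 8 bytes (RVA, size); imports are entry 1.
    // For brevity NumberOfRvaAndSizes is not checked here.
    const size_t entry = dirs + 8;
    if (entry + 8 > image.size()) return false;
    rva_out = pe_read_u32(image, entry);
    return true;
}
```

Inside an injected DLL the same walk starts at the module base instead of a buffer, and adding the RVA to that base gives the first import descriptor.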

So first we get a handle to our main module:

int ip = 0;
if (module == 0)
    module = GetModuleHandle(0); // NULL yields the handle of the main executable

Then we retrieve the headers (warning: whoever wrote the header file for the PE format is certainly a believer in long, descriptive names, along with deeply nested structures and macros. When coding with WINNT.H, it's not uncommon to have mind-blowing expressions):

And inside this loop, we loop through the functions; adding an index to FirstThunk gets you to the next thunk, and so on.

for (int funcIdx = 0; *(funcIdx + (LPVOID*)(iid->FirstThunk + (SIZE_T)module)) != NULL; funcIdx++){}

Now if you look in the import descriptor structure (IMAGE_IMPORT_DESCRIPTOR) you can see the name is at FirstThunk + 2, so

Appendix C: MS-Detours 1.5 (Direct3D)

First of all, make sure you have MS-Detours 1.5 downloaded and have added the corresponding files to your project. I am using version 1.5 because it's the simplest to use, and it does the job nicely. There is one important function we are going to use; it's called DetourFunction. First we need a typedef of the function we are going to hook (EndScene in this case, since it gets called AFTER the drawing, so we can add code right before it).

#pragma comment(lib, "d3d9.lib")
#pragma comment(lib, "d3dx9.lib")
// note: the device is a parameter; you can check this by reversing the calls of a real D3D program
typedef HRESULT(WINAPI* tEndScene)(LPDIRECT3DDEVICE9 pDevice);
tEndScene oEndScene = NULL;

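Now to actually hook EndScene we need the address of the original function. This can be obtained in two ways: the first is to reverse a sample Direct3D program to find the address of the EndScene call and add its offset to the module base of d3d9.dll; the second is to use the GetProcAddress function. The problem with the first way is that it is platform dependent: the address on 64-bit Windows differs from the one on the 32-bit version. Note that GetProcAddress only resolves functions that a DLL exports, and EndScene is a method of the IDirect3DDevice9 COM interface rather than an export of d3d9.dll, so check the returned address for NULL; reading the address from the device's virtual table (see Appendix D) is the usual fallback.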

// forward declaration of our detour function (defined below)
HRESULT WINAPI hkEndScene(LPDIRECT3DDEVICE9 pDevice);

HMODULE hd3d9 = GetModuleHandle("d3d9.dll");
// DetourFunction from MS-Detours: the first parameter is the original address and the second is our detour function
oEndScene = (tEndScene)DetourFunction( (LPBYTE)GetProcAddress(hd3d9, "EndScene" ), (LPBYTE)&hkEndScene);

// where our detour function would look something like this
HRESULT WINAPI hkEndScene(LPDIRECT3DDEVICE9 pDevice){
    // do evil
    return oEndScene(pDevice);
}

What we did here is retrieve the address with GetProcAddress and pass it as the first parameter; the second parameter is a pointer to our own detour function (hkEndScene). Now you can add drawing code to the original program; benchmarking programs make good use of this.

Appendix D: Virtual Table (Vtable) Patching

Whenever a class defines a virtual function (or method), most compilers add a hidden member variable to the class which points to a so-called virtual method table (VMT or vtable). This VMT is basically an array of pointers to (virtual) functions. At runtime these pointers are set to point to the right function, because at compile time it is not yet known whether the base function is to be called or a derived one implemented by a class that inherits from the base class. The code below shows an example of a VMT hook. If you want to implement this in Direct3D you need to create a new device, and use that to replace the original function in the original device.
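The original listing is missing, so here is a small self-contained sketch of the idea on a made-up class. It is a variant that points one object at a patched copy of its vtable instead of writing to the shared table in place. It assumes the Itanium C++ ABI used by GCC and Clang (vtable pointer at offset 0, two bookkeeping slots in front of the function pointers, `this` passed like a first argument); MSVC lays things out differently, and all names here are hypothetical.

```cpp
#include <cassert>
#include <cstring>

struct Shape {
    virtual int sides() { return 3; }  // vtable slot 0
    virtual int id() { return 7; }     // vtable slot 1
};

typedef int (*SidesFn)(Shape*);
static SidesFn g_original_sides = NULL;

// Replacement for Shape::sides; calls the original and alters the result.
static int hooked_sides(Shape* self) { return g_original_sides(self) + 1; }

// Slots in front of the function pointers: offset-to-top and RTTI pointer.
static const int kPrefix = 2;
static const int kSlots = 2;
static void* g_patched_table[kPrefix + kSlots];

// Redirects sides() for this one object. Call it only once.
void hook_sides(Shape* obj) {
    assert(g_original_sides == NULL);
    void** vptr;
    std::memcpy(&vptr, static_cast<void*>(obj), sizeof(vptr));  // hidden vtable pointer
    std::memcpy(g_patched_table, vptr - kPrefix, sizeof(g_patched_table));
    g_original_sides = reinterpret_cast<SidesFn>(g_patched_table[kPrefix + 0]);
    g_patched_table[kPrefix + 0] = reinterpret_cast<void*>(&hooked_sides);
    void** new_vptr = g_patched_table + kPrefix;
    std::memcpy(static_cast<void*>(obj), &new_vptr, sizeof(new_vptr));
}

// Calls sides() through a volatile pointer so the compiler performs a real
// virtual dispatch instead of resolving the call at compile time.
int call_sides(Shape* obj) {
    Shape* volatile p = obj;
    return p->sides();
}
```

Other Shape objects keep the original table, which is the main difference from patching the shared vtable in place, where every instance of the class is affected.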

Appendix E: Example : Hiding process under Windows

In this example I will show how one can hook the system call that retrieves the list of processes and modify it so that it skips our process. For this I will use the mhook library, but you can also use any other hooking method described in this article. The system call that the task manager uses to retrieve the list of processes is called NtQuerySystemInformation (see MSDN). On MSDN we can also find the appropriate structures needed for this call.

What we basically do here is create a loop that checks every process name; once we find our process name we skip our process and return the result of the original call (without our process). Now we hook it using mhook.
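The skipping step can be shown without any Windows headers. The sketch below uses a simplified, hypothetical stand-in for SYSTEM_PROCESS_INFORMATION that keeps only the link field and a short name; the real structure has many more members and a wide-character name, but its entries are chained by a byte offset in the same way.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical simplified process entry.
struct ProcEntry {
    uint32_t next_entry_offset;  // bytes to the next entry, 0 for the last one
    char image_name[28];
};

// Builds a packed list the way the real call fills its output buffer.
std::vector<ProcEntry> make_list(const std::vector<std::string>& names) {
    std::vector<ProcEntry> list(names.size());
    for (size_t i = 0; i < names.size(); ++i) {
        std::memset(&list[i], 0, sizeof(ProcEntry));
        std::strncpy(list[i].image_name, names[i].c_str(), sizeof(list[i].image_name) - 1);
        list[i].next_entry_offset = (i + 1 < names.size()) ? sizeof(ProcEntry) : 0;
    }
    return list;
}

static ProcEntry* next_entry(ProcEntry* e) {
    return reinterpret_cast<ProcEntry*>(reinterpret_cast<uint8_t*>(e) + e->next_entry_offset);
}

// Unlinks every entry named `hidden`. The first entry is never removed,
// since there is no previous entry whose offset could be adjusted.
void hide_process(ProcEntry* first, const char* hidden) {
    ProcEntry* prev = first;
    while (prev->next_entry_offset != 0) {
        ProcEntry* cur = next_entry(prev);
        if (std::strcmp(cur->image_name, hidden) == 0) {
            // Jump over cur, or end the list at prev if cur was the last one.
            prev->next_entry_offset = cur->next_entry_offset
                ? prev->next_entry_offset + cur->next_entry_offset
                : 0;
        } else {
            prev = cur;
        }
    }
}

// Walks the list the way a caller such as the task manager would.
std::vector<std::string> list_names(ProcEntry* first) {
    std::vector<std::string> names;
    for (ProcEntry* e = first;; e = next_entry(e)) {
        names.push_back(e->image_name);
        if (e->next_entry_offset == 0) break;
    }
    return names;
}
```

In the hooked function, this unlinking would run on the buffer returned by the original call before control goes back to the caller.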