We developed a small tool, "DIMCT", which simply traces inter-module calls without too much overhead.

During our evaluation tests, we often need to quickly analyze large Windows products and pinpoint how their different bricks, especially their modules, work together. In most cases, a module imports another module's functions, and this is easily retrieved statically.

However, in a few other cases, a module may export class constructors, which return objects containing references to their virtual methods. In a few others, callbacks may be registered and called by other modules. In these cases, it is not trivial to pinpoint which method will be called by another module (and especially by which function).

Usage

Run the provided IDAPython script in order to generate a configuration file;

Start the monitored process;

Run the provided executable with the process PID, the configuration file, and the delay before killing the process;

Load the output with the IDAPython script in order to pinpoint which functions have been called;

Manually parse the output file if you want more information, e.g. 'who called whom'.

Internals

The inner concepts are also quite simple: inline hooks are placed at the top of every identified function. Each hook points to a logging function, which only logs inter-module calls. Logs are written to a dedicated memory area, which is periodically read and dumped by the remote (monitoring) process.
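To illustrate the "only logs inter-module calls" part, here is a minimal Python sketch of the filtering decision. It is not the actual logging function (which is assembly injected into the target); the `Module` type, the `log_call` helper and the record layout are hypothetical names chosen for this example. The idea is simply that a call is inter-module when the return address found on the stack lies outside the hooked module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """Hypothetical description of the hooked module's address range."""
    name: str
    base: int
    size: int

    def contains(self, address: int) -> bool:
        return self.base <= address < self.base + self.size


def is_intermodular(hooked: Module, return_address: int) -> bool:
    """A call is inter-module when the caller returns outside the hooked module."""
    return not hooked.contains(return_address)


def log_call(log: list, hooked: Module, function_va: int, return_address: int) -> None:
    """Append (function RVA, caller return address) only for inter-module calls.

    The record layout is an assumption made for this sketch.
    """
    if is_intermodular(hooked, return_address):
        log.append((function_va - hooked.base, return_address))


# Example values are made up for illustration.
kernelbase = Module("KernelBase.dll", 0x75000000, 0x100000)
log = []
log_call(log, kernelbase, 0x75001000, 0x75002000)  # caller inside the module: ignored
log_call(log, kernelbase, 0x75001000, 0x77001234)  # caller outside the module: logged
```

Filtering on the return address keeps the hot path cheap: two comparisons decide whether anything is written at all.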

It follows this scheme:

Figure 1: DIMCT flow

We call this tool "dirty" for the following reasons:

we do NOT use a shared memory section: the monitoring process keeps reading the remote memory area and wipes it when full (two WriteProcessMemory calls are made, one to wipe the area, the second one to "release" the mutex). We simply gave the monitoring process a higher priority than the target process in order to minimize the impact;

we do NOT use any Windows API in the logging function, so mutexes are implemented with a lock cmpxchg instruction (i.e. no OS benefits such as thread priority boosts).
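The handshake described in these two points can be modelled with a toy, single-threaded Python simulation. This is only a sketch under assumptions: `LogArea`, `logger_write` and `monitor_poll` are hypothetical names, the compare-and-swap is emulated in plain Python rather than with a real lock cmpxchg, and we assume the logger leaves the lock held once the area is full so that the monitor can dump it, wipe it (first write) and release the lock (second write).

```python
class LogArea:
    """Toy model of the log area: one lock word plus a bounded record buffer."""

    def __init__(self, capacity: int):
        self.lock = 0          # 0 = free, 1 = held
        self.capacity = capacity
        self.records = []

    def compare_exchange(self, expected: int, new: int) -> int:
        """Emulates lock cmpxchg: store `new` only if the lock equals `expected`.

        Returns the previous value, like the real instruction does via EAX.
        """
        old = self.lock
        if old == expected:
            self.lock = new
        return old


def logger_write(area: LogArea, record, max_spins: int = 1000) -> bool:
    """Spin on the lock word, append one record, release unless the area is full."""
    for _ in range(max_spins):
        if area.compare_exchange(0, 1) == 0:
            break
    else:
        return False           # lock never became free within the spin budget
    area.records.append(record)
    if len(area.records) < area.capacity:
        area.lock = 0          # room left: release immediately
    return True                # when full, the lock stays held for the monitor


def monitor_poll(area: LogArea, dump: list) -> int:
    """Monitor side: when the area is full, dump it, wipe it, then release the lock."""
    if len(area.records) < area.capacity:
        return 0
    count = len(area.records)
    dump.extend(area.records)  # stands in for reading the remote memory
    area.records.clear()       # first write: wipe the area
    area.lock = 0              # second write: "release" the mutex
    return count
```

In the real tool the logger would keep spinning instead of giving up after `max_spins`; the bound is only there so that the simulation always terminates.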

Yeah, that's really dirty, but this actually worked without too many bugs/overhead/drops, so... we kept it as is. We also have not needed to handle x64 binaries yet, so currently only x86 processes are supported (the concept remains the same, we will implement it soon, I guess).

The main problem we faced was handling relative instructions when moving the saved instructions. Moving a SHORT JMP or a CALL, whose operands are relative to the current instruction position, is not that straightforward, and that is the main reason why we used an IDAPython script.

To solve this problem and use absolute addresses, we replaced CALLs and JMPs with PUSH/RET sequences, and conditional jumps with their inverted counterparts followed by PUSH/RET. For instance, a JNZ SHORT <addr> will be replaced by a JZ SHORT $+6 / PUSH <addr> / RET. Absolute addresses that belong to the module itself are stored relative to the module base address, and then "relocated" when the hook is installed. Absolute addresses are also logged in order to be relocated by the program.
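The conditional jump rewrite can be sketched in a few lines of Python. This is an illustration, not the code from the IDAPython script: `rewrite_short_jcc` and `rebase` are hypothetical helpers, and we assume that the short jump in the replacement carries a displacement byte of 6 so that it skips the 5-byte PUSH imm32 and the 1-byte RET. The sketch relies on the fact that x86 short conditional jumps use opcodes 0x70 to 0x7F, where each condition and its negation differ only in the lowest bit (0x74 is JZ, 0x75 is JNZ).

```python
import struct

PUSH_IMM32 = 0x68  # PUSH imm32, 5 bytes
RET = 0xC3         # RET, 1 byte


def rewrite_short_jcc(insn: bytes, insn_va: int) -> bytes:
    """Rewrite a 2-byte short conditional jump as inverted-Jcc / PUSH target / RET."""
    if len(insn) != 2 or not 0x70 <= insn[0] <= 0x7F:
        raise ValueError("expected a short conditional jump (opcodes 0x70-0x7F)")
    disp = struct.unpack("<b", insn[1:2])[0]    # signed 8-bit displacement
    target = (insn_va + 2 + disp) & 0xFFFFFFFF  # relative to the next instruction
    inverted = insn[0] ^ 1                      # flip the condition, e.g. JNZ -> JZ
    # Displacement 6 jumps over PUSH imm32 (5 bytes) and RET (1 byte).
    return bytes([inverted, 6, PUSH_IMM32]) + struct.pack("<I", target) + bytes([RET])


def rebase(va: int, analysis_base: int, runtime_base: int) -> int:
    """Move an address from the base seen at analysis time to the base seen at hook time."""
    return (va - analysis_base + runtime_base) & 0xFFFFFFFF


# JNZ SHORT +0x10 located at 0x10001000 (made-up address) targets 0x10001012.
stub = rewrite_short_jcc(bytes([0x75, 0x10]), 0x10001000)
```

The resulting stub no longer depends on where it is placed, which is the whole point: only the 32-bit immediate needs patching when the module is loaded at a different base.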

As an example, here are the original function, the configuration file and the final result:

Figure 2: DIMCT trampolines

Example

As an example, let's test it on KernelBase.dll and the 32-bit version of notepad.exe. First, load KernelBase.dll (the SysWOW64 version) in IDA Pro, load the script and run create_config("config.bin", True).

Now let's start a notepad.exe instance and then the DIMCT tool, with notepad's PID and a delay of 120 seconds. The interface remains quite responsive but may be slowed down, especially when opening the file/open dialog. In the end, we get a log.bin file of approximately 6 MB.

Figure 3: DIMCT running

In order to show the results in IDA, we use the parse_output function, and here are the called functions:

Ntdll called the two thread pool callbacks; the other ones seem to have been called by jitted code, which is in fact... our own "trampolined" code (which moved several CALL instructions), and which we really should add to the whitelist.

Conclusion

We hope this basic tool/source code will be useful to others. We want it to remain simple, so the biggest improvements will probably be removing the "dirty" part (i.e. using shared memory, Windows mutexes, and tuning the assembly code), and adding x64 support. We may also test it against intra-modular calls in the future, but we're not really confident about the performance. We'll see. Feel free to contribute!