Like many in the security industry, we have been busy the last few days investigating the implications of the Shadow Brokers leak with regard to attack detection. Whilst there is a lot of interesting content, one particular component that attracted our attention initially was the DOUBLEPULSAR payload. This is because it seems to be a very stealthy kernel-mode payload that is the default payload for many exploits. Additionally, it can then be used to inject arbitrary DLLs into user land processes. We have also identified a potentially useful memory signature to detect whether this technique has been used on hosts that have not been rebooted since.

We were particularly interested in whether any of the code-injection focused memory analysis capabilities in our EDR software would detect this technique, so that we could understand any implications for protecting our clients. Many people in the industry were testing the injection of Meterpreter DLLs and other public frameworks. We had done the same and confirmed that we could detect the injected DLLs and threads as normal. However, implants like Meterpreter are quite noisy in memory, and we were unsure whether we were only detecting the standard public reflective loading techniques that Meterpreter uses or the generic DOUBLEPULSAR injection technique itself.

Firstly, we investigated whether it was capable of injecting any conventional DLL through whatever mechanism it was using, as opposed to the in-memory reflective loading techniques favored by many public exploit frameworks, which often require specially crafted DLLs. We tried injecting a standard Windows DLL (wininet.dll in this case) into a target calc.exe process while monitoring with Sysinternals Process Monitor, analyzing the memory space of the target process with WinDBG before and after, and using our own EDR software.

As we can see, wininet.dll has loaded correctly because it has gone on to load other dependent DLLs that it imports, such as normaliz.dll, urlmon.dll etc. However, there is no observable activity before that and no standard load of wininet.dll itself, meaning it must have been loaded using an in-memory technique. Additionally, we were pleased to see two reflectively loaded DLL findings reported by our EDR software, confirming an in-memory technique for the DLL injection. By diffing the address space from WinDBG both pre and post DLL injection we quickly identified some interesting regions that corresponded with the suspicious reported regions from our reflective load findings.
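
As a rough illustration of the before-and-after comparison described above, the following sketch diffs two snapshots of a process's memory regions and keeps the new regions that are executable but not image-backed. The `Region` tuple, the helper names and the addresses in the usage example are all hypothetical; a real snapshot would come from a debugger or memory analysis tool.

```python
from typing import Iterable, List, NamedTuple


class Region(NamedTuple):
    """One entry of an address-space listing (hypothetical layout)."""
    base: int
    size: int
    protect: str  # e.g. "PAGE_EXECUTE_READWRITE"
    kind: str     # e.g. "MEM_IMAGE" (file-backed) or "MEM_PRIVATE"


def diff_regions(before: Iterable[Region], after: Iterable[Region]) -> List[Region]:
    """Return regions that appear only in the 'after' snapshot, sorted by base."""
    return sorted(set(after) - set(before))


def suspicious(regions: Iterable[Region]) -> List[Region]:
    """Keep executable regions that are not backed by a mapped image file."""
    return [r for r in regions if "EXECUTE" in r.protect and r.kind != "MEM_IMAGE"]


# Made-up snapshots purely for illustration.
before = [Region(0x00400000, 0x1000, "PAGE_EXECUTE_READ", "MEM_IMAGE")]
after = before + [
    Region(0x02000000, 0x125000, "PAGE_EXECUTE_READWRITE", "MEM_PRIVATE"),
    Region(0x70000000, 0x2000, "PAGE_EXECUTE_READ", "MEM_IMAGE"),
]
new_regions = diff_regions(before, after)
flagged = suspicious(new_regions)
```

Here `flagged` contains only the private executable region, which is the kind of allocation a normal DLL load through the Windows loader would not produce.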

The first region was interesting because it appeared like a properly loaded DLL with all the sections loaded individually, only it was not file mapping backed like a standard DLL and had clearly been loaded with a custom loader rather than the standard Windows loader. Analyzing these sections showed they corresponded to wininet.dll content as expected.

The second interesting area was another example of DLL content in a single memory region that corresponded to the full raw content of wininet.dll in its entirety. Curiously, there was a region just before this that was also allocated PAGE_EXECUTE_READWRITE and was a single page larger in size but that was almost entirely zeros, except for a small 23-byte region of memory (we will come back to this later).
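
A region holding the raw contents of a DLL can be recognized with a simple header check. The sketch below is a minimal example, assuming the region's bytes have already been dumped into a buffer; it tests for the `MZ` magic and follows the `e_lfanew` field at offset 0x3C to the `PE\0\0` signature. The function name is hypothetical, and this is not how any particular EDR product implements its detection.

```python
import struct


def looks_like_pe(buf: bytes) -> bool:
    """Return True if buf starts with a plausible DOS header and PE signature."""
    if len(buf) < 0x40 or buf[:2] != b"MZ":
        return False
    # e_lfanew: little-endian 32-bit file offset of the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", buf, 0x3C)
    # An out-of-range offset yields a short slice and therefore False.
    return buf[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"


# Build a tiny fake header to exercise the check (not a real DLL).
fake = bytearray(0x200)
fake[:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0x80)
fake[0x80:0x84] = b"PE\x00\x00"
```

A hit inside private, executable memory, as opposed to a file-mapped image, is what makes such a region worth a closer look.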

While we were pleased that we had visibility of this with our EDR software, this was clearly a very advanced technique compared with the standard public methods we had previously seen used by exploit frameworks and malware families in the wild. We really wanted to know more about how it worked, so we set about investigating further.

Separately, we had also been working on decrypting the C2 traffic to DOUBLEPULSAR, which uses a simple 4-byte XOR cipher, and had recently released a Python script publicly to do this (https://github.com/countercept/doublepulsar-c2-traffic-decryptor). We used this to dump the full payload sent to the target server when using the DLL injection functionality within DOUBLEPULSAR. On analysis, we found what appeared to be 4885 bytes of kernel code followed by a byte-for-byte copy of wininet.dll. We assumed that this must be some mechanism for performing a stealthy in-memory load of any DLL from kernel space directly into a target user land process, so we set about reversing the payload. Going through every part of the payload in full detail would make this blog post too long, but we will cover the key components required to understand its operation here.
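
For readers unfamiliar with this kind of cipher, a repeating-key XOR can be sketched in a few lines. This is only an illustration of the general idea, not the linked script: the function name is hypothetical, the key in the usage example is made up, and how the real key is obtained from the traffic is outside the scope of the sketch.

```python
def xor_repeating(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the key, repeating the key as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Made-up 4-byte key and sample data, purely for illustration.
key = bytes.fromhex("deadbeef")
plaintext = b"MZ\x90\x00 example payload bytes"
ciphertext = xor_repeating(plaintext, key)
recovered = xor_repeating(ciphertext, key)
```

Because XOR is its own inverse, the same function both encrypts and decrypts, which is why `recovered` equals `plaintext`.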

After some standard function prologue behavior, the payload calls the following function, which essentially walks backwards in memory until it finds an MZ header (0x5a4d). This is used to locate ntoskrnl.exe in kernel memory. It then uses this as a reference point to begin dynamically locating kernel functions that it would like to call. In order to do this, it uses the following function:

This function takes in a 4-byte “hash” that is used to locate the function it is interested in. This is very similar to other shellcode techniques that use 4-byte hashes instead of hardcoded function name strings to locate functions dynamically. The hashing procedure in this case is implemented as follows:

We implemented this hashing algorithm in Python, generated a lookup table of hashes for all the available kernel functions, and used this to document exactly which functions the kernel payload was resolving for the later stages of its functionality. This lookup process, with comments for the function names it is resolving, can be seen in the following code:

It goes on to resolve a few more functions but ultimately at this point we were making the assumption that it would be enumerating processes to find the target process name for injection and then using a combination of ZwAllocateVirtualMemory() and KeInsertQueueApc() to inject the user land DLL into the target process and execute code via an asynchronous procedure call.
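
The two ideas described above, scanning backwards for an MZ header and mapping 4-byte hashes back to function names, can be sketched as follows. Two loud assumptions apply. First, the scan steps in page-sized increments over a plain byte buffer, which stands in for kernel memory. Second, `standin_hash` is the additive rotate-right-by-13 hash common in public shellcode; it is NOT the DOUBLEPULSAR hashing procedure, and is used here only to show how a lookup table is built. All function names in the code are hypothetical.

```python
from typing import Dict, Iterable

PAGE = 0x1000  # assumption: step backwards one page at a time


def find_image_base(mem: bytes, start: int, step: int = PAGE) -> int:
    """Walk backwards from start until the bytes 'MZ' (word 0x5a4d) are found."""
    addr = start - (start % step)  # align down to a step boundary
    while addr >= 0:
        if mem[addr:addr + 2] == b"MZ":
            return addr
        addr -= step
    raise LookupError("no MZ header found")


def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by count bits."""
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def standin_hash(name: str) -> int:
    """Stand-in 4-byte hash (ROR-13 additive); not the payload's real algorithm."""
    h = 0
    for ch in name.encode("ascii"):
        h = (ror32(h, 13) + ch) & 0xFFFFFFFF
    return h


def build_lookup(names: Iterable[str]) -> Dict[int, str]:
    """Map each hash back to the export name that produced it."""
    return {standin_hash(n): n for n in names}


# Simulated memory with an MZ header one page in.
mem = bytearray(5 * PAGE)
mem[PAGE:PAGE + 2] = b"MZ"
base = find_image_base(bytes(mem), 3 * PAGE + 0x123)

# In practice the name list would be every export of the kernel image.
table = build_lookup(["ZwAllocateVirtualMemory", "KeInsertQueueApc"])
resolved = table.get(standin_hash("KeInsertQueueApc"))
```

With the real hashing routine substituted for `standin_hash`, each constant found in the disassembly can be looked up in `table` to annotate the function being resolved.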

We will skip over some of the less interesting details here but essentially it performs the following using the functions it has located in the kernel:

Enumerate the running processes

Check the process name matches the desired target

Attach to the process to extract the command line arguments, and check these also match the desired target

Allocate memory in the target process with PAGE_EXECUTE_READWRITE (0x40) protection

Write 0x12458a bytes of memory into this region from a later part of the kernel payload starting “SUWVATUAA”

Tracking this location later in the payload reveals another section of shellcode, which eventually leads into the actual raw DLL contents. Note that we did not see this shellcode in the user space memory regions with WinDBG earlier; we only saw the raw DLL contents and the section-loaded DLL contents. We will come back to this point later.

Later, we see the APC calls that schedule a thread in the target process to execute the code region that was injected. This is very close to the end of the kernel payload. The injected memory appeared to begin with code, with the DLL contents coming later, which is not what we saw in the WinDBG memory space contents. This suggested that a second-stage user land payload was injected along with the DLL, and that control was transferred to it via APC so that it could then load the DLL properly within the target process.

Interestingly, we then see the kernel payload actually wipe out its contents:

The kernel payload here wipes all code before this section and all code afterwards, including the DLL contents. However, it cannot also wipe this one small code region, and so it leaves a large section of zeros with this small piece of code in the middle. It turns out that this code is very similar to the small code section we saw remaining in the mostly blank region of memory inside the calc.exe process earlier. Consequently, we felt it was likely that the apparent user land stage payload is responsible both for properly loading the DLL in-memory and for wiping itself afterwards.

In order to investigate this further, we then attached a debugger to calc.exe prior to the DLL injection payload being sent via DOUBLEPULSAR, pausing execution of the process and then analyzing the address space afterwards. This allowed us to see the process memory after the kernel payload had executed, but before the user land stage had executed to load the DLL and wipe memory; this would allow us to confirm our assumptions so far.

Here we can see exactly what we were expecting. The debugger has paused execution of calc.exe and as a result we do not see the section loaded DLL we saw before and we do not see the size 0x124000 region containing wininet.dll. However, the previously mostly blank memory region of size 0x125000 now contains the start of the code we saw injected via the kernel payload and later the full contents of wininet.dll. Allowing execution to resume then sees a new region of memory allocated to contain wininet.dll and then a proper section loaded version later in memory, followed by the vast majority of the original injected user land payload stage wiping itself, leaving only the small code snippet responsible for performing the wiping.
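
The region sizes reported above are consistent with simple page rounding. As a quick check, assuming the usual 4 KiB page size, rounding the 0x12458a bytes written by the kernel payload up to a page boundary gives the 0x125000 region, one page larger than the 0x124000 region holding wininet.dll on its own.

```python
PAGE_SIZE = 0x1000  # assumption: standard 4 KiB pages


def round_up_to_page(size: int, page: int = PAGE_SIZE) -> int:
    """Round size up to the next multiple of the page size."""
    return (size + page - 1) & ~(page - 1)


injected_bytes = 0x12458A  # bytes written by the kernel payload
region_size = round_up_to_page(injected_bytes)
print(hex(region_size))                # 0x125000
print(hex(region_size - 0x124000))     # 0x1000, i.e. one extra page
```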

That leaves an interesting case for a memory signature, as it is quite a specific sequence of bytes that is preceded by all zeros and always occurs at the same offset, even with differently sized DLLs. This might prove a useful memory analysis indicator for finding evidence of previous compromise by DOUBLEPULSAR, both in target user processes and within the kernel, even long after the attackers have left, provided the system has not been rebooted.
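
A generic check for this kind of leftover can be sketched without knowing the exact bytes. The function below is a minimal example, assuming a region has been dumped to a buffer: it reports the offset and contents of the non-zero span if the region is otherwise entirely zeros and the span is short. The function name and the 64-byte threshold are hypothetical, and the stub in the usage example is filler, not the real DOUBLEPULSAR bytes. A real signature would additionally match the specific byte sequence and its fixed offset.

```python
from typing import Optional, Tuple


def find_residual_stub(region: bytes, max_len: int = 64) -> Optional[Tuple[int, bytes]]:
    """Return (offset, bytes) of the only non-zero span in region, or None.

    The span runs from the first to the last non-zero byte, so it may contain
    interior zeros. Leading or trailing zero bytes that belong to the stub
    cannot be distinguished from padding and are not included.
    """
    start = len(region) - len(region.lstrip(b"\x00"))
    if start == len(region):
        return None  # region is entirely zero (or empty)
    end = len(region.rstrip(b"\x00"))
    span = region[start:end]
    if len(span) > max_len:
        return None  # too much content to be a wiped region
    return start, span


# Simulate a wiped region: zeros everywhere except a 23-byte filler stub.
stub = bytes(range(1, 24))
region = bytearray(0x2000)
region[0x500:0x500 + len(stub)] = stub
hit = find_residual_stub(bytes(region))
```

Running such a check over private executable regions in user processes narrows the candidates before a stricter byte-level comparison is applied.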

The actual DLL loading process used by the user land payload stage is intriguing in itself and we have not yet directly analyzed it, except for the wiping mechanism. That one can be saved for another day.