Rumble In The Jungo – A Code Execution Walkthrough – CVE-2018-5189

Code Execution (CVE-2018-5189) Walkthrough on Jungo Windriver 12.5.1

Introduction

Windows kernel exploitation can be a daunting area to get into. There are tons of helpful tutorials out there and originally this post was going to add to that list. This is the story of how I found CVE-2018-5189 and a complete walkthrough of the exploit development cycle.

The idea was to find a third-party driver that someone had already found a vulnerability in and work through developing an exploit. What actually happened was that we discovered a previously undisclosed vulnerability in the “patched” version of that driver. This post covers how we went from Windows kernel exploitation newcomers to our first privilege escalation exploit, leveraging a race condition (a double fetch) to trigger a pool overflow. We won’t go into every aspect of the exploit, as some topics have been done to death (such as trivial pool spraying); in these cases, we’ll link to the references we found useful.

The product in question is Jungo’s Windriver version 12.5.1 [1]. This target was chosen after spotting a vulnerability that Steven Seeley had disclosed in the previous version [2]; the plan was to step through the exploit he’d written and learn from it. After running through that exploit, we downloaded the patched version to see if we could find anything new.

The setup used was a Windows 7 x86 VM with kernel debugging enabled.

Static Analysis

When looking for vulnerabilities in device drivers, the first place we would generally look is the IOCTL handler as this is where we can trace user input. As we can see here, it’s a bit of a monster:

One of the areas that captured our attention was the following group of IOCTLs, which all call the same function:

Reversing sub_4199D8 gives the following insight.

The first basic block takes our user buffer and uses a value at offset 0x34 as the argument to another function:

sub_4082D4 takes the value it’s passed, manipulates it, and passes the result to ExAllocatePoolWithTag.

The astute reader may notice that there’s an integer overflow in this function. We tried finding some way of exploiting it but in the end, settled with exploiting the next issue.
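To make the integer overflow concrete, here is a minimal sketch of the size computation. It assumes the manipulation is the (size - 1) * 0xa + 0x48 formula quoted later in this post and that it is evaluated in 32-bit unsigned arithmetic, as on the x86 target. The function name pool_alloc_size is ours, not the driver's.

```c
#include <stdint.h>

/* Assumed model of the allocation size: (count - 1) * 0xa + 0x48.
 * uint32_t arithmetic wraps modulo 2^32, mirroring 32-bit registers. */
uint32_t pool_alloc_size(uint32_t count)
{
    return (count - 1u) * 0xau + 0x48u;
}
```

Under these assumptions a count of 0x52 asks for 0x372 bytes, whereas a count of 0x1999999a wraps around and asks for only 0x42 bytes.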

A little later back in sub_4199D8 the following copying loop occurs:

The logic here is quite simple (and slightly flawed). Starting at user_buff+0x38 and pool_buff+0x3C, it copies 10 bytes at a time. Notice, however, that the loop guard compares the counter (eax) with the user-defined size (ebx+0x34), which is read from the user buffer again on every iteration even though the same field was already used to size the pool allocation. This is a classic double fetch race condition, albeit a slightly tricky one since the fetch occurs over and over.
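The loop described above can be modelled in user space. The sketch below is not the driver's code: it assumes the counter starts at zero and the guard is a plain "counter < size" comparison, and it uses a hypothetical race_hook callback to stand in for a second thread that changes the size between iterations. The destination is a deliberately oversized arena, so the "overflow" is only measured, never performed.

```c
#include <stdint.h>
#include <string.h>

#define SIZE_OFF 0x34u  /* record count inside the user buffer */
#define SRC_OFF  0x38u  /* first record in the user buffer */
#define DST_OFF  0x3Cu  /* first record in the pool buffer */
#define REC_LEN  10u    /* bytes copied per iteration */

/* Hypothetical stand-in for the racing thread. */
typedef void (*race_hook)(uint8_t *user, uint32_t iteration);

static uint32_t read_count(const uint8_t *user)
{
    uint32_t n;
    memcpy(&n, user + SIZE_OFF, sizeof n);
    return n;
}

/* Copies records while re-reading the count on every pass (the double
 * fetch). Returns how many bytes landed beyond alloc_len. The caller
 * must supply an arena large enough to hold the whole copy. */
size_t copy_records_double_fetch(uint8_t *user, uint8_t *arena,
                                 size_t alloc_len, race_hook hook)
{
    size_t end = DST_OFF;
    for (uint32_t i = 0; i < read_count(user); i++) {
        memcpy(arena + DST_OFF + i * REC_LEN,
               user + SRC_OFF + i * REC_LEN, REC_LEN);
        end = DST_OFF + (size_t)(i + 1u) * REC_LEN;
        if (hook != NULL)
            hook(user, i);  /* the "other thread" gets a turn */
    }
    return end > alloc_len ? end - alloc_len : 0;
}

/* Example hook: raise the count from its initial value to 0x58 once. */
void grow_count_once(uint8_t *user, uint32_t iteration)
{
    if (iteration == 0) {
        uint32_t bigger = 0x58u;
        memcpy(user + SIZE_OFF, &bigger, sizeof bigger);
    }
}
```

With an initial count of 0x52 and an allocation of 0x378 bytes, the copy stays in bounds when no hook runs, and writes 0x34 bytes past the allocation when the hook raises the count mid-loop.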

Path to Exploitation

So we have a race condition that should allow us to overflow a pool buffer that has a size that we roughly control. This is generally a pretty good situation to be in. To exploit this issue we need to take the following steps:

Understand how we can trigger the vulnerability with threads.

Understand how we can manipulate pool pages to control the overflow.

Understand how this manipulation can lead to code execution.

Finally, find some way of checking our exploit has worked so that we can break out of the race.

This is usually a good way to approach exploit development: start with a list of problems and find solutions for each in turn. To begin, we should develop a proof of concept that causes a crash, allowing us to debug the kernel.

Consider the following situation: we have two threads running on separate cores, both of which share access to the same user buffer that will be supplied to the driver. The first thread continually makes calls to the driver’s IOCTL interface, whilst the second continually manipulates the size at user_buff+0x34.

We simply open a handle to the device using CreateFile, and then trigger a call to the vulnerable function through DeviceIoControl. Note that the user_buff parameter is shared between both threads.

With both of our functions defined, we now need a way of executing them on separate cores. We put this all together with a few nice functions Windows provides: CreateThread, SetThreadPriority, SetThreadAffinityMask and ResumeThread.

The goal here is to start two concurrent threads such that while one thread is manipulating the user-supplied size, the other is executing the vulnerable IOCTL. The aim is to consistently reach a state in which the value at user_buff+0x34 is larger than it was when it was used to allocate the pool buffer. At first, I assumed this would be extremely difficult because the value is fetched from user space on every iteration. In reality, the above code should cause a bug check (BSOD) after a second or two.

With WinDbg attached for kernel debugging [6], we get the following crash:

BAD_POOL_HEADER (19)
The pool is already corrupt at the time of the current request.
This may or may not be due to the caller.
The internal pool links must be walked to figure out a possible cause of
the problem, and then special pool applied to the suspect tags or the driver
verifier to a suspect driver.
Arguments:
Arg1: 00000020, a pool block header size is corrupt.
Arg2: 86ff3488, The pool entry we were looking for within the page.
Arg3: 86ff3758, The next pool entry.
Arg4: 085a002c, (reserved)

Let’s look at the pool to see what’s happened. To do this, we make use of a WinDbg plugin called poolinfo. We can see the following pool information:

Notice how the free block after the driver’s buffer has a strange header. Everything has been nulled out, which is not correct even for a free block. Looking at this buffer reveals that our user-controlled data ends just before this pool’s header:

It took a bit of time to figure this out, but essentially the copying loop is exiting early because of the constant flipping. For example, we overflow by 4 bytes, but on the next check the value has been flipped back to the original size, breaking us out of the loop prematurely. This isn’t particularly bad; it just makes it harder to demonstrate a proof of concept. To get around this issue, we have to make sure that at whatever stage the loop exits, valid data is being written to the next pool buffer (i.e. a correct pool header).
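The premature exit can be illustrated with an idealised model. The sketch below assumes the size seen by the loop guard alternates between two values on every single check, which is far more regular than real thread timing, and that the guard is "counter < size". The function name and the first_is_flipped parameter are ours, purely for illustration.

```c
#include <stdint.h>

/* Returns how many 10-byte records the loop copies when the guard sees
 * orig and flipped alternately. first_is_flipped (0 or 1) selects which
 * of the two values the very first check observes. */
uint32_t records_copied_alternating(uint32_t orig, uint32_t flipped,
                                    uint32_t first_is_flipped)
{
    uint32_t i = 0;
    for (;;) {
        uint32_t seen = ((i + first_is_flipped) & 1u) ? flipped : orig;
        if (i >= seen)
            break;      /* guard fails: the loop exits here */
        i++;            /* one more record copied */
    }
    return i;
}
```

With orig = 0x52 and flipped = 0x58, the loop copies either 0x52 records (no overflow) or 0x53 records (one record past the intended end), never the full 0x58. Since the exit point is not under our control, every position the overflow might stop at needs to leave valid data behind.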

For now, we sort of have a working proof of concept: we know we can overflow the pool header of the next object, so we need some way of controlling that object. The topic of pool spraying has been covered extensively online, and we won’t go into the nitty-gritty details here; a good reference I used was http://trackwatch.com/windows-kernel-pool-spraying/.

There are, however, specifics for this exploit that are important. To start with, remember that we have control over the size of the allocation (the size we pass becomes (size - 1) * 0xa + 0x48). The basic principles of pool spraying follow this pattern:

Repeatedly create some objects a large number of times.

Free an exact number of sequential objects in random spots to create holes of specific sizes.

Trigger a call to the vulnerable driver where the pool allocation should fill one of the holes we created, meaning we know the object that will be located after it in memory.
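The hole-creation step can be sketched as pure index bookkeeping, independent of any Windows API. The sketch below assumes that the sprayed objects were allocated back to back in handle order (which a real spray only approximates) and uses a fixed stride instead of random positions. The function name plan_holes and all of its parameters are hypothetical. In a real exploit, each marked index would correspond to closing one handle.

```c
#include <stddef.h>
#include <stdint.h>

/* Marks in to_close[] the handle indices whose objects should be freed
 * so that each run of freed objects forms one hole of hole_bytes.
 * Every hole is followed by at least one live object.
 * Returns the number of holes planned. */
size_t plan_holes(uint8_t *to_close, size_t handle_count,
                  size_t obj_bytes, size_t hole_bytes,
                  size_t first, size_t stride)
{
    size_t holes = 0;

    for (size_t i = 0; i < handle_count; i++)
        to_close[i] = 0;
    if (obj_bytes == 0 || hole_bytes % obj_bytes != 0)
        return 0;                       /* hole must be a whole number of objects */

    size_t run = hole_bytes / obj_bytes;
    if (run == 0 || stride <= run)
        return 0;                       /* holes would merge into each other */

    for (size_t start = first; start + run < handle_count; start += stride) {
        for (size_t j = 0; j < run; j++)
            to_close[start + j] = 1;    /* free this object */
        holes++;                        /* object at start + run stays allocated */
    }
    return holes;
}
```

For example, with 0x40-byte objects and a 0x380-byte hole, each hole is a run of 14 consecutive objects.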

After some trial and error, we decided to use the Event object with a TypeIndex overwrite. The following function sprays the kernel pool with a large number of Event objects, and then creates holes in random places that are exactly 0x380 bytes large.

We add a call to this function just before entering the while loop in main and get another crash. Inspecting the pool page that caused the crash, we find we have an allocated buffer exactly where we want it:

This now means that we have full control over the next object that we intend to corrupt. We have now solved our first two problems, with the next being how we manipulate the Event object to cause code execution.

As previously mentioned, we are using the TypeIndex overwrite method. There are other ways to exploit this issue, and Tarjei Mandt’s paper provides an extremely good reference for all of them [3]. If we look at the structure of an Event object, we gain a bit of insight as to where to go:

There are a few values here that we need to preserve to stop us from blue-screening. We need to fix the PreviousSize value to 0x380 (the size of the RDW pool buffer), and then keep all of the other values except the TypeIndex. The TypeIndex is an index into an array of pointers that describes the type of the chunk [4]:

To control this overflow, we also need to figure out what to flip the size value from and to. We have a buffer that is 0x378 bytes (0x380 with the 8-byte pool header), and we only want to overflow the next Event object. This gives us a requirement of a 0x40-byte overflow: 0x378 + 0x40 = 0x3b8. Remembering the manipulation applied when the pool is allocated, (0x3b8 - 0x48) / 0x0a = 0x58. Finally, to flip the value between 0x52 and 0x58, we XOR it with 10 (decimal, i.e. 0x0a).
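The arithmetic above is easy to check in a few lines of C. This is only a worked restatement of the numbers in the text. The helper names flip_count and count_for_span are ours, and count_for_span reproduces the approximation used here rather than an exact inverse of the driver's size formula.

```c
#include <stdint.h>

#define FLIP_MASK 10u   /* decimal 10, i.e. 0x0a */

/* Toggles the size field between its two values: 0x52 <-> 0x58. */
uint32_t flip_count(uint32_t count)
{
    return count ^ FLIP_MASK;
}

/* The text's approximation: the count whose copy spans `bytes` bytes. */
uint32_t count_for_span(uint32_t bytes)
{
    return (bytes - 0x48u) / 0x0au;
}
```

XOR works as the toggle because 0x52 and 0x58 differ only in the two bits set in 0x0a, so applying the same mask twice returns the original value.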

We then need to free the corrupted object, which we can do by simply closing all of the open handles. We get the following bug check:

Notice that the TypeIndex has been successfully overwritten, which has caused the kernel to look for an OkayToCloseProcedure at 0x74. At this point we are very close. The next stage involves mapping the null page and placing there a pointer to a function we want to execute (in kernel mode). The following function takes care of this for us:

Now that we know how to control code execution, we need to find some way of escalating our privileges and returning to userland without crashing. The standard way of doing this is to use token stealing shellcode to steal a SYSTEM token; an excellent post by Sam Brown covers this topic well [5]. The one thing to note here is that we must take into account how many bytes of arguments were pushed onto the stack and clean them up when we return.

We can see here that ObpQueryNameString pushes 16 bytes onto the stack before calling our shellcode (ebx+0x74):

For a while, we kept getting blue screens even with a ret 0x10 at the end of the shellcode. The solution was to declare the function using __declspec(naked), which makes the compiler emit no prologue or epilogue for the function (exactly what we need). The (only slightly modified) code is shown here:

0x00f61790 is the address of our token stealing shellcode function, so we have successfully got control of eip.
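The dispatch path that hands control to our function can be modelled in user space. The sketch below is a portable stand-in, not the exploit's null-page code: an ordinary buffer plays the role of the page mapped at address zero, and the callback takes no arguments, whereas the real procedure receives 16 bytes of arguments. It assumes the corrupted TypeIndex makes the kernel treat that page as the object's type structure. All names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define OKAY_TO_CLOSE_OFF 0x74u   /* offset dereferenced in the bug check */
#define FAKE_PAGE_LEN     0x1000u /* buffer standing in for the null page */

typedef int (*close_cb)(void);

/* Zero the fake page and plant a callback pointer at offset 0x74. */
void plant_callback(uint8_t *page, close_cb cb)
{
    memset(page, 0, FAKE_PAGE_LEN);
    memcpy(page + OKAY_TO_CLOSE_OFF, &cb, sizeof cb);
}

/* Model of the kernel side: fetch the pointer at type_base + 0x74 and
 * call it if it is non-null. Returns -1 when nothing is planted. */
int dispatch_close(const uint8_t *type_base)
{
    close_cb cb;
    memcpy(&cb, type_base + OKAY_TO_CLOSE_OFF, sizeof cb);
    return cb != NULL ? cb() : -1;
}

/* Placeholder for the payload; returns a marker so the call is visible. */
int payload_stub(void)
{
    return 0x1337;
}
```

Calling dispatch_close on a zeroed page returns -1, and after plant_callback(page, payload_stub) it returns 0x1337, which mirrors how planting a single pointer at offset 0x74 redirects execution.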

All of this would be great if we weren’t stuck in an infinite loop, but we are, so we need some way of detecting that we have escalated our privileges in order to break out of the loop and pop a shell. There are a few ways to do this; for instance, we could simply set some value in our shellcode and then check it in the while loop. Being relatively new to the Windows API, we decided to look for some way of checking on every iteration of the loop whether our current privileges had changed. In the end, we decided to use the GetTokenInformation function. The process is as follows:

Code

The code is presented here; there are two things to note. First, the initial instructions in the shellcode check whether the function has already been hit; we were getting an unexpected kernel mode trap bug check without this. Second, on each iteration of the while loop we need to reset the user mode buffer. If we don’t, we end up corrupting an Event object with arbitrary data. We couldn’t trace the cause of this, but we assumed it was down to the buffer being modified by the driver.

// ConsoleApplication1.cpp : Defines the entry point for the console application.
//

The Patch

Jungo provided a patch for this vulnerability relatively quickly. The simplest way to mitigate double fetch vulnerabilities is to fetch each value from user mode only once and store it in a local (kernel space) variable.

A quick analysis of the patch shows that this is what has been implemented. Starting in the IOCTL handler, we see the following:

This is very different from the vulnerable code: the size value passed from our user space buffer is stored in ecx and then pushed as an argument to sub_419CA2 (the actual value being multiplied by 0xa and having 0x3A added to it). In sub_419CA2, we see that whilst the user mode buffer is referenced multiple times, the actual size value (at user_buff+0x34) is never fetched again.

For example, at the start of the function we see that the size is fetched from the argument pushed on the stack, which we cannot modify from user mode. Note also the hardcoded size value of 0x800; this also fixes the previously mentioned integer overflow.

Finally, in the vulnerable copying loop:

For reference, arg_4 is the size we passed ([user_buff+0x34] * 0xa + 0x3A), ebx is the pool buffer (which has a size of [user_buff+0x34] * 0xa + 0x48) and edi is the user buffer. Again we can see here that the value is being fetched from the stack frame of the function, which mitigates the vulnerability present in the previous version.
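The single-fetch pattern described in this section can be sketched in user space as follows. This is a model of the idea, not a decompilation of the patched driver: the count is copied into a local variable once and the loop uses only that copy. The explicit bounds check against the allocation length is our own assumption for the sketch, and race_hook is a hypothetical callback standing in for a racing thread.

```c
#include <stdint.h>
#include <string.h>

#define SIZE_OFF 0x34u
#define SRC_OFF  0x38u
#define DST_OFF  0x3Cu
#define REC_LEN  10u

/* Hypothetical stand-in for the racing thread. */
typedef void (*race_hook)(uint8_t *user, uint32_t iteration);

/* Reads the count once, validates it, then copies. Returns the number
 * of bytes copied, or 0 if the copy would not fit in alloc_len. */
size_t copy_records_single_fetch(uint8_t *user, uint8_t *pool,
                                 size_t alloc_len, race_hook hook)
{
    uint32_t count;
    memcpy(&count, user + SIZE_OFF, sizeof count);  /* the only fetch */

    uint64_t need = (uint64_t)DST_OFF + (uint64_t)count * REC_LEN;
    if (need > alloc_len)
        return 0;                                   /* assumed bounds check */

    for (uint32_t i = 0; i < count; i++) {          /* local copy only */
        memcpy(pool + DST_OFF + i * REC_LEN,
               user + SRC_OFF + i * REC_LEN, REC_LEN);
        if (hook != NULL)
            hook(user, i);  /* later changes to the user buffer are ignored */
    }
    return (size_t)count * REC_LEN;
}

/* Example hook: raise the user-visible count to 0x58 once. */
void grow_count_once(uint8_t *user, uint32_t iteration)
{
    if (iteration == 0) {
        uint32_t bigger = 0x58u;
        memcpy(user + SIZE_OFF, &bigger, sizeof bigger);
    }
}
```

With an initial count of 0x52 and an allocation of 0x378 bytes, the hook still rewrites the user buffer, but the loop copies exactly 0x52 records because it never looks at that field again.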