Illumos is the operating system that was forked from OpenSolaris and now powers Joyent’s Triton cloud platform. Joyent have their own branded version of Illumos called SmartOS. Joyent’s cloud is interesting because they offer hosting using Zones, where customers share the same kernel. This is in contrast to traditional cloud providers, who isolate customers from each other using virtual machines. However, kernel-provided isolation seems to be becoming more popular: AWS Lambda, for example, appears to use Linux kernel namespaces for isolation. When the kernel provides the isolation, the whole of the kernel becomes an attack surface. This is especially interesting in the case of Illumos because Illumos runs an interpreter inside the kernel called DTrace, which is one of the big selling points of Triton.

DTrace is an incredibly complex piece of code, consisting of more than 17k lines of C. It is very difficult to write this much C without introducing lots of bugs :( During my review of the DTrace source code I stumbled across two integer overflows and an out-of-bounds read that could be converted to arbitrary kernel writes. I also found five bugs that could be used for arbitrary memory reads. I find exploitation of these arbitrary memory reads more interesting than the privilege escalation bugs, so I’m going to write about four of these first. I intend to write up the other bugs too, but these were disclosed starting from September 2015 so don’t hold your breath.

DTrace Copy Out

If you look at the DTrace user guide it has this definition for the `copyout` function:

`void copyout(void *buf, uintptr_t addr, size_t nbytes)`

The `copyout()` action copies data from a buffer to an address in memory. The number of bytes that this action copies is specified in nbytes. The buffer that the data is copied from is specified in buf. The address that the data is copied to is specified in addr. That address is in the address space of the process that is associated with the current thread.

Unfortunately, `copyout` does exactly what it says on the tin: it copies kernel memory out into userspace without any checks :( The `kaddr` and `size` values are completely controlled by the user, and nothing in the rest of the call path checks that the user is allowed to access the range specified by `kaddr` and `size`. In fact, there is a function specifically designed to check this, called `dtrace_canload`, but it was not used. The patch fixes this issue by adding a `dtrace_canload` check.
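To illustrate the kind of range check that was missing, here is a minimal sketch in C. It is not the actual patch and not illumos code: `range_within` and `checked_copy` are hypothetical names, and the "allowed region" is just a base and a length. The point it shows is that both the start address and the length have to be validated, and that the comparison has to be written so it cannot overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: true if [addr, addr + size) lies entirely inside
 * [base, base + len). Written without computing addr + size, so a huge
 * size cannot wrap around and sneak past the check.
 */
static bool range_within(uintptr_t addr, size_t size, uintptr_t base, size_t len)
{
    if (addr < base)
        return false;
    uintptr_t off = addr - base;
    return off <= len && size <= len - off;
}

/*
 * Hypothetical guarded copy: copies only when the source range is inside
 * the region the caller is allowed to read. Returns 0 on success, -1 if
 * the request is refused.
 */
static int checked_copy(void *dst, const uint8_t *region, size_t region_len,
                        size_t offset, size_t size)
{
    if (!range_within((uintptr_t)region + offset, size,
                      (uintptr_t)region, region_len))
        return -1;
    memcpy(dst, region + offset, size);
    return 0;
}
```

The vulnerable `copyout` path behaved like `checked_copy` with the `range_within` test deleted: whatever address and size the user supplied was copied.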

At first glance there doesn’t seem to be that much interesting stuff in Illumos to read from kernel memory. Illumos doesn’t have KASLR, so there is no KASLR for an arbitrary memory read to bypass by discovering where things are mapped. It should be possible to dump the filesystem buffer cache or even the kernel slabs used for syscall arguments, which could hold sensitive information from other processes on the system, but I didn’t pursue this option.

It would be great if you could dump memory from other processes, but this is not directly possible on x86 because only the currently running process and the kernel are mapped into the address space. Luckily for us, 64-bit Illumos maps all of physical memory at a known address in the kernel’s virtual address space. I think this is done to make it easier to set up page tables. So all you have to do to read memory from another process is convert the virtual address you want to read to a physical address and then add this physical address to the kernel physical memory offset (`kpm_vbase`). This is all possible because the information needed for the translation lives in the kernel’s memory and we have an arbitrary kernel memory read. The locations of static symbols like `kpm_vbase` are also helpfully exported by the kernel (they are not really secret anyway because there is no KASLR) and can be accessed using a library called libctf. That doesn’t stand for lib capture the flag :(
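The translate-then-add-`kpm_vbase` step can be sketched as a small simulation in C. Everything here is an assumption made for illustration: `KPM_VBASE` is a made-up value, `kread` stands in for whichever arbitrary read primitive is available, physical memory is a tiny array, and the walk is a plain x86-64 4-level walk with 4 KiB pages only. Illumos tracks page tables through its own HAT structures, which this does not model.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE     4096u
#define NPAGES        5u                    /* 4 table pages + 1 data page */
#define PTE_PRESENT   0x1ULL
#define PTE_ADDR_MASK 0x000ffffffffff000ULL

/* Hypothetical value standing in for the kernel's kpm_vbase. */
static const uint64_t KPM_VBASE = 0xfffffe0000000000ULL;

/* Simulated physical memory; on a real machine this is RAM. */
static uint8_t phys[NPAGES * PAGE_SIZE];

/* Stand-in for an arbitrary kernel read; only the KPM window is backed. */
static int kread(uint64_t kva, void *dst, size_t len)
{
    if (kva < KPM_VBASE)
        return -1;
    uint64_t pa = kva - KPM_VBASE;
    if (pa > sizeof(phys) || len > sizeof(phys) - pa)
        return -1;
    memcpy(dst, phys + pa, len);
    return 0;
}

/* Walk a 4-level page table rooted at physical address root_pa. */
static int virt_to_phys(uint64_t root_pa, uint64_t va, uint64_t *pa_out)
{
    uint64_t table = root_pa;
    for (int level = 3; level >= 0; level--) {
        uint64_t idx = (va >> (12 + 9 * level)) & 0x1ff;
        uint64_t pte;
        /* Page tables are physical memory too, so read them via KPM. */
        if (kread(KPM_VBASE + table + idx * sizeof(pte), &pte, sizeof(pte)) != 0)
            return -1;
        if (!(pte & PTE_PRESENT))
            return -1;
        table = pte & PTE_ADDR_MASK;
    }
    *pa_out = table | (va & (PAGE_SIZE - 1));
    return 0;
}

/* Read one byte of another address space: translate, then read via KPM. */
static int read_user_byte(uint64_t root_pa, uint64_t va, uint8_t *out)
{
    uint64_t pa;
    if (virt_to_phys(root_pa, va, &pa) != 0)
        return -1;
    return kread(KPM_VBASE + pa, out, 1);
}

/* Build a toy address space mapping demo_va to physical page 4. */
static uint64_t demo_setup(uint64_t demo_va, const char *secret)
{
    assert((demo_va & (PAGE_SIZE - 1)) + strlen(secret) <= PAGE_SIZE);
    memset(phys, 0, sizeof(phys));
    for (int level = 3; level >= 0; level--) {
        uint64_t table_pa = (uint64_t)(3 - level) * PAGE_SIZE; /* pages 0..3 */
        uint64_t pte = (table_pa + PAGE_SIZE) | PTE_PRESENT;   /* next page */
        uint64_t idx = (demo_va >> (12 + 9 * level)) & 0x1ff;
        memcpy(phys + table_pa + idx * sizeof(pte), &pte, sizeof(pte));
    }
    memcpy(phys + 4 * PAGE_SIZE + (demo_va & (PAGE_SIZE - 1)),
           secret, strlen(secret));
    return 0; /* the root table sits at physical address 0 */
}
```

Calling `demo_setup` with a user-style address and a short string, then `read_user_byte` on that address, returns the string's bytes even though the address is never mapped in the reader's own address space. Every access goes through `kread` on a `KPM_VBASE`-relative address.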

We can also get a list of all the running processes from the `practive` linked list. Normally when you are inside a Zone you can only see processes inside your own Zone. This allows us to create a tool that can be plugged into any arbitrary kernel memory read and provide us with a `ps` that lists all the processes running on the system and lets us dump the memory contained in these processes.

Here is an example session with the tool being used to dump the heap from a vim process running in the global zone:

In a shared system this can be very dangerous because you can read private keys and authentication information from other processes. It also shows that relatively benign vulnerabilities can be very serious on systems that are used for shared hosting.

POC Code on Github

DTrace INET_NTOA

This is a similar issue to the `copyout` problem. This is what the DTrace user guide has to say about `inet_ntoa`:
string inet_ntoa(ipaddr_t *addr)

inet_ntoa takes a pointer to an IPv4 address and returns it as a dotted quad decimal string. This is similar to inet_ntoa() from libnsl as described in inet(3SOCKET), however this D version takes a pointer to the IPv4 address rather than the address itself. The returned string is allocated out of scratch memory, and is therefore valid only for the duration of the clause. If insufficient scratch space is available, inet_ntoa does not execute and an error is generated.

The code for the `inet_ntoa` function does not do any checking to see if `addr` is allowed to be accessed.

We can plug this vulnerability into our framework and use it to list processes and dump their memory contents. You might be concerned that reading 4 bytes at a time is slow, but there is no noticeable delay when listing processes.
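Turning the returned string back into memory contents is just a matter of parsing the dotted quad. A minimal sketch in C, under assumptions: `leak4_fn` is a hypothetical callback that wraps whatever runs the D script and hands back the `inet_ntoa` string for an address, `fake_leak4` simulates it over a small buffer, and the four numbers are taken to be the four bytes at `addr` in memory order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Parse "a.b.c.d" into out[0..3]; returns 0 on success, -1 if malformed. */
static int quad_to_bytes(const char *s, uint8_t out[4])
{
    unsigned int a, b, c, d;
    char extra;
    /* The trailing %c catches garbage after the fourth number. */
    if (sscanf(s, "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4)
        return -1;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return -1;
    out[0] = (uint8_t)a;
    out[1] = (uint8_t)b;
    out[2] = (uint8_t)c;
    out[3] = (uint8_t)d;
    return 0;
}

/* Hypothetical primitive: write the dotted quad for the 4 bytes at addr. */
typedef int (*leak4_fn)(uintptr_t addr, char *buf, size_t buflen);

/* Read len bytes starting at addr, 4 bytes per call to the primitive. */
static int read_mem(leak4_fn leak, uintptr_t addr, uint8_t *dst, size_t len)
{
    char quad[16]; /* "255.255.255.255" plus the terminating NUL */
    for (size_t done = 0; done < len; done += 4) {
        uint8_t chunk[4];
        if (leak(addr + done, quad, sizeof(quad)) != 0 ||
            quad_to_bytes(quad, chunk) != 0)
            return -1;
        size_t n = len - done < 4 ? len - done : 4;
        memcpy(dst + done, chunk, n);
    }
    return 0;
}

/* Simulated target memory and primitive, for demonstration only. */
static const uint8_t fake_kernel[8] = { 0xde, 0xad, 0xbe, 0xef, 0x00, 0x7f, 0x01, 0xff };

static int fake_leak4(uintptr_t addr, char *buf, size_t buflen)
{
    if (addr > sizeof(fake_kernel) - 4)
        return -1;
    const uint8_t *p = fake_kernel + addr;
    snprintf(buf, buflen, "%u.%u.%u.%u", (unsigned)p[0], (unsigned)p[1],
             (unsigned)p[2], (unsigned)p[3]);
    return 0;
}
```

`read_mem(fake_leak4, 0, out, 6)` fills `out` with the first six bytes of `fake_kernel` using two calls to the primitive.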

POC Code on Github

DTrace Hash Corruption

DTrace has support for hashmaps and allows the user to access the data in the hashmap using the store and load instructions. DTrace tries to separate the metadata from the data and only allow the user to modify the data. However, it is possible to modify the metadata and this allows an attacker to create a memory oracle. An attacker can choose an address and an array of bytes and check whether the memory at that address is equal to the array of bytes. This is equivalent to a slow arbitrary memory read because you can check a single byte 256 times to read a single byte of memory.
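The step from "memory oracle" to "slow arbitrary read" can be sketched in C as follows. The `oracle_fn` callback is a hypothetical wrapper around the hash lookup trick, and `fake_oracle` simulates it over a small buffer so the loop can be exercised. Nothing here is illumos code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical oracle: true if the len bytes at addr equal guess. */
typedef bool (*oracle_fn)(uintptr_t addr, const uint8_t *guess, size_t len);

/* Recover one byte by trying every value; at most 256 oracle queries. */
static int oracle_read_byte(oracle_fn oracle, uintptr_t addr)
{
    for (int v = 0; v < 256; v++) {
        uint8_t guess = (uint8_t)v;
        if (oracle(addr, &guess, 1))
            return v;
    }
    return -1; /* the oracle never said yes, so it is unreliable here */
}

/* Recover len bytes, one byte at a time. */
static int oracle_read(oracle_fn oracle, uintptr_t addr, uint8_t *dst, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        int v = oracle_read_byte(oracle, addr + i);
        if (v < 0)
            return -1;
        dst[i] = (uint8_t)v;
    }
    return 0;
}

/* Simulated target memory and oracle, for demonstration only. */
static const uint8_t fake_kmem[4] = { 0x13, 0x37, 0x00, 0xff };

static bool fake_oracle(uintptr_t addr, const uint8_t *guess, size_t len)
{
    assert(addr <= sizeof(fake_kmem) && len <= sizeof(fake_kmem) - addr);
    return memcmp(fake_kmem + addr, guess, len) == 0;
}
```

Guessing one byte at a time keeps the cost linear in the number of bytes read (at most 256 queries per byte), whereas guessing a multi-byte array in one query would need exponentially many attempts.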

`dtrace_canstore` checks that the offset into the hash chunk is greater than the size of `dtrace_dynvar_t`.

Presumably, it does this to prevent the user from writing to the metadata in the hash chunk, and the author believed all the metadata is contained in the `dtrace_dynvar_t` structure. This belief is true, but `dtrace_dynvar_t` is a dynamically sized structure: its embedded `dtrace_tuple` structure contains a dynamically sized array of `dtrace_key` structures.

So if there is more than one key value, an attacker is able to write into the key values beyond the first one. The `dttk_value` field is treated as a pointer if the `dttk_size` field is non-zero.

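Unfortunately, the only place where the `dttk_value` field seems to be used is as an argument to the `dtrace_bcmp` function. When the hashmap looks up a value and finds a matching entry based on the hash code, it checks that the keys are equal using `dtrace_bcmp`. A lookup therefore only succeeds if the memory that the corrupted `dttk_value` points at equals the key bytes supplied by the attacker, which gives the memory oracle described above.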

Doing 256 syscalls to read 1 byte is slow but the global ps is still responsive :)

POC Code on Github

DTrace STRSTR

If you look at the DTrace user guide it has this definition for the `strstr` function:

string strstr(const char *s, const char *subs)

strstr returns a pointer to the first occurrence of the substring subs in the string s. If s is an empty string, strstr returns a pointer to an empty string. If no match is found, strstr returns 0.

The `dtrace_canload` function takes a pointer and a size for checking whether a range can be accessed. However, the `strstr` function just takes a pointer to a string. How is it possible for `strstr` to call `dtrace_canload` to check whether the string can be safely searched? The original implementation only checked `dtrace_canload` after the string had been searched. The excerpt below is the `strchr`/`strrchr` case, which is the one used in the rest of this section:

```c
case DIF_SUBR_STRRCHR: {
    /*
     * We're going to iterate over the string looking for the
     * specified character. We will iterate until we have reached
     * the string length or we have found the character. If this
     * is DIF_SUBR_STRRCHR, we will look for the last occurrence
     * of the specified character instead of the first.
     */
    uintptr_t saddr = tupregs[0].dttk_value;
    uintptr_t addr = tupregs[0].dttk_value;
    uintptr_t limit = addr + state->dts_options[DTRACEOPT_STRSIZE];
    char c, target = (char)tupregs[1].dttk_value;

    for (regs[rd] = NULL; addr < limit; addr++) {
        if ((c = dtrace_load8(addr)) == target) {
            regs[rd] = addr;

            if (subr == DIF_SUBR_STRCHR)
                break;
        }

        if (c == '\0')
            break;
    }

    if (!dtrace_canload(saddr, addr - saddr, mstate, vstate)) {
        regs[rd] = NULL;
        break;
    }

    break;
}
```

There doesn’t seem to be any way to observe the result in `regs[rd]` before it is clobbered when `dtrace_canload` fails. All of this data is only visible to the current thread and is not accessible globally. However, Illumos provides access to the hardware performance counters and allows you to configure them to count only while executing in the kernel.

It is possible to set `DTRACEOPT_STRSIZE` to an arbitrary value. So if strsize is set to 1, only one byte will be checked against the search value supplied to the `strchr` function. This effectively means `strchr` is checking whether the byte at an address has a specific value. The number of instructions executed or branches taken will differ depending on whether the byte at the address is null, matches the search value, or is different.

If we set the performance counter to PAPI_br_ins (branch instructions taken), on my machine a call takes 645 branches for a correct guess and 646 for an incorrect one. A zero byte always takes 645, whatever the guess. So by iterating through the byte values (1-255) and calling `strchr` on each it is possible to read an arbitrary byte.

There is some noise, which I suspect is caused by paging and which can produce higher values, but if you discard any result that is not 645 or 646 and try again then this works out.
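The decoding logic for this side channel can be sketched in C. The counts 645 and 646 are the ones measured on the author's machine and will differ elsewhere. `probe_fn` is a hypothetical callback that runs one `strchr` probe with strsize set to 1 and returns the branch count, and `fake_probe` simulates it, including some noise, so the loop can be exercised. The sketch handles only the 645/646 case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BR_MATCH_OR_ZERO 645 /* guess matched, or the byte is zero */
#define BR_MISMATCH      646 /* guess did not match a non-zero byte */

/* Hypothetical probe: branch count for strchr(addr, target), strsize=1. */
typedef long (*probe_fn)(uintptr_t addr, uint8_t target);

/* Repeat a probe until it yields a clean count; -1 after max_tries. */
static long clean_count(probe_fn probe, uintptr_t addr, uint8_t target,
                        int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        long n = probe(addr, target);
        if (n == BR_MATCH_OR_ZERO || n == BR_MISMATCH)
            return n;
    }
    return -1;
}

/* Recover the byte at addr, or -1 if the readings are inconsistent. */
static int sidechannel_read_byte(probe_fn probe, uintptr_t addr)
{
    int hits = 0, last = -1;
    for (int v = 1; v <= 255; v++) {
        long n = clean_count(probe, addr, (uint8_t)v, 8);
        if (n < 0)
            return -1;
        if (n == BR_MATCH_OR_ZERO) {
            hits++;
            last = v;
        }
    }
    if (hits == 1)
        return last; /* exactly one guess took the short path */
    if (hits == 255)
        return 0;    /* every guess took the short path: zero byte */
    return -1;
}

/* Simulated memory and probe; every third call returns a noisy count. */
static const uint8_t fake_mem[3] = { 'A', 0x00, 0xff };
static unsigned int fake_calls;

static long fake_probe(uintptr_t addr, uint8_t target)
{
    assert(addr < sizeof(fake_mem));
    uint8_t c = fake_mem[addr];
    if (++fake_calls % 3 == 0)
        return 900; /* stand-in for a reading disturbed by paging */
    return (c == target || c == 0) ? BR_MATCH_OR_ZERO : BR_MISMATCH;
}
```

Probing all 255 candidates rather than stopping at the first 645 costs more calls, but it is what allows a zero byte (every candidate reads 645) to be told apart from a non-zero byte (exactly one candidate reads 645).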

There is also a weird extra branch taken for some addresses. I believe this is because of the toxic range check. The toxic range check is done by `addr > START && addr < END`, so depending on whether `addr > START` holds or not there will be a difference in the number of branches taken. (We ignore `addr < END` because we don’t try to read from toxic ranges.) The read is still not ambiguous, because the extra branch shows up as either every candidate returning 646, or a single candidate returning 646 while all the others return an otherwise unknown 647.

Again we plug this vulnerability into our exploit framework and dump memory from arbitrary processes in other zones. :)