Another post about _old_ but damn sexy kernel level vulnerabilities.
Both bugs were disclosed in February 2008 as 0day vulnerabilities, with freaking awesome exploit code by qaaz. At almost the exact moment qaaz released his exploits, cliph of iSec.pl published an advisory for the very same vulnerabilities. I'm not going to speculate about the relation between those two events since this is completely off topic.
So, the first bug (CVE-2008-0009) affects Linux kernel releases 2.6.22 through 2.6.24. Specifically, here is the susceptible code from 2.6.23’s fs/splice.c.

&#91;sourcecode language="c"&#93;
/*
 * For lack of a better implementation, implement vmsplice() to userspace
 * as a simple copy of the pipes pages to the user iov.
 */
static long vmsplice_to_user(struct file *file, const struct iovec __user *iov,
			     unsigned long nr_segs, unsigned int flags)
{
	struct pipe_inode_info *pipe;
	struct splice_desc sd;
	ssize_t size;
	int error;
	long ret;
	...
	/*
	 * Get user address base and length for this iovec.
	 */
	error = get_user(base, &iov->iov_base);
	if (unlikely(error))
		break;
	error = get_user(len, &iov->iov_len);
	if (unlikely(error))
		break;
	/*
	 * Sanity check this iovec. 0 read succeeds.
	 */
	if (unlikely(!len))
		break;
	if (unlikely(!base)) {
		error = -EFAULT;
		break;
	}
	sd.len = 0;
	sd.total_len = len;
	sd.flags = flags;
	sd.u.userptr = base;
	sd.pos = 0;
	size = __splice_from_pipe(pipe, &sd, pipe_to_user);
	if (size < 0) {
	...
	return ret;
}
&#91;/sourcecode&#93;
This code is part of the vmsplice(2) system call. As we can read in <a href="http://www.kernel.org/doc/man-pages/online/pages/man2/vmsplice.2.html">its man page</a>, this system call was introduced in the 2.6.17 release of the Linux kernel and is used to map a specified user memory range into a pipe.
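For readers who have never used it, here is a minimal sketch of a perfectly legitimate vmsplice(2) call: it pushes a small user buffer into a pipe and reads it back. The helper name vmsplice_roundtrip() is my own and is not taken from any of the code discussed in this post.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the post): splice `len` bytes of user
 * memory into a pipe with vmsplice(2), then read them back with read(2).
 * Returns the number of bytes read back, or -1 on error.
 */
static ssize_t vmsplice_roundtrip(const char *msg, size_t len,
				  char *out, size_t out_len)
{
	int pi[2];
	struct iovec iov;
	ssize_t n;

	if (pipe(pi) < 0)
		return -1;

	iov.iov_base = (void *)msg;	/* start of the user memory range */
	iov.iov_len = len;		/* length of the range in bytes */

	/* pi[1] is the write end, so this moves user memory into the pipe. */
	n = vmsplice(pi[1], &iov, 1, 0);
	if (n > 0)
		n = read(pi[0], out, out_len);

	close(pi[0]);
	close(pi[1]);
	return n;
}
```

Calling vmsplice_roundtrip(buf, 5, out, sizeof(out)) on a five-byte buffer should hand the same five bytes back in 'out'. The iovec passed here is exactly the user controlled structure that the kernel code above pulls apart with get_user().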
In the vmsplice_to_user() code above, we can see that it retrieves the contents of the user controlled iovec structure (the second argument of the system call) using get_user() and stores them in 'base' and 'len' respectively. After that, some basic sanity checks take place for 'len' equal to zero and 'base' equal to NULL. The next part is really interesting: vmsplice_to_user() directly initializes the 'splice_desc' structure with the user controlled 'len' as 'total_len' and 'base' as 'userptr'. Finally, it invokes __splice_from_pipe() to splice the data from 'pipe' as instructed by 'sd', using 'pipe_to_user' as the handler routine. __splice_from_pipe() calls the handler routine (in this case pipe_to_user()) without performing any checks on the user controlled pointer passed to it. A quick look at pipe_to_user() reveals this:
&#91;sourcecode language="c"&#93;
static int pipe_to_user(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
			struct splice_desc *sd)
{
	char *src;
	int ret;
	...
	/*
	 * See if we can use the atomic maps, by prefaulting in the
	 * pages and doing an atomic copy
	 */
	if (!fault_in_pages_writeable(sd->u.userptr, sd->len)) {
		src = buf->ops->map(pipe, buf, 1);
		ret = __copy_to_user_inatomic(sd->u.userptr, src + buf->offset,
					      sd->len);
	...
	return ret;
}
&#91;/sourcecode&#93;

So basically, the user has complete control over this __copy_to_user_inatomic() call: the destination pointer is never validated, so bytes read from a pipe can be copied to an arbitrary address. Of course, this does not sound so trivial to exploit. Here is what qaaz did in his fascinating diane_lane_fucked_hard.c code.

So, it initializes ‘uid’ and ‘gid’, and if you are already root it just exits with a message full of anger. :P
Otherwise, it calls get_target() to retrieve the location of the ‘sys_vm86old’ system call from /proc/kallsyms, like this:
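As a rough idea of what such a lookup involves (this is my own sketch, not qaaz's get_target()), the hypothetical helper lookup_symbol() below scans a file in the /proc/kallsyms format, one "address type name" triple per line, and returns the address of the requested symbol. The path is a parameter so the same routine can be pointed at a sample file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan a kallsyms-style file for `name` and return
 * its address, or 0 if the file cannot be opened or the symbol is absent.
 */
static unsigned long lookup_symbol(const char *path, const char *name)
{
	FILE *f = fopen(path, "r");
	char line[512], sym[256], type;
	unsigned long addr, found = 0;

	if (!f)
		return 0;

	while (fgets(line, sizeof(line), f)) {
		/* Each line looks like: "c0105a20 T sys_vm86old" */
		if (sscanf(line, "%lx %c %255s", &addr, &type, sym) != 3)
			continue;
		if (strcmp(sym, name) == 0) {
			found = addr;
			break;
		}
	}
	fclose(f);
	return found;
}
```

Something like lookup_symbol("/proc/kallsyms", "sys_vm86old") is the shape of the call. Keep in mind that a return value of 0 is ambiguous here: modern kernels can be configured to show zeroed addresses to unprivileged readers, so a zero may mean "hidden" rather than "missing".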

Clearly, the user has almost complete control over the __copy_from_user_inatomic() call, since the ‘src’ pointer is not checked and can be set to any valid address. In his advisory, cliph states that this can lead to an indirect arbitrary read of kernel memory, but he did not know whether it was exploitable. And here comes qaaz with his amazing jessica_biel_naked_in_my_bed.c code.
This vulnerability has been present since the introduction of the vmsplice(2) system call; consequently, it affects 2.6.17 up to 2.6.24.1. His code starts like this…

So, it zeroes out the allocated space and informs the user of its location. Next, he sets the page's PG_compound flag, which is defined in include/linux/page-flags.h and marks the page as part of a compound page. He also sets its ‘private’ field to the pages[0] value, sets ‘count’ (the usage count) to 1 and, finally, points the next page's lru.next pointer (the LRU, Least Recently Used, list_head) at the address of kernel_code(). You'll see at the end how cool this is.
His kernel_code() routine is almost identical to the one in the previous exploit, as you can see here:

The ‘map_size’ is calculated for the allocation of three pipe buffers. Then mmap(2) is called, but it is not able to map address 0 since that is already mapped at pages[0]. The mapping ends up at some other location, and the code does the following:

&#91;sourcecode language="c"&#93;
/*****/
map_size -= 2 * PAGE_SIZE;
if (munmap(map_addr + map_size, PAGE_SIZE) < 0)
	die("munmap", errno);
&#91;/sourcecode&#93;
This makes munmap() free part of the previously allocated buffer. Specifically, you can picture the allocations like this:
<pre>
pages[0]          pages[1]          pages[2]          pages[3]
--------------    --------------    --------------    --------------
from: 0           from: 0x20        from: 0x4000      from: 0x4020
to: 0x1000        lru-&gt;next: kc     to: 0x5000        lru-&gt;next: kc
PG_compound                         PG_compound

Before memory unmap:
pages[4]
--------------
from: 0xb7d97000
to: 0xb7dc9000

unmap: 0xb7d97000 + 0x30000 (= 0xb7dc7000) up to 0xb7dc8000

After unmap:
pages[4]
--------------
from: 0xb7d97000
to: 0xb7dc7000
As well as the page located at 0xb7dc8000 (up to 0xb7dc9000)
</pre>
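The same "map a region, then punch a one-page hole near its end" trick can be tried harmlessly in user space. The sketch below is mine, with a hypothetical helper called punch_hole(); it maps a number of anonymous pages and repeats the map_size arithmetic from the snippet above, leaving the last page mapped behind the hole.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `npages` anonymous pages, then unmap the
 * second-to-last page. On success stores the base address in *base_out
 * and returns the offset of the hole; returns (size_t)-1 on failure.
 */
static size_t punch_hole(size_t npages, char **base_out)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t map_size = npages * page;
	char *map_addr;

	if (npages < 3)
		return (size_t)-1;

	map_addr = mmap(NULL, map_size, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (map_addr == MAP_FAILED)
		return (size_t)-1;

	/* Same arithmetic as the exploit snippet: skip back two pages... */
	map_size -= 2 * page;
	/* ...and unmap exactly one, so the final page stays mapped. */
	if (munmap(map_addr + map_size, page) < 0) {
		munmap(map_addr, npages * page);
		return (size_t)-1;
	}

	*base_out = map_addr;
	return map_size;
}
```

With 50 pages of 4 KiB each (0x32000 bytes, the size of the pages[4] region in the diagram), the hole starts at offset 0x30000, which matches the 0xb7d97000 + 0x30000 = 0xb7dc7000 line above.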
Now that the memory is arranged, he creates the pipe pair and immediately closes the reading file descriptor of the pipe, like this:
&#91;sourcecode language="c"&#93;
/*****/
if (pipe(pi) < 0) die("pipe", errno);
close(pi&#91;0&#93;);
iov.iov_base = map_addr;
iov.iov_len = ULONG_MAX;
&#91;/sourcecode&#93;
The iovec structure is initialized with the address of the last allocation (the pages&#91;4&#93; one), and its length is set to 0xffffffff (ULONG_MAX on 32-bit architectures). The final code is...
&#91;sourcecode language="c"&#93;
	signal(SIGPIPE, exit_code);
	_vmsplice(pi&#91;1&#93;, &iov, 1, 0);
	die("vmsplice", errno);
	return 0;
}
&#91;/sourcecode&#93;
The signal() call registers exit_code() as the routine to execute when a SIGPIPE signal is sent to our process, and then the evil vmsplice(2) system call takes place. It requests the copy from 'map_addr', which reaches copy_from_user_mmap_sem(), the routine that performs the copy while holding the mmap semaphore, because of the preceding mmap()/munmap() operations. However, since the read end of the pipe (the pi&#91;0&#93; file descriptor) is closed, the kernel sends a "broken pipe" (aka SIGPIPE) signal to our process and, along the way, indirectly calls through lru-&gt;next, the pointer that put_page() uses as the compound page's destructor.
Calling through lru-&gt;next can be considered a kernel-like .DTORS overwrite, since put_page() uses it whenever it encounters the PG_compound flag on a page. Specifically, lru-&gt;next has to point to a callback function, and it is normally set by the SLAB allocator during initialization of the page, as we can read in mm/slab.c:
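The SIGPIPE half of this is ordinary POSIX behaviour and easy to see in isolation. In the sketch below (mine, not part of the exploit) I use a plain write(2) instead of vmsplice(2), and a handler that only sets a flag instead of something like exit_code(); the names on_sigpipe() and write_to_broken_pipe() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigpipe;

/* Minimal handler: just record that SIGPIPE was delivered. */
static void on_sigpipe(int sig)
{
	(void)sig;
	got_sigpipe = 1;
}

/*
 * Hypothetical helper: write one byte to a pipe whose read end is
 * already closed. Returns the errno reported by write(2), 0 if the
 * write unexpectedly succeeded, or -1 if the setup failed.
 */
static int write_to_broken_pipe(void)
{
	int pi[2];
	int err = 0;

	if (signal(SIGPIPE, on_sigpipe) == SIG_ERR)
		return -1;
	if (pipe(pi) < 0)
		return -1;

	close(pi[0]);			/* nobody can read from the pipe now */
	if (write(pi[1], "x", 1) < 0)
		err = errno;		/* EPIPE once the handler has returned */

	close(pi[1]);
	return err;
}
```

Because the handler returns instead of exiting, write(2) comes back with EPIPE after the signal has been handled. In the exploit the handler never returns to the caller, which is why the die("vmsplice", errno) line after the system call is only a fallback.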
&#91;sourcecode language="c"&#93;
static inline void page_set_cache(struct page *page, struct kmem_cache *cache)
{
	page->lru.next = (struct list_head *)cache;
}
&#91;/sourcecode&#93;

But since qaaz set this himself, the following code from include/linux/mm.h will be executed during the deallocation of that page:

Because of this, freeing pages[0] and pages[2] results in (compound_page_dtor *)page[1].lru.next(page) and (compound_page_dtor *)page[3].lru.next(page) being executed, respectively. But this is where kernel_code() resides!
After kernel_code() has run, the exit_code() function is called. This is quite simple…
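To make the dispatch concrete, here is a deliberately simplified user-space model of it. Everything in it is an assumption made for illustration: the struct layout, the MODEL_PG_COMPOUND bit number, and the model_put_page() logic are mine and only mimic the idea described above (a compound flag on the head page, a usage count of 1, and a destructor pointer stored in the next page's lru.next). The real kernel code differs in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PG_COMPOUND 14	/* arbitrary bit chosen for this model */

struct list_head { struct list_head *next, *prev; };

/* Toy stand-in for struct page, holding only the fields the text mentions. */
struct page {
	unsigned long flags;
	int count;			/* usage count */
	unsigned long private_;		/* models 'private': address of the head page */
	struct list_head lru;
};

typedef void compound_page_dtor(struct page *);

static int dtor_calls;

/* Harmless destructor standing in for whatever lru.next points at. */
static void fake_dtor(struct page *page)
{
	(void)page;
	dtor_calls++;
}

/* Model of the release path: the last reference on a compound page
 * calls the destructor found in the page that follows the head. */
static void model_put_page(struct page *page)
{
	if (page->flags & (1UL << MODEL_PG_COMPOUND)) {
		struct page *head = (struct page *)page->private_;

		if (--head->count == 0) {
			compound_page_dtor *dtor =
				(compound_page_dtor *)(uintptr_t)head[1].lru.next;
			dtor(head);
		}
	}
}
```

Filling in two adjacent toy pages the way the text describes (flag, 'private' and 'count' on the first, lru.next = fake_dtor on the second) and calling model_put_page() on the first one runs fake_dtor() exactly once. Whoever controls that lru.next value decides which function runs.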

As you may already know, access_ok() is a simple macro from arch/x86/include/asm/uaccess.h that uses __range_not_ok() to check that src+n is inside an accessible range.
Tip: on x86 as well as x86_64 the first argument of access_ok() is completely ignored.
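The kind of test that was missing is easy to model in user space. The sketch below is only an approximation under my own assumptions: 32-bit addresses, a user/kernel boundary at 0xc0000000 (MODEL_USER_LIMIT), and hypothetical function names. The real macros are architecture specific and implemented differently.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed user/kernel boundary for a 32-bit 3G/1G split (model only). */
#define MODEL_USER_LIMIT 0xc0000000u

/* Returns nonzero when [addr, addr + size) is NOT a valid user range. */
static int model_range_not_ok(uint32_t addr, uint32_t size)
{
	uint32_t end = addr + size;	/* may wrap around on overflow */

	if (end < addr)			/* wrapped: reject */
		return 1;
	return end > MODEL_USER_LIMIT;	/* reaches into kernel space: reject */
}

/* Like the tip above says, the first argument plays no role here. */
static int model_access_ok(int type, uint32_t addr, uint32_t size)
{
	(void)type;
	return !model_range_not_ok(addr, size);
}
```

With this model, an ordinary user range such as 0x08048000 with a length of 0x1000 passes, while a kernel address like 0xc0100000, or a range whose end wraps past 0xffffffff, is rejected.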
This is definitely one of those exploits that make you wanna cry from emotion and wonder…