Analyzing the Linux Kernel vmsplice Exploit

Zero-day emerges

On February 9, zero-day exploit code [1] was posted on the milw0rm site. It exploited a vulnerability in Linux kernel versions 2.6.17 to 2.6.24.1. This bug allows an unprivileged local user to gain root privileges. The vulnerability was assigned CVE-2008-0600. There are reports that this exploit is reliable and actively used in the wild. The inner workings of this exploit are quite interesting from a technical point of view; let's have a look.

Details on the vulnerability and methods of exploitation

The vulnerability lies in the get_iovec_page_array function (in fs/splice.c; line numbers are from the 2.6.23.1-42.fc8 kernel), reachable from the vmsplice() system call:

The get_user_pages function expects its fourth argument (the number of page descriptors to fill; it limits the return value) to be at least 1. The preceding code assumes that the npages variable is at least 1 (because len must be nonzero, so the off + len + PAGE_SIZE - 1 expression should be greater than or equal to PAGE_SIZE). However, if the len variable is close to UINT32_MAX, then the off + len + PAGE_SIZE - 1 computation wraps around, and npages can be zero.
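The wrap described above can be reproduced in user space. The following is a minimal sketch, not the kernel code: it assumes a 32-bit kernel with 4096-byte pages, and the helper name npages_for is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u
#define PAGE_SIZE  (1u << PAGE_SHIFT)   /* assumption: 4096-byte pages */

/*
 * Hypothetical model of the page-count computation, done in 32-bit
 * unsigned arithmetic (assumption: 32-bit kernel). 'off' is the offset
 * of the user buffer within its first page.
 */
static uint32_t npages_for(uint32_t off, uint32_t len)
{
    assert(off < PAGE_SIZE);
    /* Unsigned addition wraps modulo 2^32 before the shift is applied. */
    return (uint32_t)(off + len + PAGE_SIZE - 1u) >> PAGE_SHIFT;
}
```

Under these assumptions, npages_for(0, 1) is 1, while npages_for(0, 0xFFFFF001u) is 0: the sum is exactly 2^32, which wraps to zero before the shift.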

With a limit of zero, get_user_pages may return more than PIPE_BUFFERS entries, and the pages array will overflow. However, the overflow payload is not controlled by the attacker, so it would be difficult to turn this overflow into reliable code execution.
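One way to see why a limit of zero can behave like "no limit" is a toy model. The sketch below is not the kernel's get_user_pages; it assumes a routine that stores a descriptor first and only then decrements and tests the remaining count. The name toy_fill_pages and the available parameter (how many pages the caller's mapping could supply) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define PIPE_BUFFERS 16   /* size of the pages and partial arrays */

/*
 * Hypothetical stand-in for a page-filling routine that expects
 * limit >= 1. It counts one descriptor, then decrements the limit and
 * checks it afterwards, so a limit of 0 wraps to UINT_MAX and the loop
 * is bounded only by how many pages are available.
 */
static size_t toy_fill_pages(unsigned int limit, size_t available)
{
    size_t filled = 0;

    assert(available > 0);
    do {
        filled++;   /* one more descriptor would be written here */
        limit--;    /* 0 - 1 wraps to UINT_MAX for unsigned int */
    } while (limit != 0 && filled < available);

    return filled;
}
```

In this model, toy_fill_pages(16, 48) stops at 16, but toy_fill_pages(0, 48) returns 48, three times the PIPE_BUFFERS slots the caller reserved.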

In the loop that records an offset and length for each returned page, the partial array, which is also PIPE_BUFFERS elements long, is overflowed with (off=0, plen=0x1000) pairs. Depending on the variable layout chosen by the compiler, various data structures that follow the partial array can then be overwritten with zero. In the most common case, the pages array is located after the partial array. The pages array contains pointers, so after this loop it will contain NULL pointers.

Normally, when the kernel tries to access a NULL pointer, it will result in an exception and the process will be terminated. However, the attacker can map memory pages at address zero, and store arbitrary data there. In such a scenario, when the kernel dereferences pointers from the pages array, attacker-controlled data will be processed, which may result in arbitrary code execution in the kernel context. In our case, the convenient technique is to make an entry in the pages array look like a compound page descriptor, which will result in a function call to an attacker-controlled address in user space:

Workarounds

A kernel upgrade is the preferred solution, but if it is not feasible, there are workarounds.

A simple kernel module, which disables the sys_vmsplice system call, has been posted [2].

The exploit we've discussed relies heavily on the ability to map memory at address zero. Starting with kernel 2.6.23, there is a mechanism to forbid such mappings via procfs. The echo 65536 > /proc/sys/vm/mmap_min_addr command sets the lowest possible mapping address to 64K. Note that:

SELinux must be enabled (in enforcing mode) for this command to take effect.

Although this setting certainly makes the current exploit fail, there is a nonzero probability that the vulnerability can be exploited without mapping the zero address. I know of no code capable of such exploitation; however, it cannot be ruled out.

This setting may prevent exploitation of future NULL pointer dereference vulnerabilities. Very few programs make legitimate use of mapping the zero address.
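To check whether the mmap_min_addr setting is actually in effect on a given machine, one can simply try to map the zero page as an unprivileged user. The following is a minimal sketch using standard POSIX and Linux calls; the helper names can_map_zero_page and read_mmap_min_addr are hypothetical, and a privileged process may still be allowed to create the mapping regardless of the setting.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std modes */
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if a page can be mapped at address 0, 0 if it is refused. */
static int can_map_zero_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    void *p;

    if (page <= 0)
        return 0;

    /* MAP_FIXED asks for exactly address 0 rather than treating it as a hint. */
    p = mmap((void *)0, (size_t)page, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (p == MAP_FAILED)
        return 0;

    munmap(p, (size_t)page);
    return 1;
}

/* Returns the current mmap_min_addr value, or -1 if it cannot be read. */
static long read_mmap_min_addr(void)
{
    FILE *f = fopen("/proc/sys/vm/mmap_min_addr", "r");
    long value = -1;

    if (f != NULL) {
        if (fscanf(f, "%ld", &value) != 1)
            value = -1;
        fclose(f);
    }
    return value;
}
```

If read_mmap_min_addr() reports 65536 and can_map_zero_page() returns 0 for an ordinary user, the workaround is active. A return value of 1 means the zero page is still mappable from that account.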