The Linux kernel maintains a table of pointers to functions that it makes available to user space, so that unprivileged applications can invoke privileged kernel functionality. These functions are collectively known as system calls.

Any legitimate software looking to hook kernel space functions should first consider using existing infrastructure designed for such uses, like the Linux kernel tracepoints framework or the Linux security module framework. Rootkits are about the only reasonable application of these techniques, for some value of reasonable.

This code is an unintentional by-product of a project I was on at work. Considering the pedagogical value of such an endeavor, I decided to strip out all the code responsible for hooking the syscall table, distill it down into a single loadable kernel module that can be easily understood on its own, and write it up.

You can find the source code here.

This code was written and tested on Ubuntu 14.04 LTS using the standard Ubuntu Linux 3.13.x kernel.

And without further ado, let’s get started.

Introduction

Hooking the Linux system call table from within a loadable kernel module is not all that difficult. After all, we are running with kernel privileges. We can do whatever we want. We can dereference and overwrite any memory address at will. The process breaks down into the following steps:

1. Locate the system call table.
2. Mark the segment of memory containing the system call table as writeable.
   - By default, the syscall table is marked as read-only.
3. Find the offset of the pointer to the function we want to hook in the syscall table.
   - We will be targeting the “write” system call in this tutorial.
4. Overwrite the appropriate pointer in the syscall table (4 bytes on 32-bit x86, 8 bytes on x86-64) with the address of a function we define, thus completing the hook.
5. Mark the syscall table as read-only again.
   - Even rootkits should clean up after themselves; don’t leave the place looking like a pig sty.

Step number 1 is going to be the most difficult step by far.
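Steps 3 and 4, plus the cleanup we do on unload, are easier to picture with a toy model. The following is a user-space sketch, not the module's code: handler_t, table, place_hook() and remove_hook() are made-up names, and an ordinary array of function pointers stands in for the syscall table. There is no memory protection to defeat here, so steps 2 and 5 have no equivalent.

```c
#include <stddef.h>

typedef long (*handler_t)(long);

static long first_handler(long x)  { return x * 2; }
static long second_handler(long x) { return x + 1; }

/* Stand-in for the syscall table: a plain array of function pointers. */
static handler_t table[] = { first_handler, second_handler };

/* The original pointer, kept so the hook can be undone later. */
static handler_t saved;

/* The replacement: do something extra, then defer to the original. */
static long hooked_handler(long x)
{
    return saved(x) + 100;
}

/* Steps 3 and 4: pick the slot and overwrite the pointer stored there. */
static void place_hook(size_t index)
{
    saved = table[index];
    table[index] = hooked_handler;
}

/* Undo the hook by writing the saved pointer back into the same slot. */
static void remove_hook(size_t index)
{
    table[index] = saved;
}
```

After place_hook(1), every call made through table[1] lands in hooked_handler() first, while table[0] is untouched. Calling remove_hook(1) puts the original pointer back.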

The Main Challenge

There are some flimsy mechanisms in place to discourage LKMs (loadable kernel modules) from tampering with the syscall table for (hopefully obvious) security reasons.

First and foremost, the static portion of the Linux kernel - i.e. the portion that doesn’t reside in loadable kernel modules - does not export the syscall table symbol.

Why?

Because LKMs have no earthly business messing with the syscall table. The only valid reason for an LKM to overwrite system call pointers is to corrupt the behavior of the operating system, most often for concealment of malicious software.

Since the kernel does not export the syscall table symbol, we need to find it ourselves. We do this by manually reading in and scanning the System.map-$(uname -r) file, looking for the “sys_call_table” address. Once we have retrieved the address, we simply need to find the appropriate offset for it based on the system call we’re trying to hook, dereference it, and write to it.

This tutorial will show you how to hook system calls from a loadable kernel module (LKM) in the Linux kernel, complete with a code walkthrough. The code presented here has been tested and is known to work reliably.

Implementation

Although this code base has a few hundred lines to it, it’s actually very simple.

Much of the code simply handles logistics - nothing more. The two largest functions in this example are responsible for 1) acquiring the version of the currently running kernel so we can identify the correct System.map-$(uname -r) file to read from and 2) reading in the System.map-$(uname -r) file line by line, checking each full line read to see if it contains the “sys_call_table” symbol.

That’s it.

Once we’ve got the address of the syscall table, it’s trivial to overwrite. Let’s take a look.

General Structure

There are a few things going on in this application. Much of the code comprises helper functions that read files and parse strings. Other than the helpers, we have our new write() function, which is the function we hook into the syscall table, and our standard __init and __exit functions for a loadable kernel module.

PROC_V is the file path to the /proc virtual filesystem location that contains version information of the currently running kernel.

BOOT_PATH is the file path to the System.map-$(uname -r) file that we are looking for sans appended version information. We have to retrieve the kernel version before we can finish constructing this string.

MAX_VERSION_LEN is the maximum length of the version information buffer used to store information read from the PROC_V path. We also use this define as a maximum buffer length for storing newline-separated strings in System.map-$(uname -r) as we parse it looking for the “sys_call_table” entry.
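To make the relationship between these defines concrete, here is a small user-space sketch. The values given to the three macros are assumptions, since they are not shown in this write-up, and build_map_path() is a hypothetical helper name.

```c
#include <stdio.h>

/* Assumed values: the article names these macros but does not show them. */
#define PROC_V          "/proc/version"
#define BOOT_PATH       "/boot/System.map-"
#define MAX_VERSION_LEN 256

/*
 * Append a kernel release string (the `uname -r` format) to BOOT_PATH.
 * Returns 0 on success, -1 if the result would not fit in `out`.
 */
static int build_map_path(const char *release, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "%s%s", BOOT_PATH, release);

    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Given the release string 3.13.0-24-generic, this produces /boot/System.map-3.13.0-24-generic, which is the file we want to scan.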

__init and __exit macros

In Linux loadable kernel modules, the function registered with module_init() (conventionally marked with the __init macro) is the entry point to the module when it’s loaded, and the function registered with module_exit() (conventionally marked with the __exit macro) is the destructor function that’s executed when the module is unloaded.

Since it only takes a couple lines of code to place our hooks in this simple example, we perform our dirty work directly in these functions. We’ll come back to these functions in a few minutes.

Helper Functions
char *acquire_kernel_version (char *buf)

Reads version info from PROC_V and chops it down to just the string we want. We need our version info to be in the same format that’s produced by $(uname -r).
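As a rough illustration of the chopping, the text in /proc/version begins with "Linux version <release> ", so the release is the third whitespace-separated token. The sketch below is a user-space approximation using sscanf(), not the module's own parsing; extract_release() is a hypothetical name and the 128-byte scratch buffer is an arbitrary choice.

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract the `uname -r` style release string from the text of
 * /proc/version, which starts with "Linux version <release> ".
 * Returns 0 on success, -1 if the text does not have that shape
 * or the release does not fit in `out`.
 */
static int extract_release(const char *proc_version, char *out, size_t outlen)
{
    char release[128];

    /* %127s stops at the first whitespace after the release token. */
    if (sscanf(proc_version, "Linux version %127s", release) != 1)
        return -1;
    if (strlen(release) >= outlen)
        return -1;
    strcpy(out, release);
    return 0;
}
```

Fed a line such as "Linux version 3.13.0-24-generic (buildd@host) ...", this leaves 3.13.0-24-generic in the output buffer, which matches what $(uname -r) prints on that kernel.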

First things first, we declare some variables:

struct file *proc_version;
char *kernel_version;
mm_segment_t oldfs;

Next, we have to change the legal virtual address space of this process to include the kernel data segment. If we skip this step, the call to read the file will fail the user space virtual address check performed by the kernel. In short, this allows us to read file contents into kernel memory later on:

oldfs = get_fs();
set_fs (KERNEL_DS);

Once we’re set up to read data into kernel space without causing a fault, we open the PROC_V file for reading and prepare our buffer.

Zero out the system_map_entry buffer to be safe. The system_map_entry buffer is going to be used to store each line in the System.map file as we iterate through it so we can check it for the sys_call_table entry:

memset(system_map_entry, 0, MAX_VERSION_LEN);

We read the file one character at a time until we have read an entire line. We determine that we’ve read an entire line by 1) checking for a newline (‘\n’) character or 2) checking to see if we have read in the maximum amount of data that our buffer can hold, i.e. MAX_VERSION_LEN bytes.
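The loop is simple enough to model in user space. This sketch uses fgetc() on a stdio stream where the module reads through the kernel's file interface, so treat it as an analogy only; read_line() is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read one line from `f` a character at a time.
 * Stops at '\n', at end of file, or once the buffer is full.
 * Returns the number of characters stored (the newline is not stored),
 * or -1 if end of file was reached before anything was read.
 */
static int read_line(FILE *f, char *buf, size_t buflen)
{
    size_t i = 0;
    int c = 0;

    if (buflen == 0)
        return -1;
    memset(buf, 0, buflen);              /* start from a zeroed buffer */

    while (i < buflen - 1 && (c = fgetc(f)) != EOF) {
        if (c == '\n')
            break;                       /* condition 1: end of line */
        buf[i++] = (char)c;
    }                                    /* condition 2: buffer is full */

    if (i == 0 && c == EOF)
        return -1;
    return (int)i;
}
```

Because the buffer is zeroed first and at most buflen - 1 characters are stored, the result is always a terminated string, even when a line is longer than the buffer.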

Once we have read in an entire line, we do a basic string comparison to see if our system_map_entry buffer contains the string “sys_call_table”. If it does, we allocate some space to store the address in. Each line of the System.map file is in the format:

<address> <type> <symbol name>

so we tokenize (strsep()) the system_map_entry buffer on the space character, which returns a pointer to the first space-separated column in the line we’ve just read. That is, we get a pointer straight to the address of the “sys_call_table” symbol, as per the System.map format shown above.

Once we’ve got that pointer, we simply copy the string it points to into sys_string and then invoke kstrtoul on sys_string, which contains a string representation of the hex address of the “sys_call_table” symbol as pulled from System.map-$(uname -r). kstrtoul converts it to an unsigned long (4 bytes on 32-bit x86, 8 bytes on x86-64) using base 16 (hex), and we write the value to our global syscall_table pointer.
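Here is a user-space sketch of that parsing step, assuming the line layout that nm produces: "<address> <type> <symbol>". It uses strchr() and strtoul() in place of the kernel's strsep() and kstrtoul(), and it matches the symbol name exactly, which may be stricter than the module's own comparison. parse_map_line() is a hypothetical name and the address in the usage note is made up.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Pull the address out of one System.map line.
 * Assumed line layout: "<address> <type> <symbol>".
 * Returns 0 and stores the address in *addr if the line describes
 * `symbol`, -1 otherwise.
 */
static int parse_map_line(const char *line, const char *symbol,
                          unsigned long *addr)
{
    char buf[256];
    char *space, *name, *end;

    if (strlen(line) >= sizeof(buf))
        return -1;
    strcpy(buf, line);
    buf[strcspn(buf, "\n")] = '\0';      /* drop a trailing newline */

    /* Cut the line at the first space, much as strsep() would:
     * buf now holds only the address column. */
    space = strchr(buf, ' ');
    if (space == NULL)
        return -1;
    *space = '\0';

    /* The symbol name is the last space-separated column. */
    name = strrchr(space + 1, ' ');
    if (name == NULL || strcmp(name + 1, symbol) != 0)
        return -1;

    errno = 0;
    *addr = strtoul(buf, &end, 16);      /* base 16, as with kstrtoul */
    if (errno != 0 || end == buf || *end != '\0')
        return -1;
    return 0;
}
```

For the line "c15b3000 R sys_call_table" this stores 0xc15b3000 in *addr. A line for a different symbol, such as ia32_sys_call_table, is rejected because the name is compared in full.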

Once we’re done doing all that, we clean up after ourselves by closing out our file handle, changing the addressable virtual memory segment back to user space, and returning.

filp_close(f, 0);
set_fs(oldfs);
kfree(filename);
return 0;

At this point, the syscall_table pointer - which was declared to be global to the module - now contains the address of the system call table as taken from /boot/System.map-$(uname -r) and is ready to be dereferenced.

Placing the hooks

The __init onload function is the entry point to the module and is where our primary logic resides since it’s so simple. After we allocate the required storage, we invoke the find_sys_call_table() function with the result of an invocation to acquire_kernel_version() passed in as an argument. By combining the two helpers discussed previously, we are able to collect all the prerequisite information we need to place our hooks.

After find_sys_call_table() returns, the global unsigned long syscall_table variable that we declared at the top of our C file is populated and ready for manipulation.

However, there is one little caveat left: the memory address where sys_call_table resides is not writeable. The processor itself will raise an exception if you try to write to it all willy-nilly.

So what do we do? We use the Linux paravirtualization system to change bit 16 of the CR0 register. The CR0 register is one of the control registers in the x86 processor that affects basic CPU functionality. Bit 16 of the CR0 register (counting from bit 0, so the mask is 0x10000) is the “Write Protect” bit, which tells the processor that it cannot write to read-only memory pages, even when running in kernel mode (ring 0). This is why the CPU will raise an exception if you try to write to syscall_table right off the bat.

Even though the CPU will refuse to write to read-only memory pages when the WP bit of the CR0 register is set, we are the kernel. We can just toggle that bit and continue on our way.

Using the write_cr0 and read_cr0 helpers along with a bitwise mask that clears the WP bit (bit 16 of the CR0 register), we can trivially disable write protection.
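The mask arithmetic can be checked in user space without touching the real register. In the sketch below, the CR0_WP macro and the two helper names are made up for illustration. In a module, the value passed in would come from read_cr0() and the result would go to write_cr0().

```c
#include <stdint.h>

/* Write Protect flag: bit 16 of CR0, i.e. mask 0x10000. */
#define CR0_WP (UINT64_C(1) << 16)

/* Value to write back in order to disable write protection. */
static uint64_t cr0_clear_wp(uint64_t cr0)
{
    return cr0 & ~CR0_WP;
}

/* Value to write back in order to re-enable write protection. */
static uint64_t cr0_set_wp(uint64_t cr0)
{
    return cr0 | CR0_WP;
}
```

As a worked example, a CR0 value of 0x80050033 becomes 0x80040033 once WP is cleared, and setting WP again restores 0x80050033. Every other bit is left alone, which matters because CR0 also controls paging and protected mode.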

Once that’s done, we simply dereference the appropriate offset for the system call we want to overwrite by using the kernel-defined __NR_* indices, of which there is exactly one for each and every system call in the system. Using these predefined offsets, we write the address of our new_write() function over the address of the system call write() function.
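The __NR_* constants are plain integers, and they are visible from user space too. This Linux-only sketch does not touch the syscall table. It only shows that __NR_write is the number the kernel uses to select the write handler, by passing it to syscall() directly. write_by_number() is a hypothetical name.

```c
#define _GNU_SOURCE        /* for syscall() */
#include <sys/syscall.h>   /* __NR_write and the other __NR_* constants */
#include <unistd.h>

/* Invoke write(2) by its syscall number instead of the libc wrapper. */
static long write_by_number(int fd, const void *buf, size_t len)
{
    return syscall(__NR_write, fd, buf, len);
}
```

The same number that selects the handler here is the index into the table inside the kernel, which is why overwriting the entry at __NR_write is enough to intercept every write() call on the system.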

Once we overwrite our target system call function pointer, we re-enable write protect in the CR0 register and exit the __init function successfully.

Removing the hooks

In order to keep our system in a clean and stable state, we want to remove our hooks gracefully when the module is unloaded. The __exit onunload() function behaves very similarly to the __init onload() function, since it also has to toggle the write protect bit in CR0. The onunload() function even writes to the exact same offset into the sys_call_table array as the onload() function did.

The only difference is that the onunload() function writes the address of the original write() function over the address of our new_write() function, putting everything back to the way it was before we came along.