An Introduction to the Linux Kernel

This article provides an introduction to the Linux kernel, and demonstrates how to write and compile a module.

Have you ever wondered how a computer manages the most complex tasks with such efficiency and accuracy? The answer is, with the help of the operating system. It is the operating system that uses hardware resources efficiently to perform various tasks and ultimately makes life easier. At a high level, the OS can be divided into two partsthe first being the kernel and other is the utility programs. Various user space processes ask for system resources such as the CPU, storage, memory, network connectivity, etc, and the kernel services these requests. This column will explore loadable kernel modules in GNU/Linux.

The Linux kernel is monolithic, which means that the entire OS runs solely in supervisor mode. Though the kernel is a single process, it consists of various subsystems and each subsystem is responsible for performing certain tasks. Broadly, any kernel performs the following main tasks.

Process management: This subsystem handles the process life-cycle. It creates and destroys processes, allowing communication and data sharing between processes through inter-process communication (IPC). Additionally, with the help of the process scheduler, it schedules processes and enables resource sharing.

Memory management: This subsystem handles all memory related requests. Available memory is divided into chunks of a fixed size called pages, which are allocated or de-allocated to/from the process, on demand. With the help of the memory management unit (MMU), it maps the process virtual address space to a physical address space and creates the illusion of a contiguous large address space.

File system: The GNU/Linux system is heavily dependent on the file system. In GNU/Linux, almost everything is a file. This subsystem handles all storage related requirements like the creation and deletion of files, compression and journaling of data, the organisation of data in a hierarchical manner, and so on. The Linux kernel supports all major file systems including MS Windows NTFS.

Device control: Any computer system requires various devices. But to make the devices usable, there should be a device driver and this layer provides that functionality. There are various types of drivers present, like graphics drivers, a Bluetooth driver, audio/video drivers and so on.

Networking: Networking is one of the important aspects of any OS. It allows communication and data transfer between hosts. It collects, identifies and transmits network packets. Additionally, it also enables routing functionality.

Dynamically loadable kernel modules
We often install kernel updates and security patches to make sure our system is up-to-date. In case of MS Windows, a reboot is often required, but this is not always acceptable; for instance, the machine cannot be rebooted if is a production server. Wouldnt it be great if we could add or remove functionality to/from the kernel on-the-fly without a system reboot? The Linux kernel allows dynamic loading and unloading of kernel modules. Any piece of code that can be added to the kernel at runtime is called a kernel module. Modules can be loaded or unloaded while the system is up and running without any interruption. A kernel module is an object code that can be dynamically linked to the running kernel using the insmod command and can be unlinked using the rmmod command.

A few useful utilities
GNU/Linux provides various user-space utilities that provide useful information about the kernel modules. Let us explore them.lsmod: This command lists the currently loaded kernel modules. This is a very simple program which reads the /proc/modules file and displays its contents in a formatted manner.insmod: This is also a trivial program which inserts a module in the kernel. This command doesnt handle module dependencies.

rmmod: As the name suggests, this command is used to unload modules from the kernel. Unloading is done only if the current module is not in use. rmmod also supports the -f or –-force option, which can unload modules forcibly. But this option is extremely dangerous. There is a safer way to remove modules. With the -w or –-wait option, rmmod will isolate the module and wait until the module is no longer used.

modinfo: This command displays information about the module that was passed as a command-line argument. If the argument is not a filename, then it searches the /lib/modules/<version> directory for modules. modinfo shows each attribute of the module in the field:value format.Note: <version> is the kernel version. We can obtain it by executing the uname -r command.

dmesg: Any user-space program displays its output on the standard output stream, i.e., /dev/stdout but the kernel uses a different methodology. The kernel appends its output to the ring buffer, and by using the dmesg command, we can manage the contents of the ring buffer.Preparing the system
Now its time for action. Lets create a development environment. In this section, lets install all the required packages on an RPM-based GNU/Linux distro like CentOS and a Debian-based GNU/Linux distro like Ubuntu.Installing CentOS
First install the gcc compiler by executing the following command as a root user:

[root]# yum -y install gcc

Then install the kernel development packages:

[root]# yum -y install kernel-devel

Finally, install the make utility:

[root]# yum -y install make

Installing Ubuntu
First install the gcc compiler:

[mickey] sudo apt-get install gcc

After that, install kernel development packages:

[mickey] sudo apt-get install kernel-package

And, finally, install the make utility:

[mickey] sudo apt-get install make

Our first kernel module
Our system is ready now. Let us write the first kernel module. Open your favourite text editor and save the file as hello.c with the following contents:

Any module must have at least two functions. The first is initialisation and the second is the clean-up function. In our case, init_module() is the initialisation function and cleanup_module() is the clean-up function. The initialisation function is called as soon as the module is loaded and the clean-up function is called just before unloading the module. MODULE_LICENSE and other macros are self-explanatory.

There is a printk() function, the syntax of which is similar to the user-space printf() function. But unlike printf() , it doesnt print messages on a standard output stream; instead, it appends messages into the kernels ring buffer. Each printk() statement comes with a priority. In our example, we used the KERN_INFO priority. Please note that there is no comma (,) between KERN_INFO and the format string. In the absence of explicit priority, DEFAULT_MESSAGE_LOGLEVEL priority will be used. The last statement in init_module() is return 0 which indicates success.

The names of the initialisation and clean-up functions are init_module() and cleanup_module() respectively. But with the new kernel (>= 2.3.13) we can use any name for the initialisation and clean-up functions. These old names are still supported for backward compatibility. The kernel provides module_init and module_exit macros, which register initialisation and clean-up functions. Let us rewrite the same module with names of our own choice for initialisation and cleanup functions:

Compiling and loading the module
Now, let us understand the module compilation procedure. To compile a kernel module, we are going to use the kernels build system. Open your favourite text editor and write down the following compilation steps in it, before saving it as Makefile. Please note that the kernel modules hello.c and Makefile must exist in the same directory.

To build modules, kernel headers are required. The above makefile invokes the kernels build system from the kernels source and finally the kernels makefile invokes our Makefile to compile the module. Now that we have everything to build our module, just execute the make command, and this will compile and create the kernel module named hello.ko:

We have now successfully compiled our first kernel module. Now, let us look at how to load and unload this module in the kernel. Please note that you must have super-user privileges to load/unload kernel modules. To load a module, switch to the super-user mode and execute the insmod command, as shown below:

[root]# insmod hello.ko

insmod has done its job successfully. But where is the output? It is appended to the kernels ring buffer. So lets verify it by executing the dmesg command:

[root]# dmesg
Hello, World !!!

We can also check whether our module is loaded or not. For this purpose, lets use the lsmod command:

[root]# lsmod | grep hello
hello 859 0

To unload the module from the kernel, just execute the rmmod command as shown below and check the output of the dmesg command. Now, dmesg shows the message from the clean-up function:

[root]# rmmod hello
[root]# dmesg
Hello, World !!!
Exiting ...

In this module, we have used a couple of macros, which provide information about the module. The modinfo command displays this information in a nicely formatted fashion:

Finding the PID of a process
Let us write one more kernel module to find out the Process ID (PID) of the current process. The kernel stores all process related information in the task_struct structure, which is defined in the <linux/sched.h> header file. It provides a current variable, which is a pointer to the current process. To find out the PID of a current process, just print the value of the current->pid variable. Given below is the complete working code (pid.c):

Now compile and insert the module and check the output using the dmesg command:

[mickey]$ make
[root]# insmod pid.ko
[root]# dmesg
pid = 6730

A module that spans multiple files
So far we have explored how to compile a module from a single file. But in a large project, there are several source files for a single module and, sometimes, it is convenient to divide the module into multiple files. Let us understand the procedure of building a module that spans two files. Lets divide the initialization and cleanup functions from the hello.c file into two separate files, namely startup.c and cleanup.c.
Given below is the source code for startup.c:

Passing command-line arguments to the module
In user-space programs, we can easily manage command line arguments with argc/ argv. But to achieve the same functionality through modules, we have to put in more of an effort.
To achieve command-line handling in modules, we first need to declare global variables and use the module_param() macro, which is defined in the <linux/moduleparam.h> header file. There is also the MODULE_PARM_DESC() macro which provides descriptions about arguments. Without going into lengthy theoretical discussions, let us write the code:

After compilation, first insert the module without any arguments, which display the default values of the variable. But after providing command-line arguments, default values will be overridden. The output below illustrates this:

If you want to learn more about modules, the Linux kernels source code is the best place to do so. You can download the latest source code from https://www.kernel.org/. Additionally, there are a few good books available in the market like Linux Kernel Development (3rd Edition) by Robert Love and Linux Device Drivers (3rd Edition). You can also download the free book from http://lwn.net/Kernel/LDD3/.