In this lab we present a set of concepts and basic functions required
to start Linux kernel programming. It is important to note that kernel
programming differs greatly from user-space programming. The kernel is a
stand-alone entity that cannot use user-space libraries (not even libc).
As a result, the usual user-space functions (printf, malloc, free, open, read,
write, memcpy, strcpy, etc.) are not available. Kernel programming therefore
relies on its own independent API, separate from the user-space API, whether
we refer to POSIX or ANSI C (standard C language library functions). Where a
familiar name such as memcpy or strcpy appears in kernel code, it refers to
the kernel's own implementation, not to libc.

An important difference in kernel programming is how memory is accessed and
allocated. Because kernel programming is very close to the physical
machine, there are important rules for memory management. First, the kernel
works with several types of memory:

* Physical memory
* Virtual memory from the kernel address space
* Virtual memory from a process's address space
* Resident memory - we know for sure that the accessed pages are present in
  physical memory

Virtual memory in a process's address space cannot be considered resident
because of the virtual memory mechanisms implemented by the operating system:
pages may be swapped out, or may simply not be present in physical memory yet
as a result of demand paging. Memory in the kernel address space may or may not
be resident. Both the data and code segments of a module and the kernel stack
of a process are resident. Dynamic memory may or may not be resident, depending
on how it is allocated.

When working with resident memory, things are simple: memory can be accessed at
any time. Non-resident memory, however, can only be accessed from process
context. Accessing non-resident memory from interrupt context has
unpredictable results and, therefore, when the operating system detects such
an access, it takes drastic measures: halting or resetting the system to
prevent serious corruption.

The virtual memory of a process cannot be accessed directly from the kernel.
In general, accessing the address space of a process is strongly discouraged,
but there are situations where a device driver needs to do it. The typical case
is a device driver that needs to access a buffer from user space. In this case,
the driver must use the dedicated access functions (such as copy_from_user()
and copy_to_user()) rather than dereference the user pointer directly. This is
necessary to prevent access to invalid memory areas.

Another difference from user-space programming, relative to memory, is the
stack: the kernel stack has a fixed and limited size. Linux uses a stack of
only a few kilobytes (4 KB in the configuration this text refers to; the exact
size depends on the architecture and kernel configuration), and Windows uses a
12 KB stack. For this reason, allocating large structures on the stack and
using recursive calls should be avoided.

In relation to kernel execution, we distinguish two contexts: process context
and interrupt context. We are in the process context when we run code as a
result of a system call or when we run in the context of a kernel thread. When
we run in a routine to handle an interrupt or a deferrable action, we run in
an interrupt context.

Some kernel API calls can block the current process. Common examples are
acquiring a semaphore or waiting for a condition. In this case, the process is
put into the WAITING state and another process runs. An interesting
situation occurs when a function that can suspend the current process is
called from interrupt context. In this case, there is no current process, and
therefore the results are unpredictable. Whenever the operating system detects
this condition, it raises an error that causes the operating system to shut
down.

One of the most important features of kernel programming is parallelism. Linux
supports SMP systems with multiple processors and kernel preemption. This
makes kernel programming more difficult because access to global variables must
be synchronized with either spinlock primitives or blocking primitives. Although
blocking primitives are recommended, they cannot be used in interrupt
context, so spinlocks are the only locking solution available there.

Spinlocks are used to achieve mutual exclusion. When a thread of execution
cannot enter the critical region, the spinlock does not suspend the current
process; instead it busy-waits (spinning in a while() loop until the lock
is released).
The code that runs in the critical region protected by a spinlock is not allowed
to suspend the current process (it must adhere to the execution conditions of
interrupt context). Moreover, the CPU will not be released except when an
interrupt occurs. Because of this mechanism, it is important that a
spinlock is held for as short a time as possible.
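The busy-waiting idea can be illustrated with a minimal user-space sketch built on C11 atomics. This is not the kernel's spinlock_t implementation; the demo_spinlock type and the demo_spin_* names are hypothetical and exist only for this illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model of a spinlock; not the kernel's spinlock_t. */
struct demo_spinlock {
	atomic_flag locked;
};

static void demo_spin_init(struct demo_spinlock *l)
{
	atomic_flag_clear(&l->locked);
}

/* One acquisition attempt: true if the lock was free and is now held. */
static bool demo_spin_trylock(struct demo_spinlock *l)
{
	return !atomic_flag_test_and_set_explicit(&l->locked,
						  memory_order_acquire);
}

/* Busy-wait: keep retrying instead of putting the caller to sleep. */
static void demo_spin_lock(struct demo_spinlock *l)
{
	while (!demo_spin_trylock(l))
		;	/* spin until the holder releases the lock */
}

static void demo_spin_unlock(struct demo_spinlock *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

A waiter burns CPU cycles for the whole time the lock is held by someone else, which is why the critical region should stay short and must never sleep.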

Linux uses preemptive kernels. The notion of preemptive multitasking should not
be confused with the notion of a preemptive kernel. Preemptive
multitasking means that the operating system forcefully interrupts
a process running in user space when its quantum (time slice) expires, in order
to run another process.
A kernel is preemptive if a process running in kernel mode (as a result of a
system call) can be interrupted so that another process can run.

Because of preemptivity, when we share resources between two portions of code
that can run from different process contexts, we need to protect ourselves with
synchronization primitives, even in the case of a single processor.

In Linux kernel programming, the convention functions use to report their
outcome is the same as in UNIX programming: 0 for success, or a value
other than 0 for failure.
For failures, negative error codes are returned (for example -ENOMEM or
-EINVAL).
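The return-value convention can be sketched as follows. The error constants come from the standard errno header; struct demo_dev and the demo_dev_* functions are hypothetical names used only for illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical device state used only for this illustration. */
struct demo_dev {
	int busy;
};

/* Returns 0 on success or a negative error code on failure. */
static int demo_dev_open(struct demo_dev *dev)
{
	if (dev == NULL)
		return -EINVAL;	/* invalid argument */
	if (dev->busy)
		return -EBUSY;	/* resource already in use */

	dev->busy = 1;
	return 0;
}

/* Callers check for a negative value and propagate it unchanged. */
static int demo_dev_setup(struct demo_dev *dev)
{
	int err;

	err = demo_dev_open(dev);
	if (err < 0)
		return err;

	return 0;
}
```

Passing the original code up the call chain lets the outermost caller report the precise reason for the failure.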

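The printf equivalent in the kernel is printk, defined in
include/linux/printk.h. The printk() syntax is very similar
to printf(). The format string can be prefixed with a log level macro that
decides the category of the message. From the highest to the lowest priority,
the levels are KERN_EMERG, KERN_ALERT, KERN_CRIT, KERN_ERR, KERN_WARNING,
KERN_NOTICE, KERN_INFO and KERN_DEBUG.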

If the logging level is missing from the printk() call, logging is done
with the default level at the time of the call. One thing to keep in mind is
that messages sent with printk() are visible on the console only if
their priority is higher than the current console log level (that is, their
numeric level is lower).
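The console filtering rule can be modelled in a few lines. The numeric values below match the kernel's KERN_* levels (0 is the most urgent); the enum and the demo_shown_on_console() helper are hypothetical names for this sketch, not kernel API.

```c
#include <stdbool.h>

/* Numeric values mirror the kernel's KERN_* levels (0 = most urgent). */
enum demo_loglevel {
	DEMO_EMERG = 0,
	DEMO_ALERT,
	DEMO_CRIT,
	DEMO_ERR,
	DEMO_WARNING,
	DEMO_NOTICE,
	DEMO_INFO,
	DEMO_DEBUG
};

/* A message reaches the console only if it is more urgent than the limit. */
static bool demo_shown_on_console(int msg_level, int console_loglevel)
{
	return msg_level < console_loglevel;
}
```

With an assumed console log level of 4, an error message (level 3) is displayed, while warning (4) and info (6) messages only reach the kernel log buffer.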

To keep lines short when using printk(), it is recommended to
use the following helper macros instead of calling printk()
directly:

pr_emerg(fmt, ...);   /* similar to printk(KERN_EMERG pr_fmt(fmt), ...); */
pr_alert(fmt, ...);   /* similar to printk(KERN_ALERT pr_fmt(fmt), ...); */
pr_crit(fmt, ...);    /* similar to printk(KERN_CRIT pr_fmt(fmt), ...); */
pr_err(fmt, ...);     /* similar to printk(KERN_ERR pr_fmt(fmt), ...); */
pr_warning(fmt, ...); /* similar to printk(KERN_WARNING pr_fmt(fmt), ...); */
pr_warn(fmt, ...);    /* similar to printk(KERN_WARNING pr_fmt(fmt), ...); */
pr_notice(fmt, ...);  /* similar to printk(KERN_NOTICE pr_fmt(fmt), ...); */
pr_info(fmt, ...);    /* similar to printk(KERN_INFO pr_fmt(fmt), ...); */

A special case is pr_debug() that calls the printk() function
only when the DEBUG macro is defined or if dynamic debugging is used.

Memory is allocated in the kernel with the kmalloc() function, which takes two
parameters. The first parameter indicates the size in bytes of the allocated
area. The function returns a pointer to a memory area that can be directly used
in the kernel, or NULL if memory could not be allocated. The second
parameter specifies how the allocation should be done, and the most commonly
used values for it are:

* GFP_KERNEL - using this value may cause the current process to
  be suspended. Thus, it cannot be used in interrupt context.
* GFP_ATOMIC - using this value ensures that the
  kmalloc() function does not suspend the current process. It can be
  used in any context.

The counterpart to the kmalloc() function is kfree(), a function
that receives as argument an area allocated by kmalloc(). This function
does not suspend the current process and can therefore be called from any
context.
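A minimal sketch of the allocate, check, free pattern described above. Inside the kernel the real kmalloc() and kfree() from linux/slab.h are used; the user-space shims in the #else branch are assumptions added only so the sketch also builds outside the kernel. struct demo_buf and the demo_buf_* functions are hypothetical.

```c
#ifdef __KERNEL__
#include <linux/slab.h>
#include <linux/errno.h>
#else
/* User-space shims (assumptions) so the sketch builds outside the kernel. */
#include <errno.h>
#include <stdlib.h>
typedef unsigned int gfp_t;
#define GFP_KERNEL 0u
static void *kmalloc(size_t size, gfp_t flags)
{
	(void)flags;	/* allocation flags have no user-space meaning */
	return malloc(size);
}
static void kfree(const void *p)
{
	free((void *)p);
}
#endif

/* Hypothetical buffer descriptor used only for this illustration. */
struct demo_buf {
	size_t len;
	char *data;
};

/* Process context assumed, so GFP_KERNEL (which may sleep) is acceptable. */
static int demo_buf_init(struct demo_buf *b, size_t len)
{
	b->data = kmalloc(len, GFP_KERNEL);
	if (!b->data)
		return -ENOMEM;	/* always check the result of kmalloc() */
	b->len = len;
	return 0;
}

static void demo_buf_release(struct demo_buf *b)
{
	kfree(b->data);
	b->data = NULL;
	b->len = 0;
}
```

If demo_buf_init() had to run in interrupt context, GFP_ATOMIC would be the flag to pass instead.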

Because linked lists are often used, the Linux kernel API provides a unified
way of defining and using lists. This involves embedding a
struct list_head element in the structure we want to use as a
list node. The struct list_head type is defined in
include/linux/types.h, and the functions and macros that manipulate
lists are in include/linux/list.h. A struct list_head holds just two
pointers, next and prev, and is embedded as a field in well-known kernel
structures such as struct task_struct.

Note that list_add inserts the new element right after the list head, which
gives stack-like (LIFO) behavior, and that the list head itself acts as a
sentinel.

The way to define and use a (doubly linked) list is generic and, at the same
time, introduces no additional overhead. The struct list_head is used to
maintain the links between the list elements. Iterating over the list
is also done with this structure, and the element that contains a given
struct list_head can be retrieved using list_entry. This idea of implementing
and using a list is not new, as it was already described in The Art of
Computer Programming by Donald Knuth, first published in the late 1960s.

Several kernel list functions and macro definitions are presented and explained
in the include/linux/list.h header.
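The embedded-node technique can be sketched in user space as below. This is a simplified reimplementation that mirrors a few names from include/linux/list.h (the kernel versions carry extra checks and variants); struct demo_item and the demo_* helpers are hypothetical.

```c
#include <stddef.h>

/* Simplified user-space model of the kernel's list node. */
struct list_head {
	struct list_head *next, *prev;
};

/* An empty list is a head (sentinel) that points to itself. */
static void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert item right after head: the newest element comes first (LIFO). */
static void list_add(struct list_head *item, struct list_head *head)
{
	item->next = head->next;
	item->prev = head;
	head->next->prev = item;
	head->next = item;
}

/* Recover the enclosing structure from a pointer to its embedded node. */
#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define list_for_each(pos, head) \
	for (pos = (head)->next; pos != (head); pos = pos->next)

/* Hypothetical payload structure with an embedded list node. */
struct demo_item {
	int value;
	struct list_head list;
};

static int demo_first_value(struct list_head *head)
{
	return list_entry(head->next, struct demo_item, list)->value;
}

static int demo_sum(struct list_head *head)
{
	struct list_head *pos;
	int sum = 0;

	list_for_each(pos, head)
		sum += list_entry(pos, struct demo_item, list)->value;
	return sum;
}
```

Because the node lives inside the payload, no separate allocation is needed per link, and the same list code works for any structure type.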

spinlock_t (defined in linux/spinlock.h) is the basic type
that implements the spinlock concept in Linux. It describes a spinlock, and the
operations associated with a spinlock are spin_lock_init(),
spin_lock() and spin_unlock(). Typically spin_lock_init() is called once, and
each critical region is then bracketed by spin_lock() and spin_unlock().

In Linux, you can use reader-writer spinlocks, useful for readers-writers
problems.
These locks are identified by rwlock_t, and the functions
that operate on a reader-writer spinlock include:
* rwlock_init()
* read_lock() and read_unlock()
* write_lock() and write_unlock()
Any number of readers may hold the lock at the same time, but a writer holds
it exclusively.

A mutex is represented by struct mutex (defined in linux/mutex.h) and is
handled with mutex_init(), mutex_lock() and mutex_unlock(). These operations
are similar to classic mutex operations in user space or to spinlock
operations: the mutex is acquired before entering the critical region and
released after exiting it. Unlike spinlock operations, they can only be used
in process context.

Often, you only need to synchronize access to a simple variable, such as a
counter. For this, an atomic_t type can be used (defined in
include/linux/atomic.h), which holds an integer value. Operations that can be
performed on an atomic_t variable include atomic_set(), atomic_read(),
atomic_inc(), atomic_dec(), atomic_add(), atomic_sub() and atomic_cmpxchg().

A common way of using atomic variables is to store the status of an action
(e.g. a flag). So we can use an atomic variable to mark exclusive actions. For
example, we consider that an atomic variable can have the LOCKED and UNLOCKED
values, and if the respective variable equals LOCKED then a specific function
should return -EBUSY.
Such a usage relies on an atomic compare-and-exchange, so that testing the
flag and setting it happen as one indivisible step.
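The LOCKED/UNLOCKED pattern can be sketched in user space with C11 atomics. In the kernel the same idea is usually expressed with atomic_cmpxchg() on an atomic_t; the demo_* names and DEMO_* constants below are hypothetical.

```c
#include <errno.h>
#include <stdatomic.h>

#define DEMO_UNLOCKED 0
#define DEMO_LOCKED   1

/* Hypothetical flag that marks an exclusive action as in progress. */
static atomic_int demo_state = DEMO_UNLOCKED;

/* Try to start the exclusive action; fail if someone else already did. */
static int demo_begin(void)
{
	int expected = DEMO_UNLOCKED;

	/* Swap UNLOCKED -> LOCKED in one atomic step, or report failure. */
	if (!atomic_compare_exchange_strong(&demo_state, &expected,
					    DEMO_LOCKED))
		return -EBUSY;

	return 0;
}

/* Finish the exclusive action and let others in. */
static void demo_end(void)
{
	atomic_store(&demo_state, DEMO_UNLOCKED);
}
```

A plain read followed by a separate write would leave a window in which two callers both see UNLOCKED; the compare-and-exchange closes that window.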

Generate the skeleton for the task named 2-sched-spin and browse
the contents of the sched-spin.c file.

Compile the source code and load the module, as described above
(make build and make copy).

Notice that the insertion command takes 5 seconds to complete.

Unload the kernel module.

Look for the lines marked with TODO 0 to create an atomic
section. Recompile the source code and reload the module into
the kernel.

You should now get an error. Look at the stack trace. What is the
cause of the error?

Hint

In the error message, follow the line containing the BUG
for a description of the error. You are not allowed to sleep in
atomic context. The atomic context is given by a section
between a lock operation and an unlock on a spinlock.

Note

The schedule_timeout() function, together with the
set_current_state macro, forces the current process to wait
for 5 seconds.

Generate the skeleton for the task named 3-memory and
browse the contents of the memory.c file. Notice the comments
marked with TODO. You must allocate 4 structures of type
struct task_info and initialize them (in memory_init()), then print and
free them (in memory_exit()).

Generate the skeleton for the task named 4-list. Browse the
contents of the list.c file and notice the comments marked with
TODO. The current process will add the four structures from the
previous exercise into a list. The list will be built in the
task_info_add_for_current() function, which is called when the module is
loaded. The list will be printed and deleted in the list_exit()
function and the task_info_purge_list() function.

(TODO 1) Complete the task_info_add_to_list() function to allocate
a struct task_info structure and add it to the list.

(TODO 2) Complete the task_info_purge_list() function to delete
all the elements in the list.

Compile the kernel module. Load and unload the module by
following the messages displayed by the kernel.

Hint

Review the Lists section of the lab. When deleting items from
the list, you will need to use either the
list_for_each_safe or list_for_each_entry_safe
macros.
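Why the safe variant is needed can be seen in a small user-space model: the iterator saves the next node before the loop body runs, so the current node may be unlinked and freed. This is a simplified sketch that mirrors the kernel macro names, not the lab's solution; struct demo_node and the demo_* functions are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct list_head {
	struct list_head *next, *prev;
};

#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* n always holds the node after pos, so pos may be freed in the body. */
#define list_for_each_safe(pos, n, head) \
	for (pos = (head)->next, n = pos->next; pos != (head); \
	     pos = n, n = pos->next)

static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Hypothetical payload structure with an embedded list node. */
struct demo_node {
	int value;
	struct list_head list;
};

/* Allocate a node and insert it right after the head. */
static int demo_push(struct list_head *head, int value)
{
	struct demo_node *node = malloc(sizeof(*node));

	if (!node)
		return -ENOMEM;
	node->value = value;
	node->list.next = head->next;
	node->list.prev = head;
	head->next->prev = &node->list;
	head->next = &node->list;
	return 0;
}

/* Unlink and free every node; returns how many nodes were freed. */
static int demo_purge(struct list_head *head)
{
	struct list_head *pos, *n;
	int freed = 0;

	list_for_each_safe(pos, n, head) {
		struct demo_node *node = list_entry(pos, struct demo_node, list);

		list_del(pos);
		free(node);
		freed++;
	}
	return freed;
}
```

With a plain list_for_each, the loop would read pos->next after the node had been freed, which is a use-after-free.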

Generate the skeleton for the task named 7-list-test and browse
the contents of the list-test.c file. We'll use it as a test
module. It will call functions exported by the 6-list-sync
task. The exported functions are the ones marked with extern in
the list-test.c file.

To export the above functions from the module located at 6-list-sync/
directory, the following steps are required:

Functions must not be static.

Use the EXPORT_SYMBOL macro to export the kernel symbols. For
example: EXPORT_SYMBOL(task_info_remove_expired);. The
macro must be used for each function after the function is defined.

In the 6-list-sync module, remove the code that prevents a list item
from expiring (it contradicts the purpose of this exercise).

Compile and load the module from 6-list-sync/. Once loaded, it
exposes exported functions and can be used by the test
module. You can check this by searching for the function names
in /proc/kallsyms before and after loading the module.

Compile the test module and then load it.

Use lsmod to check that the two modules have been loaded.
What do you notice?

Unload the kernel test module.

What should be the unload order of the two modules (the module from
6-list-sync and the test module)? What happens if you use another order?