Kernel Korner - Why and How to Use Netlink Socket

Use this bidirectional, versatile method to pass data between kernel and user space.

Due to the complexity of developing and maintaining the kernel,
only the most essential and performance-critical code is placed in
the kernel. Other things, such as GUI, management and control code,
typically are programmed as user-space applications. This practice of
splitting the implementation of certain features between kernel and user
space is quite common in Linux. Now the question is how can kernel
code and user-space code communicate with each other?

The answer is the various IPC methods that exist between kernel
and user space, such as system call, ioctl, proc filesystem or
netlink socket. This article discusses netlink socket and reveals
its advantages as a network feature-friendly IPC.

Introduction

Netlink socket is a special IPC used for transferring information
between kernel and user-space processes. It provides a full-duplex
communication link between the two by way of standard socket APIs
for user-space processes and a special kernel API for kernel
modules. Netlink socket uses the address family AF_NETLINK,
as compared to AF_INET used by TCP/IP socket. Each netlink socket
feature defines its own protocol type in the kernel header file
include/linux/netlink.h.
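
As a minimal sketch of the user-space side described above, the following C fragment opens a netlink socket with the standard socket APIs and binds it to a netlink address. The helper names fill_nl_addr and open_netlink are made up for illustration and are not part of any kernel or library API.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/netlink.h>

/* Fill in a netlink address. nl_pid identifies this endpoint (0 lets the
 * kernel pick a unique value at bind time); nl_groups is a multicast
 * group bitmask (0 means unicast only). */
void fill_nl_addr(struct sockaddr_nl *addr, unsigned int pid,
                  unsigned int groups)
{
    assert(addr != NULL);
    memset(addr, 0, sizeof(*addr));
    addr->nl_family = AF_NETLINK;
    addr->nl_pid = pid;
    addr->nl_groups = groups;
}

/* Open a netlink socket for the given protocol type and bind it.
 * Returns the file descriptor, or -1 on failure (errno is set). */
int open_netlink(int protocol, unsigned int groups)
{
    struct sockaddr_nl addr;
    int fd = socket(AF_NETLINK, SOCK_RAW, protocol);

    if (fd < 0)
        return -1;

    fill_nl_addr(&addr, 0, groups);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller might write open_netlink(NETLINK_ROUTE, 0) to get a socket for talking to the kernel's routing subsystem. Only the address family and the protocol argument differ from an ordinary AF_INET socket setup.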

The following is a subset of features and their protocol types
currently supported by the netlink socket:

Why do the above features use netlink instead of system calls, ioctls
or proc filesystems for communication between user and kernel worlds?
It is a nontrivial task to add system calls, ioctls or proc
files for new features; we risk polluting the kernel
and damaging the stability of the system. Netlink socket is
simple, though: only a constant, the protocol type, needs to be added to
netlink.h. Then, the kernel module and application can talk using socket-style APIs immediately.
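
To give a feel for how little the application side needs once a protocol type exists, here is a sketch that frames a string as a netlink message. NETLINK_EXAMPLE and its value 31 are hypothetical and chosen only for illustration, as is the helper name build_string_msg. The NLMSG_* macros come from linux/netlink.h.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Hypothetical private protocol type, for illustration only. A real
 * value must not collide with the ones already in linux/netlink.h. */
#define NETLINK_EXAMPLE 31

/* Write one netlink message carrying a NUL-terminated string into buf.
 * buf must be aligned for struct nlmsghdr. Returns the aligned number
 * of bytes used, or 0 if the message does not fit in buflen. */
size_t build_string_msg(void *buf, size_t buflen, const char *text,
                        unsigned int sender_pid)
{
    struct nlmsghdr *nlh = buf;
    size_t payload;

    assert(buf != NULL && text != NULL);
    payload = strlen(text) + 1;
    if (NLMSG_SPACE(payload) > buflen)
        return 0;

    memset(buf, 0, NLMSG_SPACE(payload));
    nlh->nlmsg_len = NLMSG_LENGTH(payload);  /* header + payload */
    nlh->nlmsg_pid = sender_pid;             /* sender's netlink address */
    nlh->nlmsg_flags = 0;
    memcpy(NLMSG_DATA(nlh), text, payload);
    return NLMSG_SPACE(payload);
}
```

The resulting buffer can be handed to sendmsg() on a socket opened with the protocol type in question.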

Netlink is asynchronous because, as with any other socket API,
it provides a socket queue to smooth the burst of messages. The system
call for sending a netlink message queues the message to the receiver's
netlink queue and then invokes the receiver's reception handler. The
receiver, within the reception handler's context, can decide whether
to process the message immediately or leave the message in the queue
and process it later in a different context. Unlike netlink, system calls require
synchronous processing. Therefore, if we use a system call to pass a message
from user space to the kernel, the kernel scheduling granularity may be
affected if the time to process that message is long.

The code implementing a system call in the kernel is linked statically to
the kernel at compile time; thus, it is not appropriate to include
system call code in a loadable module, which is the case for most device
drivers. With netlink socket, no compile-time dependency exists
between the netlink core of the Linux kernel and the netlink application living
in loadable kernel modules.

Netlink socket supports multicast, which is another benefit over system
calls, ioctls and proc. One process can multicast a
message to a netlink group address, and any number of other processes
can listen to that group address. This provides a near-perfect
mechanism for event distribution from kernel to user space.
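
The listening side of this can be sketched as follows, assuming the usual convention that group number N corresponds to bit N-1 of the nl_groups bitmask passed to bind(). The helper names nl_group_mask and join_group are invented for this example. NETLINK_ADD_MEMBERSHIP is a socket option found in newer kernels, and the fallback definition of SOL_NETLINK is there only in case the C library headers lack it.

```c
#include <assert.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270   /* fallback for older C library headers */
#endif

/* Convert a 1-based netlink group number into its bit in the
 * nl_groups mask used at bind() time. Only groups 1..32 fit. */
unsigned int nl_group_mask(unsigned int group)
{
    assert(group >= 1 && group <= 32);
    return 1U << (group - 1);
}

/* Join a multicast group on an already opened netlink socket by
 * group number. Returns 0 on success, -1 on failure (errno is set). */
int join_group(int fd, unsigned int group)
{
    return setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
                      &group, sizeof(group));
}
```

For instance, a process interested in link state changes could bind a NETLINK_ROUTE socket with nl_groups set to nl_group_mask(RTNLGRP_LINK) and then simply wait in recv().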

System call and ioctl are simplex IPCs in the sense that a session for
these IPCs can be initiated only by user-space applications. But, what if a
kernel module has an urgent message for a user-space application? There
is no way of doing that directly using these IPCs. Normally, applications
periodically need to poll the kernel to get the state changes, although
intensive polling is expensive. Netlink solves this problem gracefully
by allowing the kernel to initiate sessions too. We call it the duplex
characteristic of the netlink socket.
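
On the receiving side, an application that waits for kernel-initiated messages typically blocks in recv() and then walks the buffer, which may hold several messages back to back. The sketch below shows only that walk; count_nl_messages is a hypothetical helper, and stopping at NLMSG_DONE is an assumption that suits multipart replies.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Count the complete netlink messages in a buffer of len bytes, as
 * filled in by recv(). Stops at the first truncated header or at an
 * NLMSG_DONE marker. */
int count_nl_messages(void *buf, size_t len)
{
    struct nlmsghdr *nlh = buf;
    int remaining = (int)len;  /* signed: NLMSG_NEXT may drive it below 0 */
    int count = 0;

    assert(buf != NULL);
    while (NLMSG_OK(nlh, remaining)) {
        if (nlh->nlmsg_type == NLMSG_DONE)
            break;
        count++;  /* a real handler would inspect NLMSG_DATA(nlh) here */
        nlh = NLMSG_NEXT(nlh, remaining);
    }
    return count;
}
```

Because the process sleeps in recv() until the kernel has something to say, no periodic polling of kernel state is needed.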

Finally, netlink socket provides a BSD socket-style API that is well
understood by the software development community. Therefore, training
costs are less as compared to using the rather cryptic system call APIs
and ioctls.

Relating to the BSD Routing Socket

In BSD TCP/IP stack implementation, there is a special socket called
the routing socket. It has an address family of AF_ROUTE, a protocol family of
PF_ROUTE and a socket type of SOCK_RAW. The routing socket in BSD is
used by processes to add or delete routes in the kernel routing table.

In Linux, the equivalent function of the routing socket is provided by the
netlink socket protocol type NETLINK_ROUTE.
Netlink socket provides a functionality superset of
BSD's routing socket.
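
As a rough illustration of NETLINK_ROUTE in use, the following sketch builds a request that asks the kernel to dump its IPv4 routing table. The struct name route_dump_request and the helper build_route_dump are assumptions made for this example; the message types and flags are the ones defined in linux/rtnetlink.h and linux/netlink.h.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* A netlink header immediately followed by the routing message body. */
struct route_dump_request {
    struct nlmsghdr nlh;
    struct rtmsg rtm;
};

/* Prepare an RTM_GETROUTE request that dumps all IPv4 routes.
 * seq is echoed back by the kernel so replies can be matched. */
void build_route_dump(struct route_dump_request *req, unsigned int seq)
{
    assert(req != NULL);
    memset(req, 0, sizeof(*req));
    req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg));
    req->nlh.nlmsg_type = RTM_GETROUTE;
    req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req->nlh.nlmsg_seq = seq;
    req->rtm.rtm_family = AF_INET;
}
```

The request would be sent with send() on a NETLINK_ROUTE socket; the kernel answers with a series of RTM_NEWROUTE messages terminated by NLMSG_DONE.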

Can anyone please give me code for the netlink functionality supported in Linux kernel versions above 2.6.27?
Some structure members that exist in lower versions are not supported in higher versions.
For example, the "netlink_skb_parms" structure does not have the member "dst_pid", which is used for unicast messages from kernel space to user space.

I have written a program using generic netlink to communicate to and from kernel space. The user program sends a string and expects two strings from the kernel.

The program works well and, as expected, the kernel sends two hello strings to the user.

The problem is that the kernel sends one more message on the same socket, which I don't expect.
That is, after the first two reads on the socket, the third should block until the kernel sends further messages. Instead of blocking, the third read in the user application returns a message that seems to be an error message from the kernel.

I am new to networking and to the use of netlink.
My requirement is to pass some data asynchronously from user space to the kernel module and vice versa.

I managed to satisfy my first requirement using netlink. That is, I have created my own generic family in the kernel with some registered operations. Using the netlink library, I managed to pass the data with the appropriate command to my corresponding kernel module.

But I doubt whether data from the kernel module can be passed to user space asynchronously using netlink.

Is there any way that I can register callback functions in user space, on the same netlink family I have created in the kernel, for specific commands and have the data passed to user space?

If yes, please let me know how I can achieve it with the generic netlink infrastructure.
If not, I would be grateful for some hints on the alternatives.

Rather than expecting an echo message from the kernel, can the user by some means ask the kernel to send an expected reply?
I will make things clearer.
Scenario:

I am expecting messages from the kernel on any USB plug-in, and I am able to receive them through netlink sockets. But if the USB device is already plugged in, the kernel does not send a message to user space. Can I demand a USB plug-in/plug-out message from the kernel by sending my requirement through sendmsg()? :)

Hi,
Should I recompile the Linux kernel after writing the netlink kernel module? If so, could you tell me how to do it? I could not understand how the netlink kernel module gets activated. I want to send certain packets (coming from certain IP addresses) to my application residing in user space. To filter the messages, I want to use iptables. How will the iptables-filtered messages reach the netlink kernel module, so that from there they can be sent to my user-space application?

Shell/PERL/etc apps can use /proc on any distro without having to rebuild the app or worry about library incompatibilities. Conscientious developers are cautious when changing the /proc contents since hundreds of apps could be using the information... The netlink infrastructure may be more efficient, but how would the wealth of information provided by /proc be made available to system administrators as easily as /proc is via cat, less, or grep? How can I get information from netlink using those applications? The /proc support does have its advantages. Netlink is not the silver bullet.

Thanks for your article; it is very useful. I tried to create a communication socket between a kernel module and a userland program using the code you proposed, but it does not work correctly. My module and userland code compile correctly, but there is no communication between them.
Please, it would be nice if you could add working or at least compilable examples.

First, I would like to thank the author a lot for this article; it was very useful indeed.
I tried the user-space code above and got an ENOBUFS error (No buffer space available). I finally discovered the reason: the 'struct msghdr msg' was not zeroed, and only some fields were filled (msg_name, msg_namelen, msg_iov, msg_iovlen), leaving, for example, the msg_controllen field undefined (the kernel checks it and returns ENOBUFS if it is too large).
My problem was solved by adding the following line: memset(&msg,0,sizeof(msg));
(of course, before filling the various necessary fields of the message).
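
The fix described above could look roughly like this. It is only a sketch; prepare_msghdr is a hypothetical helper name, and it assumes a single iovec per message.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <linux/netlink.h>

/* Zero the whole msghdr first, then fill only the fields that are
 * needed. This leaves msg_control, msg_controllen and msg_flags at 0
 * instead of whatever happened to be on the stack. */
void prepare_msghdr(struct msghdr *msg, struct sockaddr_nl *dest,
                    struct iovec *iov)
{
    assert(msg != NULL && dest != NULL && iov != NULL);
    memset(msg, 0, sizeof(*msg));
    msg->msg_name = dest;
    msg->msg_namelen = sizeof(*dest);
    msg->msg_iov = iov;
    msg->msg_iovlen = 1;
}
```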

I am using netlink sockets to communicate between user space and kernel space.
I have code in the kernel that is responsible for forwarding incoming IP packets to the IP stack. The module that I have written in the kernel blocks communication between the network driver and the IP stack. In this case, the driver gives the incoming packet directly to our user-space program, which is waiting for such packets.

Once these packets arrive in user space, I give them back to the kernel using netlink sockets, where I have a netlink socket in the kernel waiting for these packets.

I have a kernel thread running that waits for the packets from user space.
The piece of code that waits is given below:

skb = skb_recv_datagram(nl_sk_ip, 0, 0, &err);

This thread sleeps until it gets data from user space. Once it gets a packet from user space, its only job is to inject that packet into the IP stack for processing.

Now I ping from my machine to some other machine in the network. The ping packet goes out in the normal way. But when a response comes back, the network driver, instead of giving it to the IP stack, gives it to the user-space program, which is listening on a raw socket. This user-space program forms a netlink message and sends it to the kernel-space netlink code. This code calls the entry function of the IP stack with the received packet. The IP stack analyzes the packet and sends the response back out in the normal way.

The problem is that the whole setup works fine for around 40 ICMP packets; after that, the "sendmsg" in user space returns with an EAGAIN (Resource temporarily unavailable) error.

Any idea why I am getting this error?
Your help in solving this would be appreciated.

Hello,
The article is very clear and understandable. It describes the advantages of using netlink sockets. I suppose it might be very useful for inter-process or inter-thread communication in user-space applications. But regarding kernel space, there are disadvantages such as:
1. Kernel recompiling, because it requires a netlink.h update.
2. Because it runs in the context of the sendmsg process, the trivial ioctl is preferred simply because it is less sophisticated.
Any comments are very welcome,
Regards,
Michael

Netlink is implemented as a device like /dev/netlink on 2.4.20-8.
The open, read and write functions from userland to /dev/netlink actually map to socket calls.

The kernel-side code for netlink is under /usr/src/linux-2.4/net/netlink/netlink_dev.c
If you wish to customize, you can change NETLINK_MAJOR to a number you like (check major.h) and compile the module separately with a makefile like

I am a netlink newbie. I saw your comment about the kernel-space code on the 2.4 kernel. I was able to compile and load the kernel module as per your suggestion. Could you tell me how I can test it? I need user-space code for that.
