Network Buffers and Memory Management

Writing a network device driver for Linux is fundamentally simple—most of the complexity (other than talking to the hardware) involves managing network packets in memory.

Higher Level Support Routines

The semantics of allocating and queuing buffers for sockets
also involve flow control rules and, for sending, a whole list of
interactions with signals and optional settings such as
non-blocking mode. Two routines are designed to make this easy for
most protocols.

The sock_queue_rcv_skb() function is used
to handle incoming data flow control and is normally used in the
form:

This function uses the socket read queue counters to prevent
vast amounts of data from being queued to a socket. After a limit
is hit, data is discarded. It is up to the application to read fast
enough, or as in TCP, for the protocol to do flow control over the
network. TCP actually tells the sending machine to shut up when it
can no longer queue data.
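
The limit check described above can be modelled outside the kernel. What follows is a minimal user-space sketch, not the kernel implementation: the types rx_sock and rx_buf, the field names and both functions are hypothetical stand-ins chosen for illustration. It only shows the accounting idea: a buffer is queued if it fits under the socket's limit; otherwise the caller is told to discard it.

```c
#include <stddef.h>

/* Hypothetical stand-in for a socket's receive-side accounting. */
struct rx_sock {
    size_t queued;  /* bytes currently waiting to be read */
    size_t limit;   /* maximum bytes allowed on the read queue */
};

/* Hypothetical stand-in for a network buffer. */
struct rx_buf {
    size_t size;    /* memory accounted to this buffer */
};

/*
 * Model of receive flow control. Returns 0 if the buffer was queued,
 * or -1 if the limit was hit and the caller must discard the buffer.
 */
int rx_queue(struct rx_sock *sk, const struct rx_buf *buf)
{
    if (sk->queued + buf->size > sk->limit)
        return -1;
    sk->queued += buf->size;
    return 0;
}

/* Called when the application has read a buffer; frees queue space. */
void rx_read_done(struct rx_sock *sk, const struct rx_buf *buf)
{
    if (buf->size > sk->queued)
        sk->queued = 0;          /* defensive: never underflow */
    else
        sk->queued -= buf->size;
}
```

In this model a slow reader simply causes rx_queue() to return -1 until rx_read_done() has released enough space, which mirrors the "read fast enough or lose data" rule stated in the text.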

On the sending side, sock_alloc_send_skb()
handles signal handling, the non-blocking flag and all the
semantics of blocking until there is space in the send queue, so
that you cannot tie up all of memory with data queued for a slow
interface. Many protocol send routines have this function doing
almost all the work:

Most of this we have met before. The very important line is
skb->sk=sk. sock_alloc_send_skb() has charged the memory for
the buffer to the socket. By setting skb->sk,
we tell the kernel that whoever does a
kfree_skb() on the buffer should credit the
memory for the buffer back to the socket. Thus, when a device has
sent a buffer and freed it, the user is able to send more.
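
The charge-and-credit cycle can be illustrated with a small user-space model. This is a sketch under stated assumptions rather than kernel code: tx_sock, tx_buf, tx_alloc_send() and tx_free() are hypothetical names, and the model returns NULL where the real routine would block, honour the non-blocking flag or handle a signal.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a socket's send-side accounting. */
struct tx_sock {
    size_t charged;  /* bytes currently charged to this socket */
    size_t limit;    /* maximum bytes that may be queued for sending */
};

/* Hypothetical stand-in for a network buffer. */
struct tx_buf {
    struct tx_sock *sk;  /* owner to credit when the buffer is freed */
    size_t size;
};

/*
 * Model of a send-side allocation: charge the socket, or fail if the
 * send queue is full. The owner field is left for the caller to set,
 * playing the role of the skb->sk=sk line discussed in the text.
 */
struct tx_buf *tx_alloc_send(struct tx_sock *sk, size_t size)
{
    struct tx_buf *buf;

    if (sk->charged + size > sk->limit)
        return NULL;
    buf = malloc(sizeof(*buf));
    if (buf == NULL)
        return NULL;
    buf->sk = NULL;
    buf->size = size;
    sk->charged += size;   /* charge the memory to the socket */
    return buf;
}

/* Model of freeing a buffer: credit the owning socket, if one is set. */
void tx_free(struct tx_buf *buf)
{
    if (buf->sk != NULL)
        buf->sk->charged -= buf->size;
    free(buf);
}
```

If the caller forgets to set the owner, the model never credits the socket and later allocations keep failing, which is why the text calls that assignment the very important line.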

Network Devices

All Linux network devices follow the same interface, but many
functions available in that interface are not needed for all
devices. An object-oriented mentality is used, and each device is
an object with a series of methods that are filled into a
structure. Each method is called with the device itself as the
first argument, in order to get around the lack of the C++ concept
of this within the C language.
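
The "object with methods" idea can be sketched in plain C as a structure of function pointers, each taking the device as its first argument. The structure and function names below (demo_device, demo_open and so on) are hypothetical and much smaller than the kernel's real device structure; they only illustrate the calling convention.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced device object. */
struct demo_device {
    const char *name;
    int running;
    unsigned long tx_packets;

    /* Methods: each receives the device itself, standing in for "this". */
    int (*open)(struct demo_device *dev);
    int (*stop)(struct demo_device *dev);
    int (*xmit)(struct demo_device *dev, const void *data, size_t len);
};

static int demo_open(struct demo_device *dev)
{
    dev->running = 1;
    return 0;
}

static int demo_stop(struct demo_device *dev)
{
    dev->running = 0;
    return 0;
}

static int demo_xmit(struct demo_device *dev, const void *data, size_t len)
{
    (void)data;             /* a real driver would hand this to hardware */
    (void)len;
    if (!dev->running)
        return -1;          /* refuse to send on a stopped device */
    dev->tx_packets++;
    return 0;
}

/* Fill in the method table, as a driver's setup routine would. */
void demo_setup(struct demo_device *dev, const char *name)
{
    dev->name = name;
    dev->running = 0;
    dev->tx_packets = 0;
    dev->open = demo_open;
    dev->stop = demo_stop;
    dev->xmit = demo_xmit;
}
```

A caller never names demo_xmit() directly; it writes dev->xmit(dev, data, len), so the same calling code works for any device that fills in the table.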

The file drivers/net/skeleton.c contains the skeleton of a
network device driver. View or print a copy from a recent kernel
and follow along throughout the rest of the article.

Each network device deals entirely in the transmission of
network buffers from the protocols to the physical media, and in
receiving and decoding the responses the hardware generates.
Incoming frames are turned into network buffers, identified by
protocol and delivered to netif_rx(). This
function then passes the frames off to the protocol layer for
further processing.
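
The "identified by protocol" step amounts to demultiplexing on a protocol identifier. The following is a user-space sketch only: demo_frame, demo_counters and demo_rx() are hypothetical, and counting frames stands in for handing them to a real protocol layer. The two identifier values are the standard EtherType numbers for IPv4 and ARP.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_P_IP  0x0800  /* EtherType for IPv4 */
#define DEMO_P_ARP 0x0806  /* EtherType for ARP */

/* Hypothetical received frame, already tagged by the driver. */
struct demo_frame {
    uint16_t protocol;  /* protocol identifier set by the driver */
    size_t len;
};

/* Counters standing in for the per-protocol receive handlers. */
struct demo_counters {
    unsigned long ip;
    unsigned long arp;
    unsigned long unknown;
};

/* Model of the hand-off: route each frame by its protocol identifier. */
void demo_rx(struct demo_counters *c, const struct demo_frame *f)
{
    switch (f->protocol) {
    case DEMO_P_IP:
        c->ip++;
        break;
    case DEMO_P_ARP:
        c->arp++;
        break;
    default:
        c->unknown++;   /* no handler registered for this protocol */
        break;
    }
}
```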

Each device provides a set of additional methods for the
handling of stopping, starting, control and physical encapsulation
of packets. All of the control information is collected together in
the device structures that are used to manage each device.

Naming

All Linux network devices have a unique name that is not in
any way related to the file system names devices may have. Indeed,
network devices do not normally have a file system representation,
although you can create a device which is tied to the device
drivers. Traditionally the name indicates only the type of a device
rather than its maker. Multiple devices of the same type are
numbered upwards from 0; thus, Ethernet devices are known as
“eth0”, “eth1”, “eth2” etc. The naming scheme is important as
it allows users to write programs or system configuration in terms
of “an Ethernet card” rather than worrying about the manufacturer
of the board and forcing reconfiguration if a board is
changed.
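
The "numbered upwards from 0" rule can be sketched as a search for the lowest unused unit number for a given type prefix. This is an illustrative user-space model, not the kernel's registration code: name_table, alloc_unit_name() and the size limits are all assumptions made for the example.

```c
#include <stdio.h>
#include <string.h>

#define DEMO_NAME_LEN 16
#define DEMO_MAX_DEVICES 8

/* Hypothetical registry of names already in use. */
struct name_table {
    char names[DEMO_MAX_DEVICES][DEMO_NAME_LEN];
    int count;
};

static int name_in_use(const struct name_table *t, const char *name)
{
    int i;

    for (i = 0; i < t->count; i++)
        if (strcmp(t->names[i], name) == 0)
            return 1;
    return 0;
}

/*
 * Pick the lowest free unit for a type prefix such as "eth", record the
 * resulting name and copy it to out. Returns the unit number, or -1 if
 * the table is full or the name does not fit.
 */
int alloc_unit_name(struct name_table *t, const char *prefix,
                    char *out, size_t outlen)
{
    char candidate[DEMO_NAME_LEN];
    int unit;

    if (t->count >= DEMO_MAX_DEVICES)
        return -1;
    for (unit = 0; unit < DEMO_MAX_DEVICES; unit++) {
        int n = snprintf(candidate, sizeof(candidate), "%s%d", prefix, unit);

        if (n < 0 || (size_t)n >= sizeof(candidate) || (size_t)n >= outlen)
            return -1;
        if (!name_in_use(t, candidate)) {
            strcpy(t->names[t->count++], candidate);
            strcpy(out, candidate);
            return unit;
        }
    }
    return -1;
}
```

Because the name carries only the type and a unit number, two calls with the prefix "eth" yield "eth0" and "eth1" regardless of which vendor's board sits behind each one.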

The following names are currently used for generic
devices:

ethn     Ethernet controllers, both 10 and 100Mbit/second
trn      Token ring devices
sln      SLIP devices and AX.25 KISS mode
pppn     PPP devices both asynchronous and synchronous
plipn    PLIP units; the number matches the printer port
tunln    IPIP encapsulated tunnels
nrn      NetROM virtual devices
isdnn    ISDN interfaces handled by isdn4linux (*)
dummyn   Null devices
lo       The loopback device

(*) At least one ISDN interface is an Ethernet
impersonator—the Sonix PC/Volante driver behaves in all aspects
as if it was Ethernet rather than ISDN; therefore, it uses an
“eth” device name. If possible, a new device should pick a name
that reflects existing practice. When you are adding a whole new
physical layer type, you should look for other people working on
such a project and use a common naming scheme.

Certain physical layers present multiple logical interfaces
over one medium. Both ATM and Frame Relay have this property, as
does multi-drop KISS in the amateur radio environment. Under such
circumstances, a driver needs to exist for each active channel. The
Linux networking code is structured in such a way as to make this
manageable without excessive additional code. Also, the name
registration scheme allows you to create and remove interfaces
almost at will as channels come into and out of existence. The
proposed convention for such names is still under some discussion,
as the simple scheme of “sl0a”, “sl0b”, “sl0c” works for
basic devices like multi-drop KISS, but does not cope with multiple
Frame Relay connections where a virtual channel can be moved across
physical boards.

An admirable in-depth article. Just a stupid question (I'm so slow-witted): I still don't catch the link between the rmem_default/rmem_max sysctl parameters (socket receive buffer default/max length) and the buffer allocated by dev_alloc_skb(). Socket receive buffer vs. buffer of skb: are we talking about the same memory area, or are they different things (necessarily involving a copy from one to the other, sooner or later)?

The links to figures do not work (File not found error). I guess time does matter (1996 article!). To anyone reading this article, please provide us some links for the pictures (or link to some other up to date articles).

Hi Alan Cox,
Thanks for the article.
I am Ram. I am new to device driver development.
Somehow I managed to write a network driver.
I still need some help, though: I want to access the driver functions directly from a user program written in C.

That is, I want to access the open, close, hard_start_xmit() and ioctl functions directly without using the socket API (socket, bind, connect, etc.). I want my own function API.
Is it possible to do this?

Thanks for this article. It explains most things. But I still feel that more about bottom half/top half processing should be added. Also, the logic of freeing and owning skbuffers is not clear.