Network Buffers and Memory Management

Writing a network device driver for Linux is fundamentally simple—most of the complexity (other than talking to the hardware) involves managing network packets in memory.

Optional Functionality

Each device has the option of providing additional functions
and facilities to the protocol layers. Not implementing these
functions will cause a degradation in service available via the
interface, but will not prevent operation. These operations split
into two categories—configuration and activation/shutdown.

Activation and Shutdown

When a device is activated (i.e., the flag
IFF_UP is set), the
dev->open() method is invoked if the device
has provided one. This invocation lets the device take any
action, such as enabling the interface, that is needed before the
interface is used. An error return from this function causes
the device to stay down and causes the user's activation request to
fail with the error returned by
dev->open().

The dev->open() function can also be
used with any device that is loaded as a module. Here it is
necessary to prevent the device from being unloaded while it is
open; thus, the MOD_INC_USE_COUNT macro must be
used within the open method.

The dev->close() method is invoked when
the device is ready to be configured down and should shut off the
hardware in such a way as to minimise machine load (e.g., by
disabling the interface or its ability to generate interrupts). It
can also be used to allow a module device to be unloaded after it
is down. The rest of the kernel is structured in such a way that
when a device is closed, all references to it by pointer are
removed, in order to ensure that the device can be safely unloaded
from a running system. The close method is not permitted to
fail.
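
The activation and shutdown rules above can be sketched in ordinary
user-space C. This is only an illustration under stated assumptions:
struct mock_device, MOCK_IFF_UP, mock_use_count, mock_bring_up() and
mock_bring_down() are hypothetical stand-ins invented for this sketch,
not the kernel's own definitions, and the counter merely imitates what
MOD_INC_USE_COUNT does for a real module.

```c
#include <errno.h>

/* Hypothetical stand-ins for the kernel's device structure and flags;
 * the real definitions live in the kernel headers. */
#define MOCK_IFF_UP 0x1

struct mock_device {
    unsigned int flags;   /* interface flags, e.g. MOCK_IFF_UP */
    int hw_present;       /* 1 if the imaginary hardware responded */
    int irq_enabled;      /* 1 while the card may raise interrupts */
    int (*open)(struct mock_device *dev);
    int (*close)(struct mock_device *dev);
};

/* Imitates the module use count that MOD_INC_USE_COUNT adjusts. */
static int mock_use_count;

static int mock_open(struct mock_device *dev)
{
    if (!dev->hw_present)
        return -ENODEV;   /* activation fails, device stays down */
    dev->irq_enabled = 1; /* enable the interface */
    mock_use_count++;     /* pin the module while the device is open */
    return 0;
}

static int mock_close(struct mock_device *dev)
{
    dev->irq_enabled = 0; /* quiesce the hardware */
    mock_use_count--;     /* the module may be unloaded again */
    return 0;             /* close is not permitted to fail */
}

/* Sketch of the caller's side when the interface is brought up. */
static int mock_bring_up(struct mock_device *dev)
{
    int err = dev->open ? dev->open(dev) : 0;

    if (err)
        return err;       /* the error is passed back to the user */
    dev->flags |= MOCK_IFF_UP;
    return 0;
}

/* Sketch of the caller's side when the interface is taken down. */
static void mock_bring_down(struct mock_device *dev)
{
    if (dev->close)
        dev->close(dev);
    dev->flags &= ~MOCK_IFF_UP;
}
```

Note that the flag is only set once the open method has succeeded, so
a failed open leaves the device down, as the text describes.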

Configuration and Statistics

A set of functions provides the ability to query and to set
operating parameters. The first and most basic of these is a
get_stats routine which, when called, returns a
struct enet_statistics block for the interface.
This block allows user programs such as ifconfig
to see the loading of the interface and any logged problem frames.
Not providing this block means that no statistics will be
available.
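
A statistics routine usually does little more than hand back a pointer
to counters that the driver keeps up to date elsewhere. The sketch
below assumes a reduced, hypothetical struct mock_statistics (the real
struct enet_statistics has more counters than are shown here) and
hypothetical helper names such as mock_count_rx() and
mock_query_stats().

```c
#include <stddef.h>

/* Reduced, hypothetical statistics block for illustration only. */
struct mock_statistics {
    unsigned long rx_packets;
    unsigned long tx_packets;
    unsigned long rx_errors;
    unsigned long tx_errors;
};

struct mock_device {
    void *priv;   /* driver-private data */
    struct mock_statistics *(*get_stats)(struct mock_device *dev);
};

/* Private state of the imaginary driver. */
struct mock_private {
    struct mock_statistics stats;
};

/* The get_stats routine: return the block the driver maintains. */
static struct mock_statistics *mock_get_stats(struct mock_device *dev)
{
    struct mock_private *lp = dev->priv;

    return &lp->stats;
}

/* Receive-path bookkeeping: count good frames and problem frames. */
static void mock_count_rx(struct mock_device *dev, int frame_ok)
{
    struct mock_private *lp = dev->priv;

    if (frame_ok)
        lp->stats.rx_packets++;
    else
        lp->stats.rx_errors++;
}

/* Caller's side: no routine means no statistics are available. */
static struct mock_statistics *mock_query_stats(struct mock_device *dev)
{
    return dev->get_stats ? dev->get_stats(dev) : NULL;
}
```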

The dev->set_mac_address() function is
called whenever a superuser process issues an ioctl of type
SIOCSIFHWADDR to change the physical address of
a device. For many devices this function is not meaningful and for
others it is not supported. In these cases, set this function
pointer to NULL. Some devices can only perform a
physical address change if the interface is taken down. For these
devices, check the IFF_UP flag, and if it is
set, return -EBUSY.
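
For a card that can only change its address while down, the check
described above might look like the following sketch. The structure,
MOCK_IFF_UP and MOCK_ETH_ALEN are hypothetical stand-ins, and the
function signature is simplified for illustration rather than copied
from the kernel.

```c
#include <errno.h>
#include <string.h>

#define MOCK_IFF_UP   0x1  /* hypothetical stand-in for IFF_UP */
#define MOCK_ETH_ALEN 6    /* length of an Ethernet address in bytes */

struct mock_device {
    unsigned int flags;
    unsigned char dev_addr[MOCK_ETH_ALEN];
};

/* Refuse the change while the interface is up, otherwise record it. */
static int mock_set_mac_address(struct mock_device *dev,
                                const unsigned char *addr)
{
    if (dev->flags & MOCK_IFF_UP)
        return -EBUSY;
    memcpy(dev->dev_addr, addr, MOCK_ETH_ALEN);
    /* A real driver would also program the card's address registers. */
    return 0;
}
```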

The dev->set_config() function is
called in response to the SIOCSIFMAP ioctl when a user
enters a command like ifconfig eth0 irq 11. It
is passed an ifmap structure containing the
desired I/O and other interface parameters. For most interfaces
this function is not useful, and you can set this function pointer to NULL.
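
For a driver that does want to honour such a request, a handler could
be as small as the sketch below. Everything here is an assumption made
for illustration: struct mock_ifmap carries only two of the fields a
real ifmap holds, the imaginary card cannot move its I/O base, and the
accepted IRQ range is invented.

```c
#include <errno.h>

/* Reduced, hypothetical version of the ifmap structure. */
struct mock_ifmap {
    unsigned short base_addr;
    unsigned char irq;
};

struct mock_device {
    unsigned short base_addr;
    unsigned char irq;
};

/* Accept an IRQ change, reject anything this imaginary card cannot do. */
static int mock_set_config(struct mock_device *dev,
                           const struct mock_ifmap *map)
{
    if (map->base_addr != dev->base_addr)
        return -EOPNOTSUPP;   /* the I/O base is fixed on this card */
    if (map->irq < 2 || map->irq > 15)
        return -EINVAL;       /* invented range of usable IRQ lines */
    dev->irq = map->irq;
    return 0;
}
```

With this sketch, a request equivalent to ifconfig eth0 irq 11 would
leave the base address alone and set the stored IRQ to 11.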

Finally, the dev->do_ioctl() call is
invoked whenever an ioctl in the range
SIOCDEVPRIVATE to
SIOCDEVPRIVATE+15 is used on your interface. All
these ioctl calls take a struct ifreq, which is
copied into kernel space before your handler is called and copied
back at the end. For maximum flexibility any user can make these
calls, and it is up to your code to check for superuser status when
appropriate. For example, the PLIP driver uses these calls to set
parallel port timeout values so that a user can tune the
plip device for the machine in use.
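
A private ioctl handler in this style might be sketched as follows.
The command names, the single-field struct mock_ifreq and the
is_superuser argument are hypothetical; a real handler receives a
struct ifreq and asks the kernel whether the caller is privileged. The
value 0x89F0 is what Linux headers define for SIOCDEVPRIVATE, but it is
redefined here under a mock name so the sketch stays self-contained.

```c
#include <errno.h>

#define MOCK_SIOCDEVPRIVATE 0x89F0  /* mirrors SIOCDEVPRIVATE */
#define MOCK_GET_TIMEOUT    (MOCK_SIOCDEVPRIVATE + 0)  /* hypothetical */
#define MOCK_SET_TIMEOUT    (MOCK_SIOCDEVPRIVATE + 1)  /* hypothetical */

/* Stand-in for the request block copied in from user space. */
struct mock_ifreq {
    long value;
};

struct mock_device {
    long timeout;   /* tunable of the imaginary device */
};

static int mock_do_ioctl(struct mock_device *dev, struct mock_ifreq *rq,
                         int cmd, int is_superuser)
{
    switch (cmd) {
    case MOCK_GET_TIMEOUT:
        rq->value = dev->timeout;  /* copied back to the caller later */
        return 0;
    case MOCK_SET_TIMEOUT:
        if (!is_superuser)
            return -EPERM;         /* the driver does its own check */
        if (rq->value <= 0)
            return -EINVAL;
        dev->timeout = rq->value;
        return 0;
    default:
        return -EOPNOTSUPP;        /* a private call we do not implement */
    }
}
```

Reading the value is left open to any user here, while changing it is
restricted, which is one way to apply the rule that the driver itself
must check for superuser status.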

Multicasting

Certain physical media types, such as Ethernet, support
multicast frames at the physical layer. A multicast frame is heard
by a group of hosts (not necessarily all) on the network, rather
than going from one host to another.

The capabilities of Ethernet cards are fairly variable. Most
fall into one of three categories:

1. No multicast filters. The card either receives all
multicasts or none of them. Such cards can be a nuisance on a
network with a lot of multicast traffic, such as group video
conferences.

2. Hash filters. A table is loaded onto the card
giving a mask of entries for desired multicasts. This method
filters out some of the unwanted multicasts but not all.

3. Perfect filters. Most cards that support perfect
filters combine this option with 1 or 2 above, because the perfect
filter often has a length limit of 8 or 16 entries.
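
To see why a hash filter removes only some unwanted multicasts,
consider the sketch below. It assumes a 64-entry table indexed by the
top six bits of a CRC-32 style checksum of the address. That choice is
only an illustration: the exact hash, the bits selected and the table
size differ from card to card, and all names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct mock_mc_addr {
    unsigned char addr[6];
};

/* Bitwise CRC-32 style checksum (reflected polynomial 0xEDB88320). */
static uint32_t mock_crc32(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
    }
    return crc;
}

/* Map an address to one of 64 table entries (assumed bit selection). */
static unsigned mock_hash_bucket(const unsigned char *addr)
{
    return (unsigned)(mock_crc32(addr, 6) >> 26);
}

/* Build the mask that would be loaded onto the card. */
static uint64_t mock_build_filter(const struct mock_mc_addr *list,
                                  size_t count)
{
    uint64_t filter = 0;

    for (size_t i = 0; i < count; i++)
        filter |= (uint64_t)1 << mock_hash_bucket(list[i].addr);
    return filter;
}

/* The card passes any frame whose address lands on a set entry. */
static int mock_filter_accepts(uint64_t filter, const unsigned char *addr)
{
    return (int)((filter >> mock_hash_bucket(addr)) & 1u);
}
```

Every requested address is always accepted, but so is any other
address that happens to hash to the same entry, so the host must still
discard those frames in software.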

It is especially important that Ethernet interfaces are
programmed to support multicasting. Several Ethernet protocols
(notably AppleTalk and IP multicast) rely on Ethernet multicasting.
Fortunately, most of the work is done by the kernel for you (see
net/core/dev_mcast.c).

The kernel support code maintains lists of physical addresses
your interface should be allowing for multicast. The device driver
may return frames matching more than the requested list of
multicasts if it is not able to do perfect filtering.

Whenever the list of multicast addresses changes, the device
driver's dev->set_multicast_list() function is
invoked. The driver can then reload its physical tables. Typically
this looks something like:
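
What follows is a hedged sketch of such a handler, not a listing taken
from any particular driver. It assumes an imaginary card with a
16-entry perfect filter, and every name in it (struct mock_device,
the MOCK_ flags, the mock_rx_mode values) is a hypothetical stand-in
for the kernel's own structures and for real hardware registers.

```c
#include <stddef.h>
#include <string.h>

#define MOCK_IFF_PROMISC  0x1   /* hypothetical stand-in for IFF_PROMISC */
#define MOCK_IFF_ALLMULTI 0x2   /* hypothetical stand-in for IFF_ALLMULTI */
#define MOCK_FILTER_SLOTS 16    /* assumed size of the perfect filter */

enum mock_rx_mode {
    MOCK_RX_UNICAST_ONLY,
    MOCK_RX_FILTER_LIST,
    MOCK_RX_ALL_MULTICAST,
    MOCK_RX_PROMISCUOUS
};

struct mock_mc_addr {
    unsigned char addr[6];
};

struct mock_device {
    unsigned int flags;
    int mc_count;                        /* entries in mc_list */
    const struct mock_mc_addr *mc_list;  /* list kept by the kernel */
    /* State of the imaginary hardware. */
    enum mock_rx_mode rx_mode;
    struct mock_mc_addr hw_filter[MOCK_FILTER_SLOTS];
    int hw_filter_used;
};

/* Reload the card's receive mode and filter from the current list. */
static void mock_set_multicast_list(struct mock_device *dev)
{
    dev->hw_filter_used = 0;

    if (dev->flags & MOCK_IFF_PROMISC) {
        dev->rx_mode = MOCK_RX_PROMISCUOUS;     /* hear all packets */
    } else if ((dev->flags & MOCK_IFF_ALLMULTI) ||
               dev->mc_count > MOCK_FILTER_SLOTS) {
        dev->rx_mode = MOCK_RX_ALL_MULTICAST;   /* list will not fit */
    } else if (dev->mc_count > 0) {
        memcpy(dev->hw_filter, dev->mc_list,
               (size_t)dev->mc_count * sizeof dev->hw_filter[0]);
        dev->hw_filter_used = dev->mc_count;
        dev->rx_mode = MOCK_RX_FILTER_LIST;     /* hear only the list */
    } else {
        dev->rx_mode = MOCK_RX_UNICAST_ONLY;    /* no multicasts wanted */
    }
}
```

Falling back to receiving all multicasts when the list is too long
relies on the rule that a driver may return more frames than were
requested when it cannot filter perfectly.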

A small number of cards can only do unicast or
promiscuous mode. In this case the driver, when presented with a
request for multicasts, has to go promiscuous. If this is done, the
driver must itself set the IFF_PROMISC flag in
dev->flags.
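
For such a card the handler shrinks to something like this sketch,
again using hypothetical names and a mock structure rather than the
kernel's own definitions.

```c
#define MOCK_IFF_PROMISC 0x1   /* hypothetical stand-in for IFF_PROMISC */

struct mock_device {
    unsigned int flags;
    int mc_count;         /* number of multicast addresses requested */
    int hw_promiscuous;   /* state of the imaginary hardware */
};

/* A card with no multicast support: unicast or promiscuous only. */
static void mock_set_multicast_list(struct mock_device *dev)
{
    if (dev->mc_count > 0)
        dev->flags |= MOCK_IFF_PROMISC;  /* driver sets the flag itself */

    dev->hw_promiscuous = (dev->flags & MOCK_IFF_PROMISC) != 0;
}
```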

In order to aid the driver writer, the multicast list is kept
valid at all times. This simplifies many drivers, as a reset from
an error condition in a driver often has to reload the multicast
address lists.


An admirable in-depth article. Just a stupid question (I'm so slow-witted): I still don't catch the link between the rmem_default/rmem_max sysctl parameters (socket receive buffer default/max length) and the buffer allocated by dev_alloc_skb(). Socket receive buffer vs. buffer of skb: are we talking about the same memory area, or are they different things (necessarily involving a copy from one to the other, sooner or later)?

The links to figures do not work (File not found error). I guess time does matter (1996 article!). To anyone reading this article, please provide us some links for the pictures (or link to some other up to date articles).

Hi Alan Cox,
Thanks for the article.
I am Ram. I am new to device driver development.
Somehow I managed to write a network driver.
I still need some help, though: I want to access the driver functions directly from a user program written in C.

That is, I want to access the open, close, hard_start_xmit() and ioctl functions directly, without using the socket API (socket, bind, connect, etc.). I want my own function API.
Is it possible to do this?

Thanks for this article. It explains most things. Still, I feel that more about bottom-half/top-half processing should be added, and the logic of freeing/owning skbuffers is not clear.
