An introduction to block device drivers

Last month, we inaugurated a column on Linux kernel programming with an article on how to write Linux device drivers without doing any kernel programming. This month we touch the kernel as we explore block device drivers.

It is customary for authors explaining
device drivers to start with a complete explanation of character
devices, saving block device drivers for a later chapter. To
explain why this is, I need to briefly introduce character devices
as well. To do that, I'll give a little history.

When Unix was written 25 years ago, its design was eclectic.
One unusual design feature was that every physical device connected
to the computer was represented as a file. This was a bold
decision, because many devices are very different from one another,
especially at first glance. Why use the same interface to talk to a
printer as to talk to a disk drive?

The short answer is that while the devices are very
different, they can be thought of as having most of the same
characteristics as files. The entire system is then kept smaller
and simpler by using only one interface with a few
extensions.

This is fine, except that it hides important differences
between devices. For example, it is possible to read any byte on a
disk at any time, but it is only possible to read the
next byte from a terminal.

There are other differences, but this is the most fundamental
one: Some devices (like disks) are
random-access, and others (like
terminals) are sequential-access.
Of course, it is possible to pretend that a random-access device is
a sequential-access device, but it doesn't work the other way
around.
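
You can see this difference from user space, without touching the kernel. The sketch below is not driver code: is_seekable() and demo_seekable() are hypothetical helpers of mine, a pipe stands in for a sequential-access device, and a temporary file stands in for a random-access one. It relies only on the POSIX lseek() call, which fails on a pipe.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd can be repositioned with lseek(),
 * 0 if it cannot (on a pipe, lseek() fails with ESPIPE). */
static int is_seekable(int fd)
{
    return lseek(fd, 0, SEEK_CUR) != (off_t)-1;
}

/* Returns 1 if a pipe is reported as sequential-only and a regular
 * temporary file as random-access, 0 otherwise, -1 if setup fails. */
static int demo_seekable(void)
{
    int fds[2];
    FILE *tmp;
    int pipe_seekable, file_seekable;

    if (pipe(fds) != 0)
        return -1;
    tmp = tmpfile();
    if (tmp == NULL) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    pipe_seekable = is_seekable(fds[0]);
    file_seekable = is_seekable(fileno(tmp));

    fclose(tmp);
    close(fds[0]);
    close(fds[1]);
    return !pipe_seekable && file_seekable;
}
```

On a POSIX system, demo_seekable() should return 1: the pipe refuses to be repositioned, while the regular file does not.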

A practical effect of the difference is that filesystems can
only be mounted on block devices, not on character ones. For
example, most tapes are character
devices. It is possible to copy the contents of a raw, quiescent
(unmounted and not being modified) filesystem to a tape, but you
will not be able to mount the tape, even though it contains the
same information as the disk.

Most textbooks and tutorials start by explaining character
devices, the sequential-access ones, because a minimal character
device driver is easier to write than a minimal block device
driver. My own Linux Kernel Hackers' Guide
(the KHG) is written the same way.

My reason for starting this column with block devices, the
random-access devices, is that the KHG explains simple character
devices better than it does block devices, and I think that there
is a greater need for information on block devices right now.
Furthermore, real character device
drivers can be quite complex, just as complex as block device
drivers, and fewer people know how to write block device
drivers.

I am not going to give a complete example of a device driver
here. I am going to explain the important parts, and let you
discover the rest by examining the Linux source code. Reading this
article and the ramdisk driver
(drivers/block/ramdisk.c), and possibly some
parts of the KHG, should make it possible for you to write a
simple, non-interrupt-driven block device driver, good enough to
mount a filesystem on. To write an interrupt-driven driver, read
drivers/block/hd.c, the AT hard disk driver, and
follow along. I've included a few hints in this article, as
well.

The Heart of the Driver

Whereas character device drivers provide procedures for
directly reading and writing data from and to the device they
drive, block device drivers do not. Instead, they provide a single
request() procedure which is used for both
reading and writing. There are generic
block_read() and
block_write() procedures which know how to call
the request() procedure, but all you need to
know about those functions is where to place a reference to them,
and that will be covered later.

The request() procedure (perhaps
surprisingly for a function designed to do I/O) takes no arguments
and returns void. Instead of explicit input and return values, it
looks at a queue of requests for I/O, and processes the requests
one at a time, in order. (The requests have already been sorted by
the time the request() function reads the
queue.) When it is called, if it is not interrupt-driven, it
processes requests for blocks to be read from the device, until it
has exhausted all pending requests. (Normally, there will be only
one request in the queue, but the request()
procedure should keep checking until the queue is empty. Note that
other requests may be added to the queue by other processes while
the current request is being processed.)
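
To make the shape of such a procedure concrete, here is a user-space mock of a non-interrupt-driven request() procedure. It is a sketch under stated assumptions, not kernel code: struct mock_request, add_request(), do_mock_request(), the 512-byte sector size, and the in-memory "device" are all invented for illustration, and the INIT_REQUEST, CURRENT, and end_request() defined here are simplified stand-ins for the real ones discussed in the rest of this article.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512   /* assumed sector size for this mock */
#define NR_SECTORS  8     /* size of the pretend device, in sectors */
#define CMD_READ    0
#define CMD_WRITE   1

/* Mock request: a subset of the fields a request carries, plus a status
 * field that exists only so this sketch can report outcomes. */
struct mock_request {
    int cmd;                    /* CMD_READ or CMD_WRITE */
    unsigned long sector;       /* first sector of the transfer */
    unsigned long nr_sectors;   /* number of contiguous sectors */
    char *buffer;               /* memory to fill (read) or copy from (write) */
    int status;                 /* -1 pending, 1 satisfied, 0 aborted */
    struct mock_request *next;  /* next request in the queue */
};

static struct mock_request *queue_head;        /* the pending-request queue */
static char device[NR_SECTORS * SECTOR_SIZE];  /* pretend storage */

#define CURRENT (queue_head)

/* Mock INIT_REQUEST: leave the enclosing function when the queue is empty. */
#define INIT_REQUEST do { if (CURRENT == NULL) return; } while (0)

/* Append a request to the tail of the queue. */
static void add_request(struct mock_request *req)
{
    struct mock_request **p = &queue_head;

    req->status = -1;
    req->next = NULL;
    while (*p != NULL)
        p = &(*p)->next;
    *p = req;
}

/* Mock end_request(): record the outcome and dequeue CURRENT. */
static void end_request(int uptodate)
{
    struct mock_request *done = CURRENT;

    assert(done != NULL);
    done->status = uptodate;
    queue_head = done->next;
    done->next = NULL;
}

/* No arguments, no return value: everything comes from the queue. */
static void do_mock_request(void)
{
    char *addr;
    size_t len;

repeat:
    INIT_REQUEST;
    /* Abort requests that fall outside the device or use an unknown command. */
    if (CURRENT->sector >= NR_SECTORS ||
        CURRENT->nr_sectors > NR_SECTORS - CURRENT->sector ||
        (CURRENT->cmd != CMD_READ && CURRENT->cmd != CMD_WRITE)) {
        end_request(0);
        goto repeat;
    }
    addr = device + CURRENT->sector * SECTOR_SIZE;
    len = (size_t)CURRENT->nr_sectors * SECTOR_SIZE;
    if (CURRENT->cmd == CMD_WRITE)
        memcpy(addr, CURRENT->buffer, len);
    else
        memcpy(CURRENT->buffer, addr, len);
    end_request(1);
    goto repeat;
}
```

Queue a few requests with add_request() and call do_mock_request() once; it drains the whole queue, marking out-of-range requests as aborted (status 0) and the rest as satisfied (status 1).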

On the other hand, if the device is interrupt-driven, the
request() procedure will usually schedule an
interrupt to take place, and then let the interrupt handling
procedure call end_request() (more on
end_request() later) and then call the
request() procedure again to schedule the next
request (if any) to be processed.
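
The hand-off between the request() procedure and the interrupt handler can be mocked up in user space. This is only a control-flow sketch under my own assumptions: there is no real hardware or interrupt here, mock_interrupt() is an ordinary function you call by hand, and struct mock_request, transfer_in_flight, and completed are hypothetical names. The end_request() shown is a simplified stand-in for the real one.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal mock request: only the link matters for this control-flow sketch. */
struct mock_request {
    struct mock_request *next;
};

static struct mock_request *queue_head;  /* pending requests; head is current */
static int transfer_in_flight;           /* 1 while the pretend hardware is busy */
static int completed;                    /* number of requests satisfied so far */

/* Mock end_request(): count a satisfied request and dequeue the head. */
static void end_request(int uptodate)
{
    if (uptodate)
        completed++;
    queue_head = queue_head->next;
}

/* Start the transfer for the head request and return at once; the result
 * arrives later through mock_interrupt(). */
static void do_mock_request(void)
{
    if (queue_head == NULL || transfer_in_flight)
        return;
    transfer_in_flight = 1;  /* stands in for programming the hardware */
}

/* Pretend interrupt handler: finish the current request, then call the
 * request procedure again so the next request (if any) gets started. */
static void mock_interrupt(void)
{
    if (!transfer_in_flight)
        return;              /* spurious interrupt: nothing to complete */
    transfer_in_flight = 0;
    end_request(1);
    do_mock_request();
}
```

Note that do_mock_request() never loops here. Each call starts at most one transfer, and it is the interrupt handler that keeps the queue moving.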

The first thing you may notice about a typical non-interrupt-driven
request() procedure is that it never explicitly returns. It does not
run off the end and return, and there is no return statement. This is
not a bug; the INIT_REQUEST macro takes care of this for us. It
checks the request queue and, if there are no requests in the
queue, it returns. Otherwise, it does some simple sanity checks on
the request at the head of the queue, which becomes the new
CURRENT request.

CURRENT is defined by default as

blk_dev[MAJOR_NR].current_request

in drivers/block/blk.h. (We will cover
MAJOR_NR and blk.h later.)
This is the current request, the
one at the head of the request queue that is being processed. The
request structure includes all the information needed to process
the request, including the device, the command (read or write;
we'll assume read here), which sector is being read, the number of
sectors to read, a pointer to memory to store the data in, and a
pointer to the next request. There is more than that, but that's
all we are concerned with.

The sector variable contains the block
number. The length of a sector is specified when the device is
initialized (more later), and the sectors are numbered
consecutively, starting at 0. If the physical device is addressed
by some means other than sectors, it is the responsibility of the
request() procedure to translate.
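
As an illustration of such a translation, suppose the hardware wants cylinder/head/sector addresses. That addressing scheme is my assumption for the sake of example, as are the names struct geometry, struct chs, and sector_to_chs(); a real driver would take its geometry from the device rather than from constants.

```c
#include <assert.h>

/* Hypothetical device geometry, for illustration only. */
struct geometry {
    unsigned heads;              /* heads per cylinder */
    unsigned sectors_per_track;  /* sectors on each track */
};

/* A cylinder/head/sector address. */
struct chs {
    unsigned long cylinder;      /* 0-based */
    unsigned head;               /* 0-based */
    unsigned sector;             /* 1-based, by the usual CHS convention */
};

/* Translate a linear sector number (numbered consecutively from 0)
 * into a cylinder/head/sector address for the given geometry. */
static struct chs sector_to_chs(unsigned long lsn, struct geometry g)
{
    struct chs out;

    assert(g.heads > 0 && g.sectors_per_track > 0);
    out.cylinder = lsn / ((unsigned long)g.heads * g.sectors_per_track);
    out.head = (unsigned)((lsn / g.sectors_per_track) % g.heads);
    out.sector = (unsigned)(lsn % g.sectors_per_track) + 1;
    return out;
}
```

With 16 heads and 63 sectors per track, sector 0 maps to cylinder 0, head 0, sector 1, and sector 1008 (16 times 63) is the first sector of cylinder 1.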

In some cases, a command may read or write more than one
sector. In those cases, the nr_sectors variable
contains the number of contiguous sectors to read or write.

end_request() is called whenever the
CURRENT request has been processed—either
satisfied or aborted.

If it has been satisfied, it is called with an argument of 1
and, if it has been aborted, it is called with an argument of 0. It
complains if the request was aborted, does magic with the buffer
cache, removes the processed request from the queue, “ups” a
semaphore if the request was for swapping, and wakes up all
processes that were waiting for a request to complete.

It may allow a task switch to occur if one is needed.

end_request() is a static function defined
in blk.h. A separate version is compiled into
each block device driver, using special
#define'd values that are used throughout blk.h
and the block device driver. This brings us to...
