In Chapter 2, we looked at the basic I/O system calls in Linux. These calls form not only the basis of file I/O, but also the foundation of virtually all communication on Linux. In Chapter 3, we looked at how user-space buffering is often needed on top of the basic I/O system calls, and we studied a specific user-space buffering solution, C’s standard I/O library. In this chapter, we’ll look at the more advanced I/O system calls that Linux provides:

Scatter/gather I/O

Allows a single call to read data into, or write data from, many buffers at once; useful for bunching together fields of different data structures to form one I/O transaction.

File advice

Allows a process to provide hints to the kernel on its usage scenarios; can result in improved I/O performance.

Asynchronous I/O

Allows a process to issue I/O requests without waiting for them to complete; useful for juggling heavy I/O workloads without the use of threads.

The chapter will conclude with a discussion of performance considerations and the kernel’s I/O subsystems.

Scatter/Gather I/O

Scatter/gather I/O is a method of input and output in which a single system call writes from a vector of buffers to a single data stream or, alternatively, reads from a single data stream into a vector of buffers. This type of I/O is so named because the data is scattered into or gathered from the given vector of buffers. An alternative name for this approach to input and output is vectored I/O. In comparison, the standard read and write system calls that we covered in Chapter 2 provide linear I/O.

In addition to a reduction in the number of issued system calls, a vectored I/O implementation can provide improved performance over a linear I/O implementation via internal optimizations.

Atomicity

Unlike multiple linear I/O operations, a single vectored I/O operation runs with no risk of being interleaved with an operation from another process.

Both a more natural I/O method and atomicity are achievable without a scatter/gather I/O mechanism. A process can concatenate the disjoint vectors into a single buffer before writing, and decompose the returned buffer into multiple vectors after reading—that is, a user-space application can perform the scattering and the gathering manually. Such a solution, however, is neither efficient nor fun to implement.
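To make the cost of the manual approach concrete, here is a minimal sketch of gathering by hand before a linear write. The helper name gather_then_write() and its parameters are illustrative assumptions, not part of any library; note the extra allocation and copy that a vectored call would avoid.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the text): copy n disjoint buffers into
 * one temporary buffer, then issue a single linear write().
 * Returns the write() result, or -1 if the allocation fails.
 */
ssize_t gather_then_write(int fd, const void *const bufs[],
                          const size_t lens[], int n)
{
    size_t total = 0;
    for (int i = 0; i < n; i++)
        total += lens[i];

    /* Extra allocation that vectored I/O makes unnecessary. */
    char *tmp = malloc(total ? total : 1);
    if (!tmp)
        return -1;

    /* Extra copy of every byte, again avoidable with writev(). */
    size_t off = 0;
    for (int i = 0; i < n; i++) {
        memcpy(tmp + off, bufs[i], lens[i]);
        off += lens[i];
    }

    ssize_t ret = write(fd, tmp, total);
    free(tmp);
    return ret;
}
```

A caller would pass parallel arrays of buffer pointers and lengths; the read side would need the mirror-image helper that splits one buffer back into its parts.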

readv() and writev()

POSIX 1003.1-2001 defines, and Linux implements, a pair of system calls that implement scatter/gather I/O. The Linux implementation satisfies all of the goals listed in the previous section.

The readv() function reads count segments from the file descriptor fd into the buffers described by iov:

#include <sys/uio.h>

ssize_t readv (int fd,
const struct iovec *iov,
int count);

The writev() function writes at most count segments from the buffers described by iov into the file descriptor fd:

#include <sys/uio.h>

ssize_t writev (int fd,
const struct iovec *iov,
int count);

The readv() and writev() functions behave the same as read() and write(), respectively, except that multiple buffers are read from or written to.

Each iovec structure describes an independent disjoint buffer, which is called a segment.
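The structure has two members: a base address and a length. The sketch below restates the declaration found in <sys/uio.h> as a comment, then fills in one segment. The helper segment_from_string() is a hypothetical name used only for illustration.

```c
#include <string.h>
#include <sys/uio.h>

/*
 * For reference, <sys/uio.h> declares the segment type roughly as:
 *
 *     struct iovec {
 *             void   *iov_base;   // start of the buffer
 *             size_t  iov_len;    // size of the buffer, in bytes
 *     };
 */

/*
 * Hypothetical helper: describe a NUL-terminated string as one segment.
 * The terminating NUL is excluded from the segment length.
 */
struct iovec segment_from_string(char *s)
{
    struct iovec seg;

    seg.iov_base = s;
    seg.iov_len = strlen(s);
    return seg;
}
```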

A set of segments is called a vector. Each segment in the vector describes the address and length of a buffer in memory to or from which data should be written or read. The readv() function fills each buffer of iov_len bytes completely before proceeding to the next buffer. The writev() function always writes out all iov_len bytes of a buffer before proceeding to the next buffer. Both functions always operate on the segments in order, starting with iov[0], then iov[1], and so on, through iov[count-1].
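The ordering rule can be observed with a small experiment. The following sketch, which assumes a pipe as the data stream and uses the hypothetical function name scatter_demo(), gathers three segments into the pipe and scatters them back into two buffers of different sizes. The first buffer fills completely before the second receives any data.

```c
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical demo: push 10 bytes through a pipe with writev(), then
 * scatter them into a 4-byte and a 6-byte buffer with readv().
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t scatter_demo(char first[4], char second[6])
{
    int p[2];

    if (pipe(p) == -1)
        return -1;

    char a[] = "0123", b[] = "45", c[] = "6789";
    struct iovec out[3] = {
        { .iov_base = a, .iov_len = 4 },
        { .iov_base = b, .iov_len = 2 },
        { .iov_base = c, .iov_len = 4 },
    };
    struct iovec in[2] = {
        { .iov_base = first,  .iov_len = 4 },
        { .iov_base = second, .iov_len = 6 },
    };

    /* Gather iov[0], iov[1], iov[2] in order, then scatter them back. */
    ssize_t nr = -1;
    if (writev(p[1], out, 3) == 10)
        nr = readv(p[0], in, 2);

    close(p[0]);
    close(p[1]);
    return nr;
}
```

After a successful call, first holds the bytes 0123 and second holds 456789, regardless of how the data was split on the writing side.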

Return values

On success, readv() and writev() return the number of bytes read or written, respectively. This number should be the sum of all count iov_len values. On error, the system calls return -1 and set errno as appropriate. These system calls can experience any of the errors of the read() and write() system calls, and will, upon receiving such errors, set the same errno codes. In addition, the standards define two other error situations.

First, because the return type is an ssize_t, if the sum of all count iov_len values is greater than SSIZE_MAX, no data will be transferred, -1 will be returned, and errno will be set to EINVAL.

Second, POSIX dictates that count must be larger than zero, and less than or equal to IOV_MAX, which is defined in <limits.h>. In Linux, IOV_MAX is currently 1024. If count is 0, the system calls return 0. If count is greater than IOV_MAX, no data is transferred, the calls return -1, and errno is set to EINVAL.
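The limits on count can be probed directly. The sketch below is an illustration under stated assumptions: probe_count() is a hypothetical helper, it uses zero-length segments so that no data is actually transferred, and the caller obtains the limit at run time with sysconf(_SC_IOV_MAX) rather than relying on the IOV_MAX macro being visible.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical probe: call writev() on fd with `count` zero-length
 * segments. Returns 0 if the call succeeds, or the errno value it set.
 */
int probe_count(int fd, long count)
{
    /* calloc() zeroes the array, so every segment has iov_len == 0. */
    struct iovec *iov = calloc(count > 0 ? (size_t)count : 1, sizeof(*iov));
    if (!iov)
        return ENOMEM;

    errno = 0;
    ssize_t ret = writev(fd, iov, (int)count);
    int err = (ret == -1) ? errno : 0;

    free(iov);
    return err;
}
```

On Linux, a call such as probe_count(fd, sysconf(_SC_IOV_MAX) + 1) on a writable descriptor is expected to return EINVAL, while a count at or below the limit succeeds.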

Optimizing the Count

During a vectored I/O operation, the Linux kernel must allocate internal data structures to represent each segment. Normally, this allocation would occur dynamically, based on the size of count. As an optimization, however, the Linux kernel creates a small array of segments on the stack that it uses if count is sufficiently small, negating the need to dynamically allocate the segments, and thereby providing a small boost in performance. This threshold is currently eight, so if count is less than or equal to 8, the vectored I/O operation occurs in a very memory-efficient manner off of the process’s kernel stack.

Most likely, you won’t have a choice about how many segments you need to transfer at once in a given vectored I/O operation. If you are flexible, however, and are debating over a small value, choosing a value of eight or less avoids the dynamic allocation and so yields a small efficiency gain.

writev() example

Let’s consider a simple example that writes out a vector of three segments, each containing a string of a different size. This self-contained program is complete enough to demonstrate writev(), yet simple enough to serve as a useful code snippet:
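A minimal sketch of such a program's core follows. It is an illustration, not a definitive listing: the function name write_buccaneer() and the path parameter are assumptions, and the segment lengths here exclude each string's terminating NUL.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Sketch: write three strings of different lengths to `path` with a
 * single writev() call. Returns the byte count reported by writev(),
 * or -1 on error. The function name is illustrative.
 */
ssize_t write_buccaneer(const char *path)
{
    char *lines[] = {
        "The term buccaneer comes from the word boucan.\n",
        "A boucan is a wooden frame used for cooking meat.\n",
        "Buccaneer is the West Indies name for a pirate.\n",
    };
    struct iovec iov[3];

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    /* One segment per string; the terminating NUL is not written. */
    for (int i = 0; i < 3; i++) {
        iov[i].iov_base = lines[i];
        iov[i].iov_len = strlen(lines[i]);
    }

    ssize_t nr = writev(fd, iov, 3);

    if (close(fd) == -1)
        return -1;
    return nr;
}
```

Calling write_buccaneer("buccaneer.txt") from main() and then reading the file back should show the three strings in segment order: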

$ cat buccaneer.txt
The term buccaneer comes from the word boucan.
A boucan is a wooden frame used for cooking meat.
Buccaneer is the West Indies name for a pirate.

readv() example

Now, let’s consider an example program that uses the readv() system call to read from the previously generated text file using vectored I/O. This self-contained example is likewise simple yet complete:
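A minimal sketch of the reading side might look like the following. The function name read_buccaneer() and the buffer names are assumptions, and the segment sizes assume the file holds exactly the three lines shown earlier with no NUL bytes stored (47, 50, and 48 bytes, newlines included).

```c
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Sketch: read the three lines of `path` into three separate buffers
 * with one readv() call. Each buffer has one extra byte for a
 * terminating NUL. Callers should zero the buffers first, because a
 * short read leaves the tail of the vector untouched.
 * Returns the byte count reported by readv(), or -1 on error.
 */
ssize_t read_buccaneer(const char *path,
                       char foo[48], char bar[51], char baz[49])
{
    struct iovec iov[3] = {
        { .iov_base = foo, .iov_len = 47 },
        { .iov_base = bar, .iov_len = 50 },
        { .iov_base = baz, .iov_len = 48 },
    };

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    ssize_t nr = readv(fd, iov, 3);
    close(fd);
    if (nr == -1)
        return -1;

    /* readv() does not NUL-terminate; do it so the buffers print as strings. */
    foo[47] = bar[50] = baz[48] = '\0';
    return nr;
}
```

A caller could then print each buffer with printf("%s", ...) to see one line of the file per segment.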

Thankfully, Linux does not emulate vectored I/O in user space: it implements readv() and writev() as system calls, and internally performs scatter/gather I/O. In fact, all I/O inside the Linux kernel is vectored; read() and write() are implemented as vectored I/O with a vector of only one segment.
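For contrast, a user-space emulation could be as simple as a loop that issues one linear write per segment. The sketch below uses the hypothetical name naive_writev(); it costs one system call per segment, and another process's write may land between any two of them, so it provides neither the efficiency nor the atomicity of the real system call.

```c
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical user-space stand-in for writev(): one write() per segment.
 * Returns the total bytes written, or -1 if the very first write fails.
 */
ssize_t naive_writev(int fd, const struct iovec *iov, int count)
{
    ssize_t total = 0;

    for (int i = 0; i < count; i++) {
        ssize_t nr = write(fd, iov[i].iov_base, iov[i].iov_len);
        if (nr == -1)
            return total ? total : -1;   /* report partial progress */
        total += nr;
        if ((size_t)nr < iov[i].iov_len)
            break;                       /* short write: stop here */
    }
    return total;
}
```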