Linux currently provides about 200 different system calls. A listing of system calls for your version of the Linux kernel is in /usr/include/asm/unistd.h. Some of these are for internal use by the system, and others are used only in implementing specialized library functions. In this sample chapter, authors Jeffrey Oldham and Mark Mitchell present a selection of system calls that are likely to be the most useful to application and system programmers.

This chapter is from the book

So far, we've presented a variety of functions that your program can
invoke to perform system-related tasks, such as parsing command-line
options, manipulating processes, and mapping memory. If you look under the hood,
you'll find that these functions fall into two categories, based on how
they are implemented.

A library function is an ordinary function that resides in a
library external to your program. Most of the library functions we've
presented so far are in the standard C library, libc. For example,
getopt_long and mkstemp are functions provided in the C
library.

A call to a library function is just like any other function call. The
arguments are placed in processor registers or onto the stack, and execution is
transferred to the start of the function's code, which typically resides in
a loaded shared library.

A system call is implemented in the Linux kernel. When a program
makes a system call, the arguments are packaged up and handed to the kernel,
which takes over execution of the program until the call completes. A system
call isn't an ordinary function call, and a special procedure is required
to transfer control to the kernel. However, the GNU C library (the
implementation of the standard C library provided with GNU/Linux systems) wraps
Linux system calls with functions so that you can call them easily. Low-level
I/O functions such as open and read are examples of system
calls on Linux.

The set of Linux system calls forms the most basic interface between
programs and the Linux kernel. Each call presents a basic operation or
capability.

Some system calls are very powerful and can exert great influence on
the system. For instance, some system calls enable you to shut down the Linux
system or to allocate system resources and prevent other users from accessing
them. These calls have the restriction that only processes running with
superuser privilege (programs run by the root account) can invoke them. These
calls fail if invoked by a nonsuperuser process.

Note that a library function may invoke one or more other library functions
or system calls as part of its implementation.

Linux currently provides about 200 different system calls. A listing of
system calls for your version of the Linux kernel is in
/usr/include/asm/unistd.h. Some of these are for internal use by the
system, and others are used only in implementing specialized library functions.
In this chapter, we'll present a selection of system calls that are likely
to be the most useful to application and system programmers.

Most of these system calls are declared in <unistd.h>.

8.1 Using strace

Before we start discussing system calls, it will be useful to present a
command with which you can learn about and debug system calls. The
strace command traces the execution of another program, listing any
system calls the program makes and any signals it receives.

To watch the system calls and signals in a program, simply invoke
strace, followed by the program and its command-line arguments. For
example, to watch the system calls that are invoked by the hostname
command, use this command:

% strace hostname

This produces a couple of screens of output. Each line corresponds to a single
system call. For each call, the system call's name is listed, followed by
its arguments (or abbreviated arguments, if they are very long) and its return
value. Where possible, strace conveniently displays symbolic names
instead of numerical values for arguments and return values, and it displays the
fields of structures passed by a pointer into the system call. Note that
strace does not show ordinary function calls.

In the output from strace hostname, the first line shows the
execve system call that invokes the hostname program:

execve("/bin/hostname", ["hostname"], [/* 49 vars */]) = 0

The first argument is the name of the program to run; the second is its
argument list, consisting of only a single element; and the third is its
environment list, which strace omits for brevity. The next 30 or so
lines are part of the mechanism that loads the standard C library from a shared
library file.

Toward the end are system calls that actually help do the program's
work. The uname system call is used to obtain the system's
hostname from the kernel:

uname({sys="Linux", node="myhostname", ...}) = 0

Observe that strace helpfully labels the fields (sys and
node) of the structure argument. This structure is filled in by the
system call—Linux sets the sys field to the operating system name
and the node field to the system's hostname. The uname
call is discussed further in Section 8.15, "uname."

Finally, the write system call produces output. Recall that file
descriptor 1 corresponds to standard output. The third argument is the number of
bytes to write, and the return value is the number of bytes that were
actually written.

write(1, "myhostname\n", 11) = 11

This line may appear garbled when you run strace because the output from
the hostname program itself is mixed in with the output from
strace.

If the program you're tracing produces lots of output, it is sometimes
more convenient to redirect the output from strace into a file. Use the
option -o filename to do this.

Understanding all the output from strace requires detailed
familiarity with the design of the Linux kernel and execution environment. Much
of this is of limited interest to application programmers. However, some
understanding is useful for debugging tricky problems or understanding how other
programs work.