Lay Your Unix System Bare With DTrace

Now that DTrace is available in OS X 10.5 (Leopard), I
won’t be called a Solaris bigot for praising it. Thank you, Apple. DTrace
is the most innovative software released for any Unix flavor this century. It
allows a never-before-imagined level of system visibility, enabling sysadmins,
developers, and even users to get answers to previously impossible
questions. This week we’ll talk about DTrace and how to use it, then next
week we’ll dive into a D scripting tutorial.

DTrace is a Dynamic Tracing facility, originally built into Solaris 10. It
enables both programmers and administrators to quickly identify system problems
by allowing them to look into exactly what userland programs or the operating
system is doing. DTrace has a 41-chapter manual, a large part of which explains
the usage of D, the DTrace language. Suspiciously similar to Awk, the D language provides
a method by which administrators can ask arbitrary questions of the operating
system. With more than 46,000 probes (instrumentation points) available, DTrace
provides the most flexible method on the market for diagnosing deep-seated
problems. That is not to say it’s overly complex or useful only for complex
issues. In fact, the opposite is true.

How it Works

DTrace works by modifying code after it has been loaded into memory. Nothing can
execute until it is loaded into memory, so a sufficiently intelligent tracing
facility like DTrace has the opportunity to patch instrumentation into the
in-memory copy of a program, or of the kernel itself, even while it is running,
and to remove it again when tracing stops. Because this means altering other
users’ processes and the kernel, DTrace must be run with administrative
privileges.
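Because DTrace needs those privileges, a “hello, world” one-liner is a handy way to confirm that it is installed and that you are allowed to use it. The sketch below wraps it in a shell function. The name dtrace_hello is our own invention, and the use of sudo assumes your account is set up for it (on Solaris you might use a root shell or pfexec instead). BEGIN is a built-in probe that fires once when tracing starts, and exit(0) ends the session.

```shell
# Hypothetical helper: a "hello, world" check that DTrace works here.
# BEGIN fires once when tracing starts; printf() prints our message and
# exit(0) ends the session. -q (quiet) suppresses dtrace's own messages.
# sudo is assumed to be available; on Solaris a root shell or pfexec
# would do the same job.
dtrace_hello() {
    sudo dtrace -q -n 'BEGIN { printf("hello, world\n"); exit(0); }'
}
```

Calling dtrace_hello should print hello, world and return to the prompt. Without the necessary privileges, dtrace reports an error instead of tracing anything.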

Before DTrace, the usual way to see inside an application was to recompile it
with debugging symbols enabled. This allowed a debugger to run the application
and gather information as it ran, but the resulting binary would be much larger
and would also run much slower. DTrace can be used on any application without
recompiling it, and even without restarting it. Other user-space programs
designed to show you what system calls are being executed, like truss or strace,
actually stop the program at every system call. This creates a huge performance
problem, and it can even crash some applications. This is not a concern with
DTrace: it was designed to be used on production systems without fear of a
crash. Probes that are not enabled cost nothing, and enabled probes add
relatively little overhead.
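To see the “without restarting it” point in practice, the sketch below attaches to a process that is already running and counts its system calls by name. This is roughly what truss -c or strace -c would report, but without stopping the process at each call. The function name syscalls_of_pid is hypothetical, and the PID you pass it is whatever process you want to inspect. dtrace -p attaches to that process and sets the macro variable $target to its PID.

```shell
# Hypothetical helper: count the system calls made by a process that is
# already running, without recompiling or restarting it.
#   -p PID                  attach to the process and set $target to its PID
#   /pid == $target/        predicate: keep only that process's calls
#   @[probefunc] = count()  aggregation: one counter per system call name
# The counts are printed when you press Ctrl-C or the process exits.
syscalls_of_pid() {
    sudo dtrace -p "$1" -n 'syscall:::entry /pid == $target/ { @[probefunc] = count(); }'
}
```

For example, run syscalls_of_pid 1234, where 1234 stands in for a real PID.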

User-space programs are one thing, and you can get some information about them
(a list of their system calls, for example) without DTrace, but finding out what
the kernel is doing was historically very difficult. DTrace places probes,
programmable sensors, throughout the kernel, so you can ask almost anything you
want. There are more than 40,000 probes that can be activated at will,
depending on the OS in question. You program a probe to report the information
that is of value to you, and when the probe fires, DTrace gathers the data.
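Every probe is named by four fields, provider:module:function:name, and fields left empty match anything. As one way of looking at what the kernel is doing, the sketch below uses the profile provider to sample kernel stack traces at a fixed rate rather than tracing individual functions. profile-997 is short for profile:::profile-997, because a shorter description fills in the rightmost fields. The function name kernel_hot_stacks is hypothetical.

```shell
# Hypothetical helper: sample kernel stack traces to see where the kernel
# spends its time.
# profile-997 fires 997 times per second on each CPU (an odd rate avoids
# sampling in lockstep with timer-driven work). arg0 is the kernel program
# counter and is zero when the CPU was in user mode, so /arg0/ keeps only
# kernel samples. stack() is the kernel stack trace; count() tallies each one.
# Press Ctrl-C to print the stacks, least frequent first.
kernel_hot_stacks() {
    sudo dtrace -n 'profile-997 /arg0/ { @[stack()] = count(); }'
}
```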

A DTrace script will often ask for timestamps or arguments to
functions. A DTrace user can see how long a function call takes, how often it
executes, what the stack trace looks like, and answer many other difficult
questions.
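Measuring how long a call takes usually means pairing an entry probe with its return probe. The sketch below does that for read(). It stores a timestamp at entry in a thread-local variable, self->ts, and at return adds the elapsed nanoseconds to a per-process histogram built with quantize(). The function name read_latency is hypothetical; the D program itself uses standard DTrace features.

```shell
# Hypothetical helper: a latency histogram of read() calls, per process.
# entry:  remember the start time (nanoseconds) in thread-local self->ts.
# return: if we saw the matching entry (/self->ts/), add the elapsed time
#         to a power-of-two histogram keyed by process name, then clear
#         self->ts so the variable's memory is released.
# Press Ctrl-C to print one histogram per process.
read_latency() {
    sudo dtrace -n '
        syscall::read:entry { self->ts = timestamp; }
        syscall::read:return /self->ts/ {
            @[execname] = quantize(timestamp - self->ts);
            self->ts = 0;
        }'
}
```

The /self->ts/ predicate matters: without it, a read() that was already in progress when tracing started would be measured from time zero.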

Using DTrace

Users may want to find out certain things about their applications or the
kernel their application is running on without becoming a DTrace expert. We’ll
cover as much as possible this week without getting too deep into D programming,
for those who can benefit from just being able to run basic commands or
pre-written scripts. Next week will be all about D programming.

First, it should be noted that we can get a list of all available probes
with the command ‘dtrace -l’. The list isn’t very useful unless you know what
you’re looking for, but you need to know how to find this information if you
wish to use a probe that’s not part of a pre-written script.
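The raw list from dtrace -l is long, so it helps to slice it. The two hypothetical helpers below count probes per provider and list only the probes that match a description. dtrace -l prints a header followed by one probe per line in the columns ID, PROVIDER, MODULE, FUNCTION and NAME. The module and function columns can be blank, but the first two are always filled in, which is why the awk program can safely pick out field 2.

```shell
# Hypothetical helpers for exploring the probe list; both run dtrace -l,
# which prints a header line and then one probe per line in the columns
# ID PROVIDER MODULE FUNCTION NAME.

# Count probes per provider, smallest first. MODULE and FUNCTION can be
# blank, but ID and PROVIDER are always present, so $2 is the provider.
probes_by_provider() {
    sudo dtrace -l | awk 'NR > 1 { print $2 }' | sort | uniq -c | sort -n
}

# List only the probes that match a probe description given as $1.
probes_matching() {
    sudo dtrace -l -n "$1"
}
```

For example, probes_matching 'syscall::open*:entry' lists the entry probes for every system call whose name begins with open.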

The DTrace Toolkit (DTT) is a suite of scripts that provide so much information
on their own that some sysadmins may never need to learn D scripting. The
Docs/Contents file included in the DTT explains what each script does. You will
find that DTrace can replicate every system-wide statistics tool you’ve ever
used (think: iostat, vmstat), but it also goes one step further, and the DTT
packages the scripts most useful to systems administrators and application
developers. Use tcpsnoop to see which processes are sending which packets, or
iosnoop to see which processes are doing disk I/O to which files. The ability to
see “what” and “how much” leaves one speechless. Before the days of DTrace,
admins were often found staring at a terminal wondering “what’s happening?” or
“what’s doing that?” Not anymore.
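For a taste of what the snoop-style scripts in the toolkit do, the one-liner below prints every file open as it happens, with the process name, PID and path. It is a minimal sketch rather than a replacement for the toolkit’s scripts. The function name watch_opens is hypothetical, and it traces only the plain open() system call, so variants such as open64 on Solaris or open_nocancel on OS X are not shown.

```shell
# Hypothetical helper: print every open() as it happens.
# arg0 is the user-space address of the path argument; copyinstr() copies
# that string into DTrace so printf() can show it. -q suppresses the
# default per-probe output so only our printf() lines appear.
# Only syscall::open:entry is traced; variants such as open64 (Solaris)
# or open_nocancel (OS X) are not.
watch_opens() {
    sudo dtrace -q -n 'syscall::open:entry { printf("%-16s %6d %s\n", execname, pid, copyinstr(arg0)); }'
}
```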

Start by running DTrace yourself. The toolkit provides scripts that
make your life easier, but once you get used to DTrace, you can easily start
constructing your own.

Let’s begin by asking what system calls are taking place. In this example,
we’re asking DTrace to instrument every system call entry point by specifying
the syscall provider and the probe name “entry”; the empty module and function
fields match everything:

dtrace -n 'syscall:::entry'
0 9299 ioctl:entry

The sample output line isn’t very useful: it just shows that some process,
running on CPU 0, made an ioctl() call (9299 is the probe’s ID). Something
you’ll see over and over again is an aggregation that summarizes events by
process name. The syscall:::entry example above can be modified to count system
calls per process, showing which process made the most:

dtrace -n 'syscall:::entry { @[execname] = count(); }'

On the server we tried this on, the totals printed after pressing Ctrl-C showed
that it was relatively busy running Samba, a program called ‘save,’ and a Ruby
program. The standard tools such as prstat or top
should reflect that too. We’re getting into the D scripting realm, so we’ll stop
there for now.

The true power of DTrace, for overall system information, can be realized by
running the DTrace Toolkit. When you need to delve deeper into a problem,
specifically into applications themselves, you’ll need to geek out on DTrace a
bit. Come back next week for the full D language tutorial.