In the last several years, we have had a whole bunch of system administration, troubleshooting and hacking
articles here on Dedoimedo. Mostly related to Linux, they taught us how to fix system issues by combining the
power of methodical investigation and analysis with nifty tools and utilities. Among the many, we used strace,
lsof, OProfile, and others. Links below.

We even had a thorough, so-called super-duper debugging guide, which combined the power of several programs all
at once in order to resolve very complex performance-related problems. However, we have mostly dwelled in the
domain of userspace, less so in the kernel. In my OProfile article, someone even asked me for a practical
example where a kernel profiler could actually yield useful results. Well, now the time has come, oh-oh, to
demonstrate just that. Let us learn how to fix a seemingly unsolvable system performance issue using
perf.

Perf at a glance

Quoting the original page, Perf is a profiler
tool for Linux 2.6+ based systems that abstracts away CPU hardware differences in Linux performance
measurements and presents a simple commandline interface. Perf is based on the perf_events interface exported
by recent versions of the Linux kernel. Perf notation is somewhat similar to version control tools and strace
combined. For example, perf stat will run a command and gather performance counter statistics. Perf record will
run a command and record its profile into a log file, by default perf.data. Then, perf report will read the log
file created by the perf record execution and display the profile. You may also want to analyze lock events,
define probes, and more. There's even a top mode.
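The basic workflow can be sketched with a few commands; ls and /tmp here are just placeholder targets for illustration:

```shell
# Gather performance counter statistics for a command
perf stat ls /tmp

# Record a profile into a log file, perf.data by default
perf record ls /tmp

# Read perf.data and display the recorded profile
perf report

# Live, top-like view of the hottest functions
perf top
```

Running the profiling commands usually requires root privileges, or suitably relaxed perf_event settings for ordinary users.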

All in all, I am bluffing like a pro, because it is impossible to learn perf in five minutes of reading,
especially if you do not have significant prior Linux knowledge. You will need both to understand Linux system
behavior fairly well and to be somewhat proficient in the concepts of scheduling, memory management, various
interfaces, and CPU architecture. Only then will perf make good sense. Still, let us
explore a scenario where this tool proves its awesome merit.

Problem what we have

Say you have a Linux system with multiple users working on it, against a remote file system, e.g. NFS. One of
the users reports a certain performance degradation in accessing the network objects, like files and
directories. Other users have no such woes. As a smart lad, you immediately start a proper investigation, and
you grab your favorite tool, strace.

You execute a summary run, with the -c flag, for both users, and in both cases, the quantity of
system calls and errors, as well as their order, is identical. The only difference is that a specific system
call takes longer for one user than for the others. You narrow down your tests using the -e flag, which lets you trace individual system calls. With the so-called problematic user, you get
a consistently slow result for the stat system call:
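The two strace runs above might look like this; the NFS path is purely illustrative:

```shell
# Summary run: count syscalls, errors and time spent per call
strace -c stat /mnt/nfs/data/file.txt

# Narrow down: trace only the stat syscall, with -T printing
# the time spent inside each call
strace -T -e trace=stat stat /mnt/nfs/data/file.txt
```

Repeating the same commands as each of the two users is what exposes the timing difference.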

There is no difference in the environment settings of the two users. They both use identical shells. Everything
is perfect except that one small difference in the system call time for stat. With the usual set of your tools,
you have just reached a dead end.

Call the cavalry

Now, let us see what perf can do for us. Let us run perf stat wrapped around the stat command. It is a very good way to start your investigation, because you will get a neat
summary of what happens in the kernel, which can then help point out the next lead in your troubleshooting.
Indeed, for the healthy system, we get:
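Assuming the affected files live on the NFS mount, the comparison run for each user might look like this; the user names and path are made up for the example:

```shell
# Run the same stat command under perf, once per user,
# to collect comparable counter statistics
sudo -u gooduser perf stat stat /mnt/nfs/data/file.txt
sudo -u baduser  perf stat stat /mnt/nfs/data/file.txt
```

perf stat prints the counter summary (cycles, instructions, context switches, page faults, elapsed time) when the wrapped command exits.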

There is a marked difference between the two, as you can observe. While the CPU speed is the same, and the
number of migrations, context switches and page faults are identical, the bad user spins approx. five times
longer, using more cycles and instructions and everything, resulting in more total time needed for the command
to complete. This already shows us there's something wrong afoot.

Let us explore a little deeper. Let us now record the run and then analyze the
data using the report command. This will give us a far more detailed
understanding of what really happened. So here's the report for the good user:
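The record-then-report sequence that produces it might be, again with an illustrative path:

```shell
# Sample the command as it runs; the profile is written
# to perf.data in the current directory
perf record stat /mnt/nfs/data/file.txt

# Read perf.data and display the profile, sorted by overhead
perf report
```

As before, run this once per user so you have two reports to compare side by side.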

Understanding the report

Before we discuss the results, we need to spend a moment or two to overview the report format. The first column
indicates the percentage of the overall samples collected in the corresponding function, similar to what strace
-c reports. The second column reports the process from which the samples were collected. In the
per-thread/per-process mode, this will always be the name of the monitored command. In the CPU-wide mode, the
command name can vary.

The third column shows the name of the ELF image where the samples came from. If a program is dynamically
linked, then this may show the name of a shared library. When the samples come from the kernel, then the pseudo
ELF image name [kernel.kallsyms] is used. The fourth column indicates the privilege level at which the sample
was taken, i.e. where the program was running when it was interrupted. The following levels exist:

[.]: user level

[k]: kernel level

[g]: guest kernel level (virtualization)

[u]: guest OS user space

[H]: hypervisor

The final column shows the symbol name. If you're interested in learning more about what the specific symbol
name does, you can consult The Linux Cross Reference site and search for the
specific string entry, or if you have the kernel sources available and installed for your distro, normally
under /usr/src/linux, then consult there. Don't do this for fun, please. There are better ways to spend the
afternoon.
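Putting the columns together, a single report line looks roughly like this; the percentage is invented for the example, but the symbol is the one we are about to discuss:

```
12.34%  stat  [kernel.kallsyms]  [k] rpcauth_lookup_credcache
```

Reading left to right: share of samples, monitored command, ELF image, privilege level, and symbol name.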

Now, the results. We can see the difference immediately. The bad user wastes a lot of time fiddling with
rpcauth_lookup_credcache, which is linked inside the sunrpc kernel module. At this point, you have all the information you need to go to the
Internet and do a very narrow and intelligent search. Just by punching in the name of the symbol, you will find two or
three mailing list references pointing to this phenomenon, which,
in turn, points to a bug.

Your problem is NOT immediately solved, but you know someone else will handle it now, with solid information in
your hand. This means reporting the issue to the community, the vendor, or whoever is in charge of your distro,
patiently waiting for a fix, or maybe searching for a workaround that might help. But the whole point is that
we used perf, which exposed information that we could not have obtained otherwise, and thus, allowed us to move
forward in our troubleshooting.

More reading

Conclusion

This is a relatively short tutorial, but then good things are simple and effective. And so is perf. Effective.
It sure is not simple. In fact, it is an advanced tool for advanced system debugging, but if you do encounter
problems where you draw a complete blank with the conventional software, you may want to consider using perf. You
just might uncover some really interesting things.

Anyhow, we are done. Hopefully, as always, you may have learned something new. Namely, how to conduct a
thorough analysis of system issues, how to compare results between good and bad manifestations of the system
behavior, and how to use perf. We also had a brief reminder of what strace can do for us. Now, you are that
much more of a geek. Take care.