Profiling in Linux HOWTO

John Levon <moz@compsoc.man.ac.uk>
2002
Revision 0.1, FIXME: Initial version

This document can be freely translated and distributed. It is released under the LDP License.

Introduction

Profiling code
As systems get more complex, the need for
machine-assisted performance analysis grows. Kernighan and
Pike note that "measurement is a crucial component of
performance improvement since reasoning and intuition are fallible guides and
must be supplemented with tools like timing commands and profilers" [tpop].
Linux is generally well-served in terms of development tools, and there are a
wide selection of profiling packages available. However, several of these
tools are not well-publicised, and many are under-documented. To date, there
has been no comprehensive survey of the choices available: this document hopes
to fill this gap.
It is worth mentioning some guidelines that should be followed when doing
performance analysis. Probably the number one rule is: analyse
the results. Think about what the results could be implying; don't take them
at face value. Consider whether the profiling technique could be harming
the accuracy of the profiling data.
Pay close attention to your profiling environment. Are you running realistic
tests? Have you avoided narrowing in on a particular workload at the expense
of the common case? Amdahl's law indicates the analyst should avoid focussing
on a small part of the system, until it is ascertained that optimisation will
benefit the common case.
Are you profiling production code? Performance analysis of unoptimised code
peppered with debug statements carries the risk of mis-optimisation. Make sure
your optimisation decisions are governed by realistic data, not intuition.
Everyone knows Knuth's famous maxim Premature optimisation is the root
of all evil, but it is still ignored all too frequently (this maxim
is similar to the Extreme Programming rule "you aren't going to need it").
Too much developer time is spent optimising code that doesn't need optimisation.
This leaves open the question as to when the right time to do performance
analysis is. Commonly this is done during the alpha or beta phase of a release's
lifecycle, and often in parallel with unstable development for far-reaching
changes. This can prove to be a problem with a development tree in high flux,
as profiling data can quickly become outdated - this is, of course, a development
management issue, and need not concern us here.
When you identify a bottleneck in your program, there are two principal ways
to view it. First, it can be considered on the procedural level: this is the sort
of analysis that leads to, for example, inner loop optimisation, inlining decisions,
and other such transformations. Second, an architectural point of view can be taken: here
the underlying algorithms are considered; why does the particular algorithm used
not work efficiently enough for the important cases, and how can the system be
re-worked to fix this.
Both points of view are of use, though it is probably fair to say that the
architectural considerations are more important. Re-workings on this level
more often than not lead to more significant gains than procedural analysis,
although they are offset by higher development costs. Procedural analyses are
most useful when tweaking the performance of a system approaching the end of a
release cycle, and are generally cheap to implement. The majority of premature
optimisation is a result of procedural changes guided by intuition. Procedural
changes often make code harder to read; this accretion of junk code can
easily turn into a significant maintenance burden, especially with large
projects. In general the developer should avoid making micro-optimisations that
could affect code readability until they have proven their worth in extensive
analysis work.
About this document
This HOWTO describes the methods and software a developer can use for
performance analysis on the Linux platform. This document in general focuses
on the Linux/x86 platform, although much of it applies to other architectures
as well.
The section Profiling techniques is a brief survey of the basic methods which are used
for profiling. If you already have some familiarity with the basic profiling
terminology, you may skip this section.
The section Support mechanisms discusses the kernel and user space facilities found
in the Linux environment that provide support for profilers. You can probably
skip this section if you're not interested in profiler implementations.
The section Available profiling packages provides an overview of the available profilers
on Linux, providing brief synopses of their implementation, and considering their
relative merits.
The bibliography at the end of this document collects several relevant websites,
articles, and research papers, and an exhaustive list of the software
described in the section Available profiling packages.
This document is actively maintained; please contact the author with any
suggestions, corrections, or confusions. Note that the primary author of this
document is also the project lead for oprofile, though all
efforts have been made to provide a disinterested review.
Profiling techniques
There have been a large number of profiler implementations, and there is
a significant body of literature on performance analysis. This chapter
briefly covers some of the terminology used in the rest of this document,
and describes some profiler design parameters.
Design aims of a profiler
As an important part of a programmer's arsenal, a profiler should avoid
getting in the way of the human analyst. This leads to a number of design
parameters every profiler should aim towards:
Unobtrusive
A profiler should not require a significant expenditure of
developer effort. The need for recompiles, preprocessors,
special modifications to the toolchain and the like should
be avoided, as they are inconvenient to the developer. An
ideal solution should allow profiling at will, without needing
such changes.
Accurate
The data and reports generated by the profiler system should
aim towards accuracy. Inaccurate data runs the risk of mis-informing
the developer of the true situation, leading to wasted effort
and maintainability problems.
Complete
Profilers should aim towards a complete data set. If a system
component or facet is not represented in the results, the
developer may not be aware of its impact on the system as a
whole.
General
Profilers should avoid special-purpose techniques where possible.
Fast
If the profiling method is too slow, it will impinge on the
developer's hacking time. Slow profilers often can't be
used in realistic environments, which makes collecting
meaningful data hazardous and prone to error.
Profile data
Profile data covers a wide range of data types, including event logs,
execution counts, resource attributions, and more. Any tool that can
generate data as input to a performance analysis can, in some sense,
be considered to be a profiling technique. By definition, profiling data
must be collected at runtime; this fact restricts the available methods
to a few main techniques.
Profile data can be produced in a number of different forms. At the simplest end
are accumulated event counts, which can be used for a broad understanding
of the workloads. Event logging, which is a form of tracing, is another
related form. Generally event logs require some form of processing in order
to reveal interesting performance data. Time-based data characterises
how long operations, or sections of code, execute for, typically measured
in real time, or virtual per-process CPU time. Call-graph
data collects data with regards to the path to the code under question. For example,
a periodic stack trace is a simple form of call-graph information. More typically,
call-graph information is represented in an accumulated form at function granularity.
This allows the developer to determine more easily the focal point of
any performance problems in the source code.
All such data can be classified as either exact
or statistical. Exact data tells the whole story:
no elements are missing from the data. For example, function call counts are
usually calculated at every function call, so the
total counts are 100% accurate.
Statistical data, in contrast, is not 100% accurate; rather, for the data to be
useful, it is expected to be a realistic representation of the true data set.
The collected data set is some fraction of the data that profiling
the input in full would have generated. For example,
a CPU time histogram of functions is rarely exact. Estimating time spent in each function
by sampling the PC counter regularly is a very common profiling technique. There are two
sub-types of statistical data. First there is data that is inaccurate due to the inherent
uncertainty of certain measurements: for example, a cache line miss data point may
accurately represent the number of actual cache line misses, but the lack of context
means that some misses due to other system processes are not filtered out from
the result set. A more common example is the granularity of certain timing tools.
The second type of inaccuracy is usually a result of examining profiling methods,
and the inherent limitations of their resolution. For example, a profiler that
accounted the function being executed every 10ms could easily skew the results
in favour of functions that take longer, even when there are faster functions
that are called far more often.
One of the main reasons statistical profiling is so common is that collecting exact
data often incurs a cost in overhead, and often that cost is prohibitive. Thus
this design choice is a tradeoff between speed/obtrusiveness and accuracy.
We have mentioned examining profilers.
These constitute one of the main classes of profiling techniques. They are characterised
by a periodic collection of profiling data. This technique inevitably gives
statistically-bound results, unless an accounting technique
is used in concert with the periodic collection. An accounting profiler
collects exact counts for some particular data item, for example, number of
major page faults. The exact nature of accounting profilers implies more reliable
data, but there can often be costs in terms of obtrusiveness of the technique used.
Instrumentation methods
Commonly, the target application must include some instrumentation
to enable the profiling mechanisms to operate. Sometimes only preliminary
start-up code needs to be added, and this is easily achieved via mechanisms
such as LD_PRELOAD. Accounting profilers often need to add instrumentation
at a fine-grained level, and there are a number of different techniques in use:
Simulation
A simulator can easily collect detailed data as part of the simulation
run. Such techniques tend to be very obtrusive and slow, so are best
used when the level of detail is critical.
Source-level instrumentation
Source-level instrumentation involves altering the source code
that eventually becomes the application by inserting profiling code.
This can happen semi-automatically via a pre-processor, or may
require a programmer to add explicit calls to some profiling API.
Compile-time instrumentation
The compiler itself can be used to insert profiling code. This has
the advantage above source-level instrumentation of being more convenient,
but of course requires the source code to be recompiled, which is
not always practicable.
Offline binary instrumentation
Binary images that contain the text sections for shared libraries
or applications can be rewritten to add instrumentation. This technique
is complex to implement, but is relatively unobtrusive unless system-wide
performance data is needed.
Online binary instrumentation
Mapped binary images are rewritten to add instrumentation. To some degree,
just-in-time compiling environments are in this class of techniques.
Related tools
Profiling is amongst a class of runtime program examination techniques which
also includes tracing and runtime debugging. Tracing is very similar to
profiling, but differs in focus. Tracing is most commonly used as a method of
examining program logic, rather than application performance. Tools such as
strace, ltrace, Electric Fence, and
garbage-collection in leak trace
mode exemplify typical tracing systems. However, event-based profiling is
concerned with examination of particular event data, so is strongly related to
tracing. Runtime debugging utilities such as gdb are used only for examining
program logic on a detailed level.
Where these methods coincide with profiling is mainly in the implementation
techniques used. Function call counts can be implemented with the same basic
mechanism as tracing utilities; instrumentation is another technique commonly
used throughout this area. Both tracing and debugging are complex areas, and
deserve separate discussion of their own.
Support mechanisms

Hardware support mechanisms
CPU manufacturers recognised several years ago that profiling was increasing
in importance, and as a result many CPUs, such as the MIPS R10000, the Alpha/AXP,
the Intel Pentium series, and more, provide at least some hardware support to
assist a software-based profiler. At one extreme, bolt-on hardware has been
produced to assist in profiling, for example ProfileMe [profileme].
One of the simplest things a CPU can provide is a high-resolution timestamp
counter such as the Pentium's TSC. This allows interstitial timing harnesses
for measuring operation latency to a high degree of accuracy.
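Such an interstitial harness can be sketched as follows. The rdtsc instruction is x86-specific; the clock_gettime() fallback for other architectures, and the helper names, are our own additions for illustration:

```c
#include <stdint.h>
#include <time.h>

/* Read the timestamp counter on x86; fall back to a nanosecond
 * clock elsewhere so the harness still compiles and runs. */
static uint64_t timestamp(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
#endif
}

/* Time one operation by bracketing it with timestamp reads. */
uint64_t time_operation(void (*op)(void))
{
    uint64_t start = timestamp();
    op();
    return timestamp() - start;
}
```

On x86 the result is in CPU cycles; on the fallback path it is in nanoseconds. Note that out-of-order execution and frequency scaling on later CPUs complicate the interpretation of raw TSC deltas.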
At the next level of complexity there are performance counters. These are typically
registers that count events of interest such as cache line misses. The benefits of such
counters are well known [mipsr10000, monitor]:
actual data from the hardware that can be attributed to sections of source code
removes a lot of the black magic previously associated with performance analysis.
Typically, software using such counters either periodically checks the value of
the counter or, if possible, uses counter overflow events to generate an interrupt,
which then logs the overflow event against the currently executing code.
More recent architectures have gone even further in terms of support, providing
much of the data collection machinery in hardware [ia64, ia32].
Kernel support mechanisms
Many UNIX systems support the profil(2) system call. This is
an examining profiler that forms part of the kernel. The timer interrupt, or some
other periodic timer, collects the PC value at the time of interrupt, and stores
this in the relevant bin in a histogram buffer supplied by user space. This simple
technique is reasonably fast, but has resolution problems; it is also inflexible.
Linux does not implement profil(2), preferring a user space
solution (see the section Library support mechanisms). For reference, version 4 of the
GNU C library included a patch for a kernel implementation of
profil(2).
The kernel provides the necessary support for POSIX interval timers, via
setitimer(2). The timer type ITIMER_PROF
counts both user-space time and the time the target process spends in the
kernel, and delivers a SIGPROF signal on expiration.
A profiler may install a signal handler for SIGPROF
and use the si_addr field of the
siginfo_t structure to collect a PC value histogram.
Unfortunately this technique is low-resolution, and the use of signals
can cause problems with profiler overhead.
The IA-64 port provides an interface [ia64]
to the hardware performance mechanisms
with the perfmonctl(2) system call. The standard IA-32 kernel
features drivers for user-space access to the machine-specific registers,
which can be used to set up the hardware performance counting mechanisms
[ia32, athlon].
Linux kernels from 2.5.43 onwards provide the OProfile profiler interface, discussed later.
Text-format information is available for every process in the system via the
/proc file system, with a directory
for each process named by its process ID. You can collect page fault data,
memory usage data, and similar statistics from these files, which may be
useful for characterising performance.
The /proc file formats are mostly described
in the proc(5) manpage (make sure you have a recent
man-pages package installed [manpages]).
When in doubt, look in
the kernel source (fs/proc/), and
the source for top(1), ps(1), etc.
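For example, a test harness might read its own page fault counts from /proc/self/stat; the field positions used below are those documented in proc(5), and the helper name is our own:

```c
#include <stdio.h>
#include <string.h>

/* Read the minor and major page fault counts for the calling process
 * from /proc/self/stat.  The comm field can itself contain spaces and
 * parentheses, so parse from the last ')' onwards; minflt and majflt
 * are the 10th and 12th fields.  Returns 0 on success. */
int read_fault_counts(unsigned long *minflt, unsigned long *majflt)
{
    char buf[1024];
    char *p;
    FILE *f = fopen("/proc/self/stat", "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof buf, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    p = strrchr(buf, ')');
    if (!p)
        return -1;
    /* after ')': state ppid pgrp session tty_nr tpgid flags minflt cminflt majflt */
    if (sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %*u %lu %*u %lu",
               minflt, majflt) != 2)
        return -1;
    return 0;
}
```

Reading the counts before and after an operation gives the faults attributable to it, subject to interference from anything else the process does in between.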
Compiler support mechanisms
As mentioned previously, an instrumenting profiler needs to modify the
profiled code. Doing this at compile-time is one reasonable method: it is
simple to implement, and can provide exact data. Its main drawback is
the inconvenience of recompilation of the target code, and the risk of skew
as a result of the introduction of profiling code.
The GNU C compiler provides a small number of mechanisms which a profiler can
use to support itself. Using the -pg option to gcc,
the compiler will insert calls to _mcount() into each
function prologue (for details see final.c:profile_function()
in the gcc sources). This function is eventually supplied
by the C library, and collects the from and to PC values into a data structure,
which can then be used to construct call-graph information. The same mechanism
is used with the -a option, which is intended to allow
basic-block profiling, although it is reputed to work poorly
or not at all in a large number of cases.
The GNU C compiler provides another mechanism that can be used for profiling, with
the -finstrument-functions option [gcc].
This will generate references at the start and end of each function to the following
functions:
void __cyg_profile_func_enter(void (*fn)(), void (*parent)());
void __cyg_profile_func_exit(void (*fn)(), void (*parent)());
You can implement these functions, and use the function pointer values to
construct profiling data. Typically, a profiler would use the PC values
passed to look up the function names in the binary image, so a user-readable
call-graph report can be generated. Note that these are weak symbols so profiling
via this method can be done via LD_PRELOAD.
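A sketch of such hooks is below. Recent gcc documentation declares the parameters as plain void * pointers; the no_instrument_function attribute stops the hooks from instrumenting (and hence recursing into) themselves, and the call-count and depth bookkeeping is our own illustration. Compiling any file with -finstrument-functions routes every function entry and exit through these hooks:

```c
#include <stdio.h>

static unsigned long n_calls;
static int depth;

/* A real profiler would record fn and a timestamp to build a
 * call-graph; here we just log an indented trace to stderr. */
void __attribute__((no_instrument_function))
__cyg_profile_func_enter(void *fn, void *call_site)
{
    (void)call_site;
    n_calls++;
    fprintf(stderr, "%*senter %p\n", 2 * depth++, "", fn);
}

void __attribute__((no_instrument_function))
__cyg_profile_func_exit(void *fn, void *call_site)
{
    (void)call_site;
    fprintf(stderr, "%*sexit  %p\n", 2 * --depth, "", fn);
}
```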
GCC provides increasing support for profile-directed optimization. This technique
uses program profile data in order to guide compilation decisions, in the hope
that the compiled program will behave similarly, improving overall performance.
This feature is enabled by the -fprofile-arcs option,
which then produces a .da profile, containing arc traversal
data (in this context, an arc represents a program branch to a basic block,
a straight-line section of code). This can then be fed back in for a second
compile run, this time additionally using -fbranch-probabilities.
See the GCC manual and [gccprofiledriven] for more information.
Library support mechanisms
The GNU C library provides a user-space implementation of profil(3),
which internally uses setitimer(2) with ITIMER_PROF
to populate the PC value histogram. The times(2) and
getrusage(2) library calls allow collection of some data
that may prove relevant to a performance analysis.
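For instance, getrusage(2) can bracket an operation to attribute CPU time to it; the helper below is our own illustration:

```c
#include <sys/resource.h>
#include <sys/time.h>

/* Return the CPU time (user + system) consumed so far by this
 * process, in microseconds, as reported by getrusage(2).
 * Returns -1 on error. */
long cpu_time_usec(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) < 0)
        return -1;
    return (ru.ru_utime.tv_sec + ru.ru_stime.tv_sec) * 1000000L
         + ru.ru_utime.tv_usec + ru.ru_stime.tv_usec;
}
```

Calling this before and after an operation and taking the difference gives its CPU cost, subject to the timer granularity caveats discussed earlier.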
The GNU C library provides hooks into its memory allocation routines [glibc].
You can use these hooks to collect allocation lifetime data, size distributions,
etc. Particularly for object-oriented code, allocation can become a crucial part of
an application's performance, and sometimes it is necessary to fine-tune the application's
behaviour in this respect.
Dietlibc [dietlibc] supports ITIMER_PROF for
setitimer(2), but does not implement profil(3)
as of this writing.
Available profiling packages

Kernel profilers

readprofile
The standard kernel profiler (as used by readprofile(1)) is rather rudimentary
but commonly used as it comes packaged with the kernel for most architectures; at
the time of writing, only the cris, s390 and parisc ports don't implement it.
It is a simple statistical profiler that stores the kernel PC value into a scaled
histogram buffer on every timer tick. Note that only the kernel image is profiled;
no user-space, no kernel modules etc. It is also incapable of profiling code where
interrupts are disabled.
The profiler requires a reboot with the option "profile=2" to enable it. The value
passed is the multiplier, which determines the spatial resolution of the histogram,
as described in the man page. The histogram can be cleared by simply writing to
the profile buffer device /proc/profile. Some architectures
allow you to set the sampling frequency by writing a value to this file. For example
on x86, writing a value will set the APIC timer appropriately (lower values
mean more frequent interrupts).
When some profile data is collected, readprofile(1) can be used to print out
a simple function-based summary, as shown in this excerpt:
...
743 kmalloc 1.6294
491 handle_IRQ_event 5.3370
371 __rdtsc_delay 13.2500
348 kmem_cache_alloc 0.8878
...
Each line consists of the number of samples against
that function, the name of the function, and the normalised load. The normalised load
is calculated as raw_count(f)/size_in_bytes(f) - the idea is that you can expect
more samples against larger functions. However, due to a number of issues, this normalisation
isn't particularly useful.
Red Hat ship a simple patch in their kernels to allow readprofile(1)
to use NMI interrupts. NMI interrupts cannot be disabled by the IF
bit in eflags; this means that you can get profile data from interrupt handlers
and code that runs with interrupts disabled (such as code protected by
spin_lock_irq()). Note that it only works in this mode
when using the NMI watchdog facility, as it relies on the watchdog code to generate
the NMIs. The patch can be found
here.
The user-space readprofile(1) utility has some incompatibilities
with recent Linux kernels: if you are having problems, it is recommended that you
upgrade to a recent util-linux version. The problem is related to mis-parsing of
vmlinux's nm output.
A patch to enable readprofile(1) to profile kernel modules
can be found here. (Note that I have not
tested whether this patch works, and it has bit-rotted.)
kerneltop

Kerneltop [kerneltop] is a simple modification of readprofile that displays the counts in a
top style, clearing the counters at each iteration. It can be
very useful for observing the time spent in the kernel as a certain operation
is in progress.
minilop

Minilop [minilop] is another readprofile-derived
utility. It adds a feature to show disassembly for each histogram bin of code
that has a sample against it, and calculation of the relative load, as well
as remembering the peak count for each entry. This project seems to be abandoned.
Kernprof
SGI's kernprof [kernprof]
is a powerful kernel-only profiler for the ia32, ia64,
sparc64 and mips64 architectures. It cannot profile modules or interrupts-disabled code.
Profiling can be enabled and controlled at runtime.
It comes in the form of a large kernel patch and associated user-space tools. Some users
may find they need to apply a patch to gcc before the profiler can
be built.
Kernprof
supports a number of different profiling techniques. Its simplest mode creates a PC value
histogram for the kernel. Both standard timer-interrupt based sampling, and sampling based
on the hardware performance counters, are supported (use of the hardware counters is
not supported on all systems). Allowing the use of the performance counters gives a significant
power to kernprof, as relevant performance events such as cache misses can
be analysed.
Kernprof also supports a number of other profiling modes. The kernel can be built with
support for collecting (annotated) call graphs, although this has a significant overhead.
There is also the ability to collect exact function call counts via
mcount(). Some of these modes can be combined in order to improve
the information collected without impinging on performance too badly.
Most of these modes generate their data in gprof's
gmon.out format. Per-CPU profiles can be created, which
can prove useful for analysing SMP performance.
Note that because the sampling method cannot be triggered whilst interrupts are disabled,
results must be taken with a pinch of salt in some cases. In particular, if the hardware
counters are used, events will still be counted, but will have a tendency to appear
in the profiles at code points directly after interrupts become re-enabled.
However, a patch to enable NMI-based profiling for kernprof
can be found here.
OProfile
Linux Trace Toolkit
Dynamic probes
KIP
Timepegs
Interrupt latency measurement
Pre-emption latency measurement
SGI Lockmeter
I/O statistics
Cacheinfo
MCT
MCT [mct] is a very simple test harness useful
for comparing the low-level performance characteristics of
kernel mutual exclusion primitives.
StrongARM profiler
Binary profilers
VProf
Eazel prof
Valgrind
Valgrind [valgrind] is an excellent debugging
system that simulates an x86 in order to catch memory allocation
and access errors. The source code indicates some preliminary
support for PC value and memory access profiling, which must
be explicitly enabled at compile time.
OProfile
JiTI86
tsprof
Paderborn sprof
Source/compile-time profilers
gprof
Perfctr
Eazel profiler (cprof)
FunctionCheck
High-resolution Profiler
GNU sprof
This is unrelated to the other tool named sprof.
GNU sprof is packaged with the GNU C library
(it can often be found as part of the C library development package).

Performance Counter Library
Performance API
TAU
Low-fat Profiler
Erik Hendriks' performance counter package
This code provided virtualised access to the x86 performance counters in a
similar manner to Perfctr for 2.2 kernels. It is no longer
maintained, and is listed here only for historical interest. You can find
the manual and the source
here.
bprof

Bprof [bprof] is a very old tool that provided instruction-level
profiling data via setitimer(2). If you are lucky, you
will still be able to find a source RPM at rpmfind.net,
but the code is merely of historical interest now.
Analysis tools
Profileviewer
KProf
cgprof
Specialised profilers
Cacheprof
Fireprofile
Allocation profilers
A number of projects exist for debugging allocation problems such as leaks
and invalid accesses. Some of these can produce allocation statistics
that may be useful for later analysis
(dmalloc,
mpatrol,
mpr,
MemProf).
Language-specific profilers
Java performance analysis
PHP profilers
Python profilers
TCL/Tk profilers
TCL/Tk comes with a built-in profiler, as described
here.
Lisp profilers
http://www.cons.org/cmucl/doc/index.html, see biblio
Ruby profiler
The language Ruby comes with its own profiling system. Its use is briefly
covered here.
ProKylix
ProKylix [kylix] provides a profiler for Kylix code. It is not free software.
Devel::DProf (Perl)
Perl comes with a profiling package called Devel::DProf,
described here.
Summary

Articles and research papers

[shende] Profiling and Tracing in Linux, Sameer Shende.
Appears in USENIX '99 Extreme Linux Workshop; no longer available
on the web. A short and outdated introduction to Linux profiling.
[linuxjournal] Take control: gprof, bprof and Time Profilers, Andy Vaught, 1998-05-01.
An old and brief article on profiling in Linux.
[profilingphp] Improving Performance by Profiling PHP Applications.

[hprofarticle] Diagnose common runtime problems with hprof.
A short article on profiling Java with hprof.
[mipsr10000] Performance Analysis Using the MIPS R10000 Performance Counters, Marco Zagha et al.
An interesting paper on using hardware performance counters on MIPS.

[profileme] FIXME.

[monitor] Using Hardware Performance Counters to Isolate Memory Bottlenecks, Bryan Buck and Jeffrey Hollingsworth.
A paper on using performance counters for finding performance problems.

[gccprofiledriven] Infrastructure for Profile-Driven Optimizations, the GCC team.
A short news item on continuing efforts to provide GCC compiler optimizations
based on program profiles.
Manuals and documentation

[gprof] gprof(1). The manual for GNU gprof.

[manpages] The Linux manpages collection. The distributed package often has
updates that are not in your distribution's package, so make sure to
check the latest version of this package if you need to refer to a
man page.

[gcc] gcc(1). The user manual for gcc.

[gcc-internal] GCC Internals. The manual describing GCC internals.

[glibc] GNU C Library Manual. The extensive manual for glibc.

[ia32] The IA-32 Architecture Developer's Manual, Intel Corporation.
Volume 3 of this manual describes the performance counter mechanisms
for the Pentium Classic, the P6 family, and the Pentium 4 CPU families.

[ia64] IA-64 Linux Kernel: Design and Implementation, David Mosberger,
Stephane Eranian, Bruce Perens. Prentice Hall, 2002. ISBN 0-13-061014-3.

[athlon] AMD Athlon Processor x86 Code Optimization Guide, AMD Corporation.
A brief description of the performance counters for AMD Athlon and Duron processors.

[jvmpi] Java Virtual Machine Profiler Interface, Sun Corporation.
The Java API for collecting performance data from a virtual machine.

Linux software

[kernprof] Kernprof. Kernel profiler patch that runs on a number
of architectures.
[oprofile] OProfile. Performance counter based system-wide statistical
profiling for x86 Linux systems.

[kip] KIP. Detailed tracing/logging of the kernel via source instrumentation.

[timepegs] Timepegs. Interstitial time measurement of the kernel via source
instrumentation.

[ltt] Linux Trace Toolkit. Kernel and part-userspace event logging and
tracing system.

[dprobes] Dynamic probes. A powerful tracing and event notification system.

[schedlat] Schedlat. Measures kernel scheduling latency.

[preemptstats] Pre-empt statistics. Another scheduling latency measurement
tool for the kernel.

[intlat] Intlat. Measures the time the kernel has interrupts disabled.

[lockmeter] SGI Lockmeter. Detailed statistics on mutual exclusion primitive
usage in the kernel.

[sar] SAR patches. Old patches implementing more detailed I/O accounting.
Also see http://perso.wanadoo.fr/sebastien.godard/ for user-space tools.

[reqlog] Reqlog. Logging for I/O requests in the kernel.

[cacheinfo] Cacheinfo. Module to provide statistics on the various kernel caches.

[mct] MCT. Test harness for comparing different kernel mutual exclusion
primitives.

[strongarm] StrongARM profiler. System-wide statistical kernel profiler for
the StrongARM CPU series.

[kerneltop] Kerneltop. Periodic profile statistics for the kernel based on
/proc/profile.

[minilop] Mini Linux Optimizing Project. Another /proc/profile based utility,
with disassembly support.

[profileviewer] ProfileViewer. Java-based viewer for gprof output.

[kprof] KProf. A KDE-based profile viewer for gprof and FunctionCheck.

[cgprof] cgprof. A utility to display call graphs from gprof data.

[perfctr] Perfctr. A library providing virtualised access to the x86 hardware
performance counters.

[cacheprof] Cacheprof. A simulation-based cache impact profiler.

[valgrind] Valgrind. An excellent allocation debugger for x86 binaries with
some profiling support.

[fireprofiler] FireProfiler. Produces data on the MySQL queries an application
makes.

[vprof] VProf. Binary profiler that can use x86 performance counters.

[eazel] Eazel profilers. Two simple profilers, one instrumenting (based on
Corel's defunct cprof), and one not.

[functioncheck] FunctionCheck. Instrumenting accounting function-based profiler.

[hrprof] HRProf (http://hrprof.sourceforge.net/). Realtime instrumenting
profiler that uses the x86 TSC register.

[jiti86] JitI86. Offline binary instrumentation system.

[pct] Performance Counter Library. Userspace library API for accessing
hardware counters over a wide range of platforms.

[papi] Performance API. Another attempt at a platform-independent hardware
counter API.

[tau] TAU. C++-based source instrumentation package on top of PCL or PAPI.

[lfp] Low-fat Profiler. A simple API to provide interstitial TSC-based timing
analysis.

[apictimers] APIC timer module for Linux. Kernel module for access to
high-resolution APIC timers.

[tsprof] tsprof. Binary profiler with access to the x86 performance counters.
Not free software.

[sproftool] Paderborn sprof. Portable profiler utilising hardware performance
counters. Not free software.

[kylix] ProKylix. Kylix profiler. Not free software.

[jprobe] JProbe Profiler. Profiler and analysis tools for Java. Not free software.

[optimizeit] OptimizeIt suite. Profiler and analysis tools for Java. Not free
software.

[javaprofilingtool] Java Profiling Tool. Supposed profiling tool via JVMPI.
No code available.

[xdprof] xdProf. A stack trace collection facility for Java via JVMPI.

[sourcetracer] SourceTracer. Another JVMPI-based Java profiler.

[jperfanal] JPerfAnal. Post-profile viewer for Java.

[jmp] JMP. Java memory allocation profiler via JVMPI.

[jmocha] jMocha micro-benchmark suite. Measurement harness for detailed
micro-analysis.

[hpjmeter] HPjmeter Performance Analysis Tool. Profiler viewer for Java.
Not free software.

[jlouiss] jLouiss. Tracing tool for Java via JVMPI.

[phpapd] Advanced PHP Debugger. PHP debugger that can generate profile data.

[califa] Califa. Simple profiler for PHP 3.

[lispdebug] Lisp Debug. Lisp debugger and profiler that can run on several
LISP implementations.

[pyprof] PyProf. Convenient wrapper for the Python profiler.
Related links

[linuxperf] Linux Performance Tuning. A portal for various performance
analysis and tuning tools.

[javaperf] Java Performance Tuning. A portal for Java performance tuning
and analysis.

[devtools] Linux Development Tools. A portal for various debugging and
development tools under Linux.
[tpop] The Practice of Programming, Brian W. Kernighan and Rob Pike.
Addison-Wesley, 1999. ISBN 0-201-61586-X.
An excellent practical guide to program development.