GHC comes with a time and space profiling system, so that you
can answer questions like "why is my program so slow?", or "why is
my program using so much memory?".

Profiling a program is a three-step process:

1. Re-compile your program for profiling with the -prof option, and
   probably one of the options for adding automatic annotations:
   -fprof-auto is the most common.

   If you are using external packages with cabal, you may need to
   reinstall these packages with profiling support; typically this is
   done with cabal install -p package --reinstall.

2. Having compiled the program for profiling, you now need to run it
   to generate the profile. For example, a simple time profile can be
   generated by running the program with +RTS -p, which generates a
   file named prog.prof, where prog is the name of your program
   (without the .exe extension, if you are on Windows).

   There are many different kinds of profile that can be generated,
   selected by different RTS options. We will be describing the
   various kinds of profile throughout the rest of this chapter. Some
   profiles require further processing using additional tools after
   running the program.

3. Examine the generated profiling information, use the information
   to optimise your program, and repeat as necessary.
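A typical session covering the three steps might look like the following (a sketch only; the file name Main.hs is an assumption, and -rtsopts is included here to ensure the compiled program accepts the +RTS -p options):

```shell
# Step 1: compile with profiling and automatic cost-centre annotations
ghc -prof -fprof-auto -rtsopts Main.hs

# Step 2: run the program, asking the runtime system for a time profile
./Main +RTS -p

# Step 3: examine the generated profile
cat Main.prof
```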

5.1. Cost centres and cost-centre stacks

GHC's profiling system assigns costs
to cost centres. A cost is simply the time
or space (memory) required to evaluate an expression. Cost centres are
program annotations around expressions; all costs incurred by the
annotated expression are assigned to the enclosing cost centre.
Furthermore, GHC will remember the stack of enclosing cost centres
for any given expression at run-time and generate a call-tree of
cost attributions.
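For example, the kind of program discussed in the rest of this section looks roughly like this (a sketch reconstructing the fib example referred to below; the particular argument 30 is an assumption):

```haskell
-- Main.hs: a deliberately expensive program, used to illustrate
-- time and allocation profiles.
main :: IO ()
main = print (fib 30)

-- Naive doubly-recursive Fibonacci: almost all of the program's
-- time and allocation is incurred here.
fib :: Int -> Int
fib n = if n < 2 then 1 else fib (n - 1) + fib (n - 2)
```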

The first part of the file gives the program name and
options, and the total time and total memory allocation measured
during the run of the program (note that the total memory
allocation figure isn't the same as the amount of
live memory needed by the program at any one
time; the latter can be determined using heap profiling, which we
will describe later in Section 5.4, “Profiling memory usage”).

The second part of the file is a break-down by cost centre
of the most costly functions in the program. In this case, there
was only one significant function in the program, namely
fib, and it was responsible for 100%
of both the time and allocation costs of the program.

The third and final section of the file gives a profile
break-down by cost-centre stack. This is roughly a call-tree
profile of the program. In the example above, it is clear that
the costly call to fib came from
main.

The time and allocation incurred by a given part of the
program are displayed in two ways: “individual”, which is
the cost incurred by the code covered by this cost-centre
stack alone, and “inherited”, which includes the costs
incurred by all the children of this node.

The usefulness of cost-centre stacks is better demonstrated
by modifying the example slightly:
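The modified program might look like this (a sketch; the bodies of f and g are assumptions, chosen so that f does far more work than g, matching the discussion below):

```haskell
main :: IO ()
main = print (f 30 + g 30)
  where
    f n = fib n              -- gets its own cost centre, main.f
    g n = fib (n `div` 2)    -- gets its own cost centre, main.g

fib :: Int -> Int
fib n = if n < 2 then 1 else fib (n - 1) + fib (n - 2)
```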

Now although we had two calls to fib in
the program, it is immediately clear that it was the call from
f which took all the time. The functions
f and g which are defined in
the where clause in main are
given their own cost centres, main.f and
main.g respectively.

The actual meaning of the various columns in the output is:

entries

The number of times this particular point in the call
tree was entered.

individual %time

The percentage of the total run time of the program
spent at this point in the call tree.

individual %alloc

The percentage of the total memory allocations
(excluding profiling overheads) of the program made by this
call.

inherited %time

The percentage of the total run time of the program
spent below this point in the call tree.

inherited %alloc

The percentage of the total memory allocations
(excluding profiling overheads) of the program made by this
call and all of its sub-calls.

In addition, you can use the -P RTS option to get the
following additional information:

ticks

The raw number of time “ticks” which were
attributed to this cost-centre; from this, we get the
%time figure mentioned
above.

bytes

Number of bytes allocated in the heap while in this
cost-centre; again, this is the raw number from which we get
the %alloc figure mentioned
above.

What about recursive functions, and mutually recursive
groups of functions? Where are the costs attributed? Well,
although GHC does keep information about which groups of functions
called each other recursively, this information isn't displayed in
the basic time and allocation profile; instead, the call-graph is
flattened into a tree as follows: a call to a function that occurs
elsewhere on the current stack does not push another entry on the
stack; instead, the costs for this call are aggregated into the
caller.

5.1.1. Inserting cost centres by hand

Cost centres are just program annotations. When you say
-fprof-auto to the compiler, it automatically
inserts a cost centre annotation around every binding not marked
INLINE in your program, but you are entirely free to add cost
centre annotations yourself.

The syntax of a cost centre annotation is

{-# SCC "name" #-} <expression>

where "name" is an arbitrary string
that will become the name of your cost centre as it appears
in the profiling output, and
<expression> is any Haskell
expression. An SCC annotation extends as
far to the right as possible when parsed. (SCC stands for "Set
Cost Centre".) The double quotes can be omitted
if name is a Haskell identifier, for example:
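For instance (a small sketch; the cost-centre names sum_xs and "length of xs" are invented for this example):

```haskell
main :: IO ()
main = do
  let xs = [1 .. 1000000] :: [Integer]
  -- Unquoted name: sum_xs is a Haskell identifier.
  print ({-# SCC sum_xs #-} sum xs)
  -- Quoted name: an arbitrary string is allowed.
  print ({-# SCC "length of xs" #-} length xs)
```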

5.1.2. Rules for attributing costs

While running a program with profiling turned on, GHC
maintains a cost-centre stack behind the scenes, and attributes
any costs (memory allocation and time) to whatever the current
cost-centre stack is at the time the cost is incurred.

The mechanism is simple: whenever the program evaluates an
expression with an SCC annotation, {-# SCC c #-}
E, the cost centre c is pushed on
the current stack, and the entry count for this stack is
incremented by one. The stack also sometimes has to be saved
and restored; in particular when the program creates a
thunk (a lazy suspension), the current
cost-centre stack is stored in the thunk, and restored when the
thunk is evaluated. In this way, the cost-centre stack is
independent of the actual evaluation order used by GHC at
runtime.

At a function call, GHC takes the stack stored in the
function being called (which for a top-level function will be
empty), and appends it to the current
stack, ignoring any prefix that is identical to a prefix of the
current stack.
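These rules can be seen in a small example (a sketch; the stacks shown in the comments assume the program was compiled with -fprof-auto, so that every top-level binding has its own cost centre):

```haskell
main :: IO ()
main = print (f 10)   -- stack while in main: MAIN.main

f :: Int -> Int
f x = g (x + 1)       -- stack while in f:    MAIN.main.f

g :: Int -> Int
g x = x * 2           -- g's stored stack is appended to the current
                      -- one, giving:         MAIN.main.f.g
```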

We mentioned earlier that lazy computations, i.e. thunks,
capture the current stack when they are created, and restore
this stack when they are evaluated. What about top-level
thunks? They are "created" when the program is compiled, so
what stack should we give them? The technical name for a
top-level thunk is a CAF ("Constant Applicative Form"). GHC
assigns every CAF in a module a stack consisting of the single
cost centre M.CAF, where M
is the name of the module. It is also possible to give each CAF
a different stack, using the option
-fprof-cafs.
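For example (a sketch; by default the cost of forcing big below is attributed to the stack M.CAF, while under -fprof-cafs it gets a CAF cost centre of its own):

```haskell
module M where

-- A top-level thunk, i.e. a CAF: it is evaluated at most once,
-- the first time any part of the program demands its value.
big :: Integer
big = sum [1 .. 1000000]
```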