Run your program with one of the profiling options, e.g.
+RTS -p -RTS. This generates a file of
profiling information. Note that multi-processor execution
(e.g. +RTS -N2) is not supported while
profiling.

Examine the generated profiling information, using one of
GHC's profiling tools. The tool to use will depend on the kind
of profiling information generated.

5.1. Cost centres and cost-centre stacks

GHC's profiling system assigns costs
to cost centres. A cost is simply the time
or space required to evaluate an expression. Cost centres are
program annotations around expressions; all costs incurred by the
annotated expression are assigned to the enclosing cost centre.
Furthermore, GHC will remember the stack of enclosing cost centres
for any given expression at run-time and generate a call-graph of
cost attributions.

The first part of the file gives the program name and
options, and the total time and total memory allocation measured
during the run of the program (note that the total memory
allocation figure isn't the same as the amount of
live memory needed by the program at any one
time; the latter can be determined using heap profiling, which we
will describe shortly).

The second part of the file is a break-down by cost centre
of the most costly functions in the program. In this case, there
was only one significant function in the program, namely
nfib, and it was responsible for 100%
of both the time and allocation costs of the program.

The third and final section of the file gives a profile
break-down by cost-centre stack. This is roughly a call-graph
profile of the program. In the example above, it is clear that
the costly call to nfib came from
main.

The time and allocation incurred by a given part of the
program are displayed in two ways: “individual”, which
covers the costs incurred by the code under this cost-centre
stack alone, and “inherited”, which also includes the costs
incurred by all the children of this node.

The usefulness of cost-centre stacks is better demonstrated
by modifying the example slightly:

Now although we had two calls to nfib
in the program, it is immediately clear that it was the call from
f which took all the time.
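
A program matching this description might look like the following sketch. The names nfib, f and main come from the text; the second caller g, the bodies of all four functions and the argument 25 are assumptions chosen for illustration, not necessarily the program that was profiled.

```haskell
module Main (main) where

-- Deliberately naive doubly recursive function, so its cost is easy to spot.
nfib :: Integer -> Integer
nfib n = if n < 2 then 1 else nfib (n - 1) + nfib (n - 2)

-- f passes its argument straight through: the expensive call to nfib.
f :: Integer -> Integer
f n = nfib n

-- g (a hypothetical second caller) halves its argument first: the cheap call.
g :: Integer -> Integer
g n = nfib (n `div` 2)

main :: IO ()
main = print (f 25 + g 25)
```

Because f and g each call nfib, nfib should appear under two different cost-centre stacks, which is what lets the profile separate the expensive call from the cheap one.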

The actual meaning of the various columns in the output is:

entries

The number of times this particular point in the call
graph was entered.

individual %time

The percentage of the total run time of the program
spent at this point in the call graph.

individual %alloc

The percentage of the total memory allocations
(excluding profiling overheads) of the program made by this
call.

inherited %time

The percentage of the total run time of the program
spent below this point in the call graph.

inherited %alloc

The percentage of the total memory allocations
(excluding profiling overheads) of the program made by this
call and all of its sub-calls.

In addition you can use the -P RTS option to get the following
additional information:

ticks

The raw number of time “ticks” which were
attributed to this cost-centre; from this, we get the
%time figure mentioned
above.

bytes

Number of bytes allocated in the heap while in this
cost-centre; again, this is the raw number from which we get
the %alloc figure mentioned
above.

What about recursive functions, and mutually recursive
groups of functions? Where are the costs attributed? Well,
although GHC does keep information about which groups of functions
called each other recursively, this information isn't displayed in
the basic time and allocation profile. Instead, the call-graph is
flattened into a tree.

5.1.1. Inserting cost centres by hand

Cost centres are just program annotations. When you say
-auto-all to the compiler, it automatically
inserts a cost centre annotation around every top-level function
not marked INLINE in your program, but you are entirely free to
add the cost centre annotations yourself.

The syntax of a cost centre annotation is

{-# SCC "name" #-} <expression>

where "name" is an arbitrary string
that will become the name of your cost centre as it appears
in the profiling output, and
<expression> is any Haskell
expression. An SCC annotation extends as
far to the right as possible when parsing. (SCC stands for "Set
Cost Centre".)
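
As a minimal sketch of hand-placed annotations, the program below labels two sub-expressions separately. The function total and the cost-centre names "big_nfib" and "small_nfib" are hypothetical, chosen only for illustration. Each annotation sits on its own binding, so its extent does not depend on how far to the right the annotation would otherwise reach.

```haskell
module Main (main) where

nfib :: Integer -> Integer
nfib n = if n < 2 then 1 else nfib (n - 1) + nfib (n - 2)

-- Each where-binding carries its own annotation, so the cost of each
-- nfib call is reported under a separate, hand-chosen cost-centre name.
total :: Integer -> Integer -> Integer
total m n = big + small
  where
    big   = {-# SCC "big_nfib" #-} nfib m
    small = {-# SCC "small_nfib" #-} nfib n

main :: IO ()
main = print (total 25 10)
```

When the program is compiled without profiling the annotations have no effect, so they can be left in the source.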

5.1.2. Rules for attributing costs

The cost of evaluating any expression in your program is
attributed to a cost-centre stack using the following rules:

If the expression is part of the
one-off costs of evaluating the
enclosing top-level definition, then costs are attributed to
the stack of lexically enclosing SCC
annotations on top of the special CAF
cost-centre.

Otherwise, costs are attributed to the stack of
lexically-enclosing SCC annotations,
appended to the cost-centre stack in effect at the
call site of the current top-level
definition[10]. Notice that this is a recursive
definition.

What do we mean by one-off costs? Well, Haskell is a lazy
language, and certain expressions are only ever evaluated once.
For example, if we write:

x = nfib 25

then x will only be evaluated once (if
at all), and subsequent demands for x will
immediately get to see the cached result. The definition
x is called a CAF (Constant Applicative
Form), because it has no arguments.

For the purposes of profiling, we say that the expression
nfib 25 belongs to the one-off costs of
evaluating x.

Since one-off costs aren't strictly speaking part of the
call-graph of the program, they are attributed to a special
top-level cost centre, CAF. There may be one
CAF cost centre for each module (the
default), or one for each top-level definition with any one-off
costs (this behaviour can be selected by giving GHC the
-caf-all flag).
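
The distinction between one-off costs and ordinary call costs can be sketched as follows. The definition x is the one from the text; the function y and the surrounding program are assumptions added for contrast.

```haskell
module Main (main) where

nfib :: Integer -> Integer
nfib n = if n < 2 then 1 else nfib (n - 1) + nfib (n - 2)

-- x takes no arguments, so it is a CAF: it is evaluated at most once and
-- its cost is a one-off cost, attributed to the special CAF cost centre.
x :: Integer
x = nfib 25

-- y (hypothetical) is a function, so nfib runs on every call and its cost
-- is attributed to the cost-centre stack in effect at the call site.
y :: Integer -> Integer
y n = nfib n

main :: IO ()
main = do
  print x       -- the first demand evaluates nfib 25
  print x       -- the second demand sees the cached result
  print (y 25)  -- evaluated as part of this call
```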

If you think you have a weird profile, or the call-graph
doesn't look like you expect it to, feel free to send it (and
your program) to us at
<glasgow-haskell-bugs@haskell.org>.

[10] The call-site is just the place
in the source code which mentions the particular function or
variable.