The -O* options specify convenient “packages” of optimisation
flags; the -f* options described later on specify
individual optimisations to be turned on/off; the -m*
options specify machine-specific optimisations to be turned
on/off.

There are many options that affect the quality of code
produced by GHC. Most people only have a general goal, something like
“Compile quickly” or “Make my program run like greased lightning.”
The following “packages” of optimisations (or lack thereof) should
suffice.

Once you choose a -O* “package,” stick with it—don't chop and
change. Modules' interfaces will change with a shift to a new
-O* option, and you may have to recompile a large chunk of all
importing modules before your program can again be run
safely (see Section 3.7.4).

No -O*-type option specified:

This is taken to mean: “Please compile quickly; I'm not over-bothered
about compiled-code quality.” So, for example: ghc -c Foo.hs

-O2:

Means: “Apply every non-dangerous optimisation, even if it means
significantly longer compile times.”

The avoided “dangerous” optimisations are those that can make
runtime or space worse if you're unlucky. They are
normally turned on or off individually.

At the moment, -O2 is unlikely to produce
better code than -O.

-O2-for-C:

Says to run GCC with -O2, which may be worth a few percent in
execution speed. Don't forget -fvia-C, lest you use the native-code
generator and bypass GCC altogether!

-Onot:

This option will make GHC “forget” any -Oish options it has seen so
far. Sometimes useful; for example: make all EXTRA_HC_OPTS=-Onot.

-Ofile <file>:

For those who need absolute control over exactly
what options are used (e.g., compiler writers, sometimes :-), a list
of options can be put in a file and then slurped in with -Ofile.

In that file, comments are of the #-to-end-of-line variety; blank
lines and most whitespace are ignored.
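
To make the file format concrete, here is a minimal sketch in Haskell of how such a file could be read. It is only a model of the rules just stated, not the driver's actual code: optionsFromFile is a hypothetical helper, the sample contents are made up, and the option names in the sample are borrowed from this section purely as placeholders. As a simplification, the sketch ignores all whitespace between options.

```haskell
module Main where

-- Hypothetical helper, not part of GHC: models the file format described
-- above. Everything from '#' to the end of a line is dropped, and what
-- remains is split on whitespace, so blank lines contribute nothing.
optionsFromFile :: String -> [String]
optionsFromFile = concatMap (words . takeWhile (/= '#')) . lines

-- Made-up sample contents; the option names are placeholders taken from
-- this section, not a recommended set of options.
sample :: String
sample = unlines
  [ "# options for a tuning experiment"
  , ""
  , "-fno-strictness    # the strictness analyser is slow here"
  , "  -funfolding-use-threshold8"
  ]

main :: IO ()
main = mapM_ putStrLn (optionsFromFile sample)
```

Running this prints the two option strings, one per line, with the comments, the blank line and the indentation gone.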

Please ask if you are baffled and would like an example of -Ofile!

At Glasgow, we don't use a -O* flag for day-to-day work. We use
-O to get respectable speed; e.g., when we want to measure
something. When we want to go for broke, we tend to use -O -fvia-C
-O2-for-C (and we go for lots of coffee breaks).

The easiest way to see what -O (etc.) “really mean” is to run with
-v, then stand back in amazement. Alternatively, just look at the
HsC_minus<blah> lists in the GHC driver script.

Flags can be turned off individually. (NB: I hope you have a
good reason for doing this…) To turn off the -ffoo flag, just use
the -fno-foo flag. So, for
example, you can say -O2 -fno-strictness, which will then drop out
any running of the strictness analyser.

The options you are most likely to want to turn off are:

-fno-strictness (strictness
analyser, because it is sometimes slow),

-fno-specialise (automatic
specialisation of overloaded functions, because it can make your code
bigger) (US spelling also accepted), and

Should you wish to turn individual flags on, you are advised
to use the -Ofile option, described above. Because the order in
which optimisation passes are run is sometimes crucial, it's quite
hard to do with command-line options.

Here are some “dangerous” optimisations you might want to try:

-fvia-C:

Compile via C, and don't use the native-code generator. (There are
many cases when GHC does this on its own.) You might pick up a little
bit of speed by compiling via C. If you use _ccall_gc_s or
_casm_s, you probably have to use -fvia-C.

-funfolding-interface-threshold<n>:

(Default: 30) By raising or lowering this number, you can raise or
lower the amount of pragmatic junk that gets spewed into interface
files. (An unfolding has a “size” that reflects the cost in terms
of “code bloat” of expanding that unfolding in another module. A
bigger function would be assigned a bigger cost.)

-funfolding-creation-threshold<n>:

(Default: 30) This option is similar to
-funfolding-interface-threshold, except that it governs unfoldings
within a single module. Increasing this figure is more likely to
result in longer compile times than faster code. The next option is
more useful:

-funfolding-use-threshold<n>:

(Default: 8) This is the magic cut-off figure for unfolding: below
this size, a function definition will be unfolded at the call-site,
any bigger and it won't. The size computed for a function depends on
two things: the actual size of the expression, and any discounts that
apply, which are subtracted from it (see -funfolding-con-discount).

-funfolding-con-discount<n>:

(Default: 2) If the compiler decides that it can eliminate some
computation by performing an unfolding, then this is a discount factor
that it applies to the function size before deciding whether to unfold
it or not.

OK, folks, these magic numbers `30', `8', and `2' are mildly
arbitrary; they are of the “seem to be OK” variety. The `8' is the
more critical one; it's what determines how eager GHC is about
expanding unfoldings.
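
The way the use threshold and the discount interact can be sketched as a toy model in Haskell. This is an illustration of the description above, not GHC's actual calculation: willUnfold is a hypothetical name, the sizes are invented, the caller is assumed to supply the total discount already worked out, and a size exactly equal to the threshold is assumed not to be unfolded.

```haskell
module Main where

-- Toy model only; GHC's real size and discount computations are more
-- involved. The first argument stands for the <n> given to
-- -funfolding-use-threshold. The discount is subtracted from the raw
-- size, and the function is unfolded only if the result is strictly
-- below the threshold (the strictness is an assumption of this sketch).
willUnfold :: Int -> Int -> Int -> Bool
willUnfold useThreshold rawSize discount =
  rawSize - discount < useThreshold

main :: IO ()
main = do
  -- Size 9, no discount, threshold 8: too big to unfold.
  print (willUnfold 8 9 0)
  -- Same function with a discount of 2: effective size 7, so it unfolds.
  print (willUnfold 8 9 2)
```

The second call shows why the discount matters: a function slightly over the cut-off can still be unfolded at a call-site where the unfolding is expected to eliminate some computation.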

-funbox-strict-fields:

This option causes all constructor fields which are marked strict
(i.e. “!”) to be unboxed or unpacked if possible. For example:

data T = T !Float !Float

will create a constructor T containing two unboxed floats if the
-funbox-strict-fields flag is given. This may not always be an
optimisation: if the T constructor is scrutinised and the floats
passed to a non-strict function for example, they will have to be
reboxed (this is done automatically by the compiler).

This option should only be used in conjunction with -O, in order to
expose unfoldings to the compiler so the reboxing can be removed as
often as possible. For example:

f :: T -> Float
f (T f1 f2) = f1 + f2

The compiler will avoid reboxing f1 and f2 by inlining + on
floats, but only when -O is on.

Any single-constructor data type is eligible for unpacking; for example:

data T = T !(Int,Int)

will store the two Ints directly in the T constructor, by flattening
the pair. Multi-level unpacking is also supported:

data T = T !S
data S = S !Int !Int

will store two unboxed Int#s directly in the T constructor.
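
The fragments above can be put together into one small module for experimenting with the flag. This is a sketch only: the names Inner, Outer and sumOuter are hypothetical (chosen so the multi-level example does not clash with T), and the file name Main.hs in the command below is an assumption.

```haskell
module Main where

-- Two strict Float fields: candidates for unboxing when the module is
-- compiled with -funbox-strict-fields.
data T = T !Float !Float

-- Scrutinises T and adds the fields straight away.
f :: T -> Float
f (T f1 f2) = f1 + f2

-- Multi-level case: a strict single-constructor field inside another
-- constructor. Inner and Outer are hypothetical names.
data Inner = Inner !Int !Int
data Outer = Outer !Inner

sumOuter :: Outer -> Int
sumOuter (Outer (Inner a b)) = a + b

main :: IO ()
main = do
  print (f (T 1.5 2.25))
  print (sumOuter (Outer (Inner 3 4)))
```

Compiling with something like ghc -O -funbox-strict-fields Main.hs gives the flag a chance to apply. The program's results are the same with or without the flag; only the representation of the constructors changes.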

-fsemi-tagging:

This option (which does not work with the native-code generator)
tells the compiler to add extra code to test for already-evaluated
values. You win if you have lots of such values during a run of your
program, you lose otherwise. (And you pay in extra code space.)

We have not played with -fsemi-tagging enough to recommend it.
(For all we know, it doesn't even work anymore… Sigh.)

-mv8: (SPARC machines)
Means to pass the like-named option to GCC; it says to use the
Version 8 SPARC instructions, notably integer multiply and divide.
The similar -m* GCC options for SPARC also work, actually.

-monly-N-regs: (iX86 machines)
GHC tries to “steal” four registers from GCC, for performance
reasons; it almost always works. However, when GCC is compiling some
modules with four stolen registers, it will crash, probably saying:

Foo.hc:533: fixed or forbidden register was spilled.
This may be due to a compiler bug or to impossible asm
statements or clauses.

Just give some registers back with -monly-N-regs. Try `3' first,
then `2'. If `2' doesn't work, please report the bug to us.