9. Porting GHC

This section describes how to port GHC to a currently
unsupported platform. To avoid confusion, when we say
“architecture” we are referring to the processor, and
we use the term “platform” to refer to the combination
of architecture and operating system.

9.1. Booting/porting from C (.hc) files

Bootstrapping GHC on a system without GHC already
installed is achieved by taking the intermediate C files (known
as HC files) from another GHC compilation and compiling them with gcc to
get a working GHC.

NOTE: GHC versions 5.xx were hard to bootstrap
from C. We recommend using GHC 6.0.1 or
later.

HC files are platform-dependent, so you have to get a set
that was generated on the same platform.
There may be some supplied on the GHC download page; otherwise
you'll have to compile some up yourself.

The following steps should result in a working GHC build
with full libraries:

Make a set of HC files. On an identical system with
GHC already installed, get a GHC source tree and put the
following in mk/build.mk:

Build GHC as normal, and then make
hc-file-bundle Project=ghc to create the tar file
containing the hc files.

On the target system, unpack the HC files on top of a
fresh source tree (make sure the source tree version matches
the version of the HC files exactly!).
This will place matching .hc files next
to the corresponding Haskell source
(.hs or .lhs) in
the compiler subdirectory ghc/compiler
and in the libraries (subdirectories of
libraries).

The actual build process is fully automated by the
hc-build script located in the
distrib directory. If you eventually
want to install GHC into the directory
dir, the following
command will execute the whole build process (it won't
install yet):

$ distrib/hc-build --prefix=dir

By default, the installation directory is
/usr/local. If that is what you want,
you may omit the argument to hc-build.
Generally, any option given to hc-build
is passed through to the configuration script
configure. If
hc-build successfully completes the
build process, you can install the resulting system, as
normal, with

$ make install

9.2. Porting GHC to a new platform

The first step in porting to a new platform is to get an
unregisterised build working. An
unregisterised build is one that compiles via vanilla C only.
By contrast, a registerised build uses the following
architecture-specific hacks for speed:

Global register variables: certain abstract machine
“registers” are mapped to real machine
registers, depending on how many machine registers are
available (see
ghc/includes/MachRegs.h).

Assembly-mangling: when compiling via C, we feed the
assembly generated by gcc through a Perl script known as the
mangler (see
ghc/driver/mangler/ghc-asm.lprl). The
mangler rearranges the assembly to support tail-calls and
various other optimisations.

In an unregisterised build, neither of these hacks is
used: the idea is that the C code generated by the
compiler should compile using gcc only. The lack of these
optimisations costs about a factor of two in performance, but
since unregisterised compilation is usually just a step on the
way to a full registerised port, we don't mind too much.

You should go through this process even if your
architecture already has registerised support in GHC, but
your OS currently isn't supported. In this case you probably
won't need to port any of the architecture-specific parts of the
code, and you can proceed straight from the unregisterised build
to build a registerised compiler.

Notes on GHC portability in general: we've tried to stick
to writing portable code in most parts of the system, so it
should compile on any POSIXish system with gcc, but in our
experience most systems differ from the standards in one way or
another. Deal with any problems as they arise - if you get
stuck, ask the experts on
<glasgow-haskell-users@haskell.org>.

Lots of useful information about the innards of GHC is
available in the GHC
Commentary, which might be helpful if you run into some
code which needs tweaking for your system.

9.2.1. Cross-compiling to produce an unregisterised GHC

NOTE! These instructions apply to GHC 6.4 and (hopefully)
later. If you need instructions for an earlier version of GHC, try
to get hold of the version of this document that was current at the
time. It should be available from the appropriate download page on
the GHC homepage.

In this section, we explain how to bootstrap GHC on a
new platform, using unregisterised intermediate C files. We
haven't put a great deal of effort into automating this
process, for two reasons: it is done very rarely, and the
process usually requires human intervention to cope with minor
porting issues anyway.

The following step-by-step instructions should result in
a fully working, albeit unregisterised, GHC. Firstly, you
need a machine that already has a working GHC (we'll call this
the host machine), in order to
cross-compile the intermediate C files that we will use to
bootstrap the compiler on the target
machine.

On the target machine:

Unpack a source tree (preferably a released
version). We will call the path to the root of this
tree T.

$ cd T
$ ./configure --enable-hc-boot --enable-hc-boot-unregisterised

You might need to update
configure.in to recognise the new
platform, and re-generate
configure with
autoreconf.

Change TARGETPLATFORM
appropriately, and set the variables involving
TARGET or
Target to the correct values for
the target platform. This step is necessary because
currently configure doesn't cope
with specifying different values for the
--host and
--target flags.

Copy the LeadingUnderscore
setting from the target.

Copy
T/ghc/includes/ghcautoconf.h, T/ghc/includes/DerivedConstants.h, and T/ghc/includes/GHCConstants.h
to
H/ghc/includes.
Note that we are building on the host machine, using the
target machine's configuration files. This
is so that the intermediate C files generated here will
be suitable for compiling on the target system.

Touch the generated configuration files, just to make
sure they don't get replaced during the build:
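
As a minimal sketch of this step, the shell function below touches the three headers copied from the target. The function name touch_config_headers is our own invention, and its single argument is assumed to be the root of the host source tree (H); neither comes from the GHC build system.

```shell
# Hypothetical helper (not part of the GHC tree): refresh the timestamps of
# the three configuration headers copied from the target machine.
# $1 is assumed to be the root of the host source tree (H in the text).
touch_config_headers() {
    for f in ghcautoconf.h DerivedConstants.h GHCConstants.h; do
        # touch updates the modification time of an existing file
        touch "$1/ghc/includes/$f" || return 1
    done
}
```

On the host machine this would be invoked as touch_config_headers H, with H replaced by the real path.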

Note: it has been reported that these files still get
overwritten during the next stage. We have installed a fix
for this in GHC 6.4.2, but if you are building a version
before that you need to watch out for these files getting
overwritten by the Makefile in
ghc/includes. If your system supports
it, you might be able to prevent it by making them
immutable:
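
Which command does this depends on the operating system. The sketch below is an assumption on our part rather than something taken from the GHC build system: it maps the kernel name printed by uname -s to chattr +i on Linux (which usually needs root and a filesystem that supports the attribute) or to chflags uchg on BSD-derived systems. The function name immutable_cmd is hypothetical.

```shell
# Hypothetical helper: print the command that marks files immutable on the
# OS whose kernel name (as printed by `uname -s`) is given as $1.
immutable_cmd() {
    case "$1" in
        Linux)
            # Usually requires root; depends on filesystem support.
            echo "chattr +i" ;;
        FreeBSD|NetBSD|OpenBSD|DragonFly|Darwin)
            # Sets the user-immutable flag.
            echo "chflags uchg" ;;
        *)
            # Unknown system: report failure instead of guessing.
            return 1 ;;
    esac
}
```

From the root of the host tree it could be used as $(immutable_cmd "$(uname -s)") ghc/includes/ghcautoconf.h ghc/includes/DerivedConstants.h ghc/includes/GHCConstants.h. Remember to clear the flag again afterwards (chattr -i or chflags nouchg), otherwise later builds cannot regenerate the files.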

At this stage we simply need to bootstrap a compiler
from the intermediate C files we generated above. The
process of bootstrapping from C files is automated by the
script in distrib/hc-build, and is
described in Section 9.1, “Booting/porting from C (.hc) files”.

$ ./distrib/hc-build --enable-hc-boot-unregisterised

However, since this is a bootstrap on a new machine,
the automated process might not run to completion the
first time. For that reason, you might want to treat the
hc-build script as a list of
instructions to follow, rather than as a fully automated
script. This way you'll be able to restart the process
part-way through if you need to fix anything on the
way.

Don't bother with running
make install in the newly
bootstrapped tree; just use the compiler in that tree to
build a fresh compiler from scratch, this time without
booting from C files. Before doing this, you might want
to check that the bootstrapped compiler is generating
working binaries:
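
One way to run such a check is sketched below: compile a one-line Haskell program with the bootstrapped compiler and compare its output with the expected string. The function name smoke_test is hypothetical, and the path of the compiler inside the tree is deliberately left as an argument because it depends on your GHC version.

```shell
# Hypothetical helper: compile and run a trivial Haskell program with the
# compiler given as $1; succeeds only if the binary prints "Hello World!".
smoke_test() {
    dir=$(mktemp -d) || return 1
    # A one-line test program.
    printf 'main = putStrLn "Hello World!"\n' > "$dir/hello.hs"
    # Compile and link the single module into an executable.
    "$1" "$dir/hello.hs" -o "$dir/hello" || return 1
    # Run the result and check its output.
    [ "$("$dir/hello")" = "Hello World!" ]
}
```

Call it as smoke_test followed by the path to the compiler in the bootstrapped tree. A non-zero exit status means that either compilation failed or the resulting binary misbehaved.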

Once you have the unregisterised compiler up and
running, you can use it to start a registerised port. The
following sections describe the various parts of the
system that will need architecture-specific tweaks in
order to get a registerised build going.

9.2.2. Porting the RTS

The following files need architecture-specific code for a
registerised build:

ghc/includes/MachRegs.h

Defines the STG-register to machine-register
mapping. You need to know your platform's C calling
convention, and which registers are generally available
for mapping to global register variables. There are
plenty of useful comments in this file.

Support for
foreign import "wrapper"
(aka
foreign export dynamic).
Not essential for getting GHC bootstrapped, so this file
can be deferred until later if necessary.

ghc/rts/StgCRun.c

The little assembly layer between the C world and
the Haskell world. See the comments and code for the
other architectures in this file for pointers.

ghc/rts/MBlock.h, ghc/rts/MBlock.c

These files are really OS-specific rather than
architecture-specific. MBlock.h
specifies the absolute location at which the RTS
should try to allocate memory on your platform (try to
find an area which doesn't conflict with code or dynamic
libraries). In MBlock.c you might
need to tweak the call to mmap() for
your OS.

9.2.3. The mangler

The mangler is an evil Perl script
(ghc/driver/mangler/ghc-asm.lprl) that
rearranges the assembly code output from gcc to do two main
things:

Remove function prologues and epilogues, and all
movement of the C stack pointer. This is to support
tail-calls: every code block in Haskell code ends in an
explicit jump, so we don't want the C-stack overflowing
while we're jumping around between code blocks.

Move the info table for a
closure next to the entry code for that closure. In
unregisterised code, info tables contain a pointer to the
entry code, but in registerised compilation we arrange
that the info table is shoved right up against the entry
code, and addressed backwards from the entry code pointer
(this saves a word in the info table and an extra
indirection when jumping to the closure entry
code).

The mangler is abstracted to a certain extent over some
architecture-specific things such as the particular assembler
directives used to herald symbols. Take a look at the
definitions for other architectures and use these as a
starting point.

9.2.4. The splitter

The splitter is another evil Perl script
(ghc/driver/split/ghc-split.lprl). It
cooperates with the mangler to support object splitting.
Object splitting is what happens when the
-split-objs option is passed to GHC: the
object file is split into many smaller objects. This feature
is used when building libraries, so that a program statically
linked against the library will pull in less of the
library.

The splitter has some platform-specific stuff; take a
look and tweak it for your system.

9.2.5. The native code generator

The native code generator isn't essential to getting a
registerised build going, but it's a desirable thing to have
because it can cut compilation times in half. The native code
generator is described in some detail in the GHC
Commentary.

9.2.6. GHCi

To support GHCi, you need to port the dynamic linker
($(GHC_TOP)/rts/Linker.c). The
linker currently supports the ELF and PEi386 object file
formats - if your platform uses one of these then things will
be significantly easier. The majority of Unix platforms use
the ELF format these days. Even so, there are some
machine-specific parts of the ELF linker: for example, the
code for resolving particular relocation types is
machine-specific, so some porting of this code to your
architecture and/or OS will probably be necessary.

If your system uses a different object file format, then
you have to write a linker — good luck!