1 Introduction

The Foreign Function Interface (FFI) allows Haskell programs to cooperate with programs written in other languages. Haskell programs can call foreign functions, and foreign functions can call Haskell code.

Compared to many other languages, the Haskell FFI is very easy to use: in the most common case, you only have to translate the prototype of the foreign function into the equivalent Haskell prototype and you're done. For instance, to call the exponential function ("exp") of the C standard library, you only have to translate its prototype:

double exp(double);

into the following Haskell code:

foreign import ccall "exp" c_exp :: Double -> Double

Now you can use the function "c_exp" just like any other Haskell function. When evaluated, it will call "exp".
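As a minimal sketch, the import can be placed in a complete module as follows. The helper `expGap` is a hypothetical name introduced only for illustration; it compares the C result with the one from the Prelude's own `exp`.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

-- Bind the C function `double exp(double)` under the Haskell name c_exp.
foreign import ccall "exp" c_exp :: Double -> Double

-- Hypothetical helper: absolute difference between the C result
-- and the result of the Prelude's exp.
expGap :: Double -> Double
expGap x = abs (c_exp x - exp x)
```

Because the import is given a pure type, `c_exp` can be used in any expression, for example `map c_exp [0, 1, 2]`.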

Similarly, to export the following Haskell function:

triple :: Int -> Int
triple x = 3 * x

so that it can be used in foreign code, you only have to write:

foreign export ccall triple :: Int -> Int

It can get a little more complicated depending on what you want to do, the function parameters, the foreign code you target, etc. This page is here to explain all of this to you.

2 Generalities

2.1 FFI extension

The Foreign Function Interface (FFI) started as an extension to the Haskell 98 standard and is part of the Haskell 2010 standard. GHC enables it by default in its Haskell 2010 mode; to enable it explicitly, put the following compiler pragma at the beginning of your source file:

{-# LANGUAGE ForeignFunctionInterface #-}

2.2 Calling conventions

When a program (in any language) is compiled into machine code, functions and procedures become labels: a label is a symbol (a string) associated with a position in the machine code. Calling a function consists of putting parameters at appropriate places in memory and registers and then branching to the label position. The caller needs to know where to store parameters and the callee needs to know where to retrieve them from: this agreement is the calling convention.

To interact with foreign code, you need to know the calling conventions that are used by the other language implementation on the given architecture. It can also depend on the operating system.

GHC supports standard calling conventions with the FFI: it can generate code to convert between its internal (non-standard) convention and the foreign one. If we consider the previous example:

foreign import ccall "exp" c_exp :: Double -> Double

we see that the C calling convention ("ccall") is used. GHC will generate code to put parameters into (and retrieve results from) memory and registers, conforming to what is expected by code generated by a C compiler (or any other compiler conforming to this convention).

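Other calling conventions supported by GHC include "stdcall" (the convention used by the Win32 API on 32-bit x86 Windows) and "capi" (which goes through a C compiler so that macros and header declarations are honoured).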

2.3 Foreign types

Calling conventions depend on parameter types. For instance, floating-point values (Double, Float) may be passed in floating-point registers, several values can be combined into a single vector register, and so on. As an example, in http://www.x86-64.org/documentation/abi.pdf you can find the algorithm describing how to pass parameters to functions on Linux on an x86-64 architecture depending on the types of the parameters.

Only some Haskell types can be directly used as parameters for foreign functions, because they correspond to basic types of low-level languages such as C and are used to define calling conventions.

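According to [1], the type of a foreign function is a foreign type, that is, a function type with zero or more arguments where:

the argument types are marshallable foreign types, i.e. Char, Int, Double, Float, Bool, Int8, Int16, Int32, Int64, Word8, Word16, Word32, Word64, Ptr a, FunPtr a, StablePtr a, or a renaming of any of these using newtype

the return type is either a marshallable foreign type or has the form IO t, where t is a marshallable foreign type or ()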
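To illustrate what such foreign types look like in practice, here is a small sketch that imports three C standard library functions. The choice of functions and the helper name `currentTime` are assumptions made for this example. The C types come from Foreign.C.Types, which provides newtypes matching the C compiler's `double`, `int` and `time_t`.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CDouble, CInt, CTime)
import Foreign.Ptr (Ptr, nullPtr)

-- Two arguments and a pure result: double pow(double, double)
foreign import ccall "math.h pow" c_pow :: CDouble -> CDouble -> CDouble

-- One argument and a pure result: int abs(int)
foreign import ccall "stdlib.h abs" c_abs :: CInt -> CInt

-- A pointer argument and a result in IO: time_t time(time_t *)
foreign import ccall "time.h time" c_time :: Ptr CTime -> IO CTime

-- Hypothetical helper: seconds since the epoch, passing NULL so that
-- the C function only returns the value.
currentTime :: IO CTime
currentTime = c_time nullPtr
```

The result of `c_time` is placed in IO because it differs from call to call, whereas `c_pow` and `c_abs` behave like pure functions.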

3 Function pointers

Sometimes you want to manipulate foreign pointers to foreign functions: these are FunPtr in Haskell.

You can get a function pointer by using "&" before a foreign function symbol:

foreign import ccall "&exp" a_exp :: FunPtr (Double -> Double)

Some foreign functions can also return function pointers.

To call a function pointer from Haskell, GHC needs to convert between its own calling convention and the one expected by the foreign code. To create a function doing this conversion, you must use "dynamic" wrappers:
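A "dynamic" import could look like the following sketch. The wrapper name `mkFun` is a free choice; the `a_exp` import is repeated so that the block compiles on its own.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.Ptr (FunPtr)

-- Address of the C "exp" function.
foreign import ccall "&exp" a_exp :: FunPtr (Double -> Double)

-- "dynamic" wrapper: turns a C function pointer of this type
-- into an ordinary Haskell function.
foreign import ccall "dynamic"
  mkFun :: FunPtr (Double -> Double) -> (Double -> Double)
```

One "dynamic" import is needed for each function type you want to call through a pointer.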

Then you can apply this wrapper to a FunPtr to get a Haskell function:

c_exp :: Double -> Double
c_exp = mkFun a_exp

You can also perform the opposite operation and give the foreign code a pointer to a Haskell function. For this you need a "wrapper" wrapper. GHC generates the callable code that executes the wrapped Haskell closure with the appropriate calling convention and returns a pointer (FunPtr) to it. You have to release the generated code explicitly with `freeHaskellFunPtr` to avoid memory leaks: GHC has no way to know whether the function pointer is still referenced in some foreign code, hence it doesn't collect it.
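The following sketch shows a "wrapper" import together with its clean-up. The names `mkCallback`, `callCallback` and `tripleViaPointer` are hypothetical, and a "dynamic" import stands in for the foreign code that would normally call the pointer.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Control.Exception (bracket, evaluate)
import Foreign.C.Types (CInt)
import Foreign.Ptr (FunPtr, freeHaskellFunPtr)

-- "wrapper" import: allocates callable code around a Haskell function.
foreign import ccall "wrapper"
  mkCallback :: (CInt -> CInt) -> IO (FunPtr (CInt -> CInt))

-- "dynamic" import, standing in for foreign code that calls the pointer.
foreign import ccall "dynamic"
  callCallback :: FunPtr (CInt -> CInt) -> CInt -> CInt

-- Wrap (* 3), call it through the pointer, and release the generated
-- code afterwards, even if an exception is raised.
tripleViaPointer :: CInt -> IO CInt
tripleViaPointer x =
  bracket (mkCallback (* 3)) freeHaskellFunPtr $ \fp ->
    -- Force the result before the pointer is freed.
    evaluate (callCallback fp x)
```

Using `bracket` is one way to make sure `freeHaskellFunPtr` is always called; when the foreign code keeps the pointer for longer, the release has to happen at whatever point the foreign side is done with it.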

4 Marshalling data

In Haskell we are accustomed to letting the runtime system, and especially the garbage collector, manage memory. When we use the FFI, however, we sometimes need to do some manual memory management to comply with the data representations of the foreign code. Fortunately, Haskell makes it very easy to manipulate low-level objects such as pointers. Moreover, many useful Haskell tools have been designed to simplify conversions between data representations.

4.1 Pointers

A pointer is an address in memory. In Haskell, it is represented with the Ptr a data type, where "a" is a phantom type that can be used to differentiate two pointers. You can think of "Ptr Stuff" as being equivalent to a "Stuff *" type in C (i.e. a pointer to a "Stuff" value). This analogy may not hold if "a" is a Haskell type not representable in the foreign language. For instance, you can have a pointer with the type "Ptr (Stuff -> OtherStuff)", but it is not a function pointer in the foreign language: it is just a pointer tagged with the "Stuff -> OtherStuff" type.

You can easily cast between pointer types using `castPtr` or perform pointer arithmetic using `plusPtr`, `minusPtr` and `alignPtr`. The NULL pointer is represented with `nullPtr`.
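As a small sketch of these functions, the following block does arithmetic on a made-up address without ever dereferencing it. The names `bytePtr`, `wordPtr` and `offsets` are hypothetical.

```haskell
import Data.Word (Word32, Word8)
import Foreign.Ptr (Ptr, alignPtr, castPtr, minusPtr, nullPtr, plusPtr)

-- A made-up address, 5 bytes past NULL. It is never dereferenced.
bytePtr :: Ptr Word8
bytePtr = nullPtr `plusPtr` 5

-- Round the address up to the next multiple of 4, then retag the
-- pointer as pointing to Word32 values.
wordPtr :: Ptr Word32
wordPtr = castPtr (alignPtr bytePtr 4)

-- Distances in bytes from NULL: 5 for bytePtr, 8 for wordPtr.
offsets :: (Int, Int)
offsets = (bytePtr `minusPtr` nullPtr, wordPtr `minusPtr` nullPtr)
```

Note that `plusPtr` and `minusPtr` always count in bytes, whatever the phantom type of the pointer is.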

4.2 Memory allocation

You can allocate memory in two ways. The first uses the `alloca*` functions in Foreign.Marshal.Alloc. The allocation is ephemeral: it lasts only for the execution of an IO action, as in the following example:

do
  allocaBytes 128 $ \ptr -> do
    -- do stuff with the pointer ptr...
    -- ...
    -- do not return "ptr" in any way because it will become an invalid pointer
  -- here the 128 bytes have been released and should not be accessed

The second allocates on the "low-level" heap (the same as the runtime system uses), using the `malloc*` functions in Foreign.Marshal.Alloc.

Allocations on the low-level heap are not managed by the Haskell implementation and must be freed explicitly with `free`.

do
  ptr <- mallocBytes 128
  -- do stuff with the pointer ptr...
  -- ...
  free ptr
  -- here the 128 bytes have been released and should not be accessed
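The two allocation styles can be compared in a complete sketch. The helper names `scratchRoundTrip` and `heapRoundTrip` are hypothetical; each one writes a byte into a 128-byte buffer and reads it back, using `pokeByteOff` and `peekByteOff` from Foreign.Storable.

```haskell
import Control.Exception (bracket)
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes, free, mallocBytes)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- Ephemeral buffer: valid only inside the callback.
scratchRoundTrip :: Word8 -> IO Word8
scratchRoundTrip x =
  allocaBytes 128 $ \ptr -> do
    pokeByteOff ptr 127 x   -- write the last byte of the buffer
    peekByteOff ptr 127     -- read it back before the buffer goes away

-- Low-level heap buffer: must be released with free. bracket makes
-- sure this happens even if the body raises an exception.
heapRoundTrip :: Word8 -> IO Word8
heapRoundTrip x =
  bracket (mallocBytes 128) free $ \ptr -> do
    pokeByteOff ptr 0 x
    peekByteOff ptr 0
```

In both helpers only the Word8 value leaves the scope of the buffer, never the pointer itself.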

4.3 Foreign Pointers

A hybrid approach is to use ForeignPtr. Foreign pointers are similar to Ptr except that they have finalizers (i.e. actions) attached to them. When the garbage collector detects that a ForeignPtr is no longer accessible, it executes its associated finalizers. A basic finalizer is `finalizerFree` [2], which calls `free` on the pointer.

You can convert a Ptr into a ForeignPtr using `newForeignPtr`, add additional finalizers, etc. [3].

In the following example, we use `mallocForeignPtrBytes`. It is equivalent to calling `malloc` and then associating the `finalizerFree` finalizer with `newForeignPtr`. GHC has optimized implementations of the `mallocForeignPtr*` functions, hence they should be preferred.

do
  ptr <- mallocForeignPtrBytes 128
  -- do stuff with the pointer ptr...
  -- ...
  --
  -- ptr is freed when it is collected
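To actually read or write through a ForeignPtr you go through `withForeignPtr`, which hands you the underlying Ptr and keeps the ForeignPtr alive for the duration of the action. A minimal sketch, with the hypothetical helpers `newCell` and `bumpCell`:

```haskell
import Data.Word (Word32)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Storable (peek, poke, sizeOf)

-- Allocate room for one Word32 and initialise it.
-- No explicit free is needed: the memory is reclaimed once the
-- ForeignPtr becomes unreachable.
newCell :: Word32 -> IO (ForeignPtr Word32)
newCell v = do
  fp <- mallocForeignPtrBytes (sizeOf v)
  withForeignPtr fp (\p -> poke p v)
  return fp

-- Increment the stored value in place and return the new value.
bumpCell :: ForeignPtr Word32 -> IO Word32
bumpCell fp =
  withForeignPtr fp $ \p -> do
    v <- peek p
    poke p (v + 1)
    return (v + 1)
```

The raw Ptr should not escape the `withForeignPtr` callback, for the same reason that a pointer obtained from `allocaBytes` should not be returned.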

4.4 Using pointers: Storable instances

You often want to read or write at the address of a pointer. Reading consists of obtaining a Haskell value from a pointer; writing consists of storing a representation of the Haskell value at the pointed address. How a value is written and read depends on its type, hence these methods are encapsulated in the Storable type class.

For any type T for which a Storable T instance exists:

you can read a value, using

peek :: Ptr T -> IO T

you can write a value, using

poke :: Ptr T -> T -> IO ()

`Storable a` also defines a `sizeOf :: a -> Int` method that returns the size of the stored value in bytes.

All the marshallable foreign types (i.e. basic types) have Storable instances. Hence we can use these to write new Storable instances for more involved data types. In the following example, we create a Storable instance for a Complex data type:
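A minimal sketch of such an instance could look like this, assuming a hypothetical Complex type made of two Doubles stored one after the other (real part first). The helper `roundTripComplex` is only there for illustration.

```haskell
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (castPtr)
import Foreign.Storable (Storable (..))

-- Hypothetical type: real part, then imaginary part.
data Complex = Complex Double Double
  deriving (Eq, Show)

instance Storable Complex where
  -- Two Doubles stored back to back.
  sizeOf _ = 2 * sizeOf (undefined :: Double)
  alignment _ = alignment (undefined :: Double)
  -- Reuse the Storable Double instance by viewing the memory
  -- as an array of two Doubles.
  peek ptr = do
    re <- peekElemOff (castPtr ptr) 0
    im <- peekElemOff (castPtr ptr) 1
    return (Complex re im)
  poke ptr (Complex re im) = do
    pokeElemOff (castPtr ptr) 0 re
    pokeElemOff (castPtr ptr) 1 im

-- Write a value to temporary memory and read it back.
roundTripComplex :: Complex -> IO Complex
roundTripComplex z = alloca $ \p -> poke p z >> peek p
```

Once the instance exists, generic helpers such as `alloca`, `with` or the array functions work for Complex values without further code.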

4.5 Arrays

It is very common to read and write arrays of values. Foreign.Marshal.Array provides many functions to deal with pointers to arrays. You can easily write a Haskell list of storable values as an array of values, and vice versa.
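For instance, `withArrayLen` writes a list into a temporary array and `peekArray` reads an array back into a list. In the sketch below, the hypothetical helper `doubleInPlace` modifies the array through the pointer, as foreign code typically would.

```haskell
import Foreign.C.Types (CInt)
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Storable (peekElemOff, pokeElemOff)

-- Marshal a list to a temporary C array, double every element in
-- place, and read the array back into a list.
doubleInPlace :: [CInt] -> IO [CInt]
doubleInPlace xs =
  withArrayLen xs $ \n ptr -> do
    -- Stand-in for foreign code that mutates the array.
    mapM_ (\k -> peekElemOff ptr k >>= \v -> pokeElemOff ptr k (2 * v))
          [0 .. n - 1]
    -- Read the n elements back into a Haskell list.
    peekArray n ptr
```

The array only exists during the `withArrayLen` callback; use `newArray` together with `free` when the foreign code needs to keep it for longer.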

4.6 Strings

Strings in Haskell are lists of Char, where Char represents a Unicode code point. Much foreign code uses the C representation for strings (CString in Haskell): an array of bytes, one per character in ASCII or an extended 8-bit encoding, terminated with a NUL byte.

In Foreign.C.String, you have many functions to convert between both representations. Be careful: characters outside the ASCII range may not be representable in the C representation, depending on the encoding in use.
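A short sketch of the two directions, using `withCString` and `peekCString` together with the C function `strlen`. The helper names `cLength` and `viaCString` are hypothetical.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.String (CString, peekCString, withCString)
import Foreign.C.Types (CSize)

-- size_t strlen(const char *)
foreign import ccall unsafe "string.h strlen" c_strlen :: CString -> IO CSize

-- Length in bytes of the C representation of a Haskell string.
cLength :: String -> IO Int
cLength s = withCString s (fmap fromIntegral . c_strlen)

-- Haskell String -> temporary C string -> Haskell String.
viaCString :: String -> IO String
viaCString s = withCString s peekCString
```

For a plain ASCII string, `cLength` agrees with `length`. For other characters the byte count depends on the encoding used for the conversion, so the two may differ.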

4.7 Data structures

Marshalling data structures of foreign languages is the most cumbersome task: you have to find out the offset of each field of the data structure (considering padding bytes, etc.). Fortunately, there are Haskell tools to help with this task.

Suppose you have a C data structure like this:

struct MyStruct {
  double d;
  char c;
  int32_t i;
};

And its Haskell counterpart:

data MyStruct = MyStruct
  { d :: Double
  , c :: Word8
  , i :: Int32
  }

The following sub-sections present the different ways to write the Storable instance for MyStruct.

The structure alignment is the least common multiple of the alignments of the structure fields. On most platforms, the alignment of a primitive type is equal to its size in bytes (e.g. 8 for Double, 1 for Word8 and 4 for Int32). Hence the alignment for MyStruct is 8.

We indicate the offset of each field explicitly in the peek and poke methods. We introduce padding bytes to align the "i" field (Int32) on 4 bytes: "d" is at offset 0, "c" at offset 8 and "i" at offset 12. A C compiler does the same thing (except for packed structures).

The size of the structure is the total number of bytes, including padding bytes between fields and any trailing padding needed to make the size a multiple of the alignment: 16 bytes here.
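An explicit instance along these lines can be sketched as follows. The offsets 0, 8 and 12, the size 16 and the alignment 8 are assumptions that match the layout described above on common 64-bit platforms; the helper `roundTripStruct` is hypothetical.

```haskell
import Data.Int (Int32)
import Data.Word (Word8)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Storable (Storable (..))

data MyStruct = MyStruct
  { d :: Double
  , c :: Word8
  , i :: Int32
  } deriving (Eq, Show)

instance Storable MyStruct where
  alignment _ = 8
  sizeOf _ = 16
  peek ptr = do
    dv <- peekByteOff ptr 0    -- double d at offset 0
    cv <- peekByteOff ptr 8    -- char c at offset 8
    iv <- peekByteOff ptr 12   -- int32_t i at offset 12, after 3 padding bytes
    return (MyStruct dv cv iv)
  poke ptr (MyStruct dv cv iv) = do
    pokeByteOff ptr 0 dv
    pokeByteOff ptr 8 cv
    pokeByteOff ptr 12 iv

-- Write a value to temporary memory and read it back.
roundTripStruct :: MyStruct -> IO MyStruct
roundTripStruct s = alloca $ \p -> poke p s >> peek p
```

Hard-coded offsets like these are fragile: they silently break if the C structure changes or if the target platform lays it out differently.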

4.7.2 hsc2hs

hsc2hs is a tool that can help you compute field offsets by using C headers directly.

Save your Haskell file with a .hsc extension to enable the support of hsc2hs.

The CStorable type class (provided by the c-storable-deriving package) is equivalent to the Storable type class but has additional default implementations for its methods if the type has an instance of Generic.

4.8 Pointers to Haskell data

In some cases, you may want to give to the foreign code an opaque reference to a Haskell value that you will retrieve later on. You need to be sure that the value is not collected between the time you give it and the time you retrieve it. Stable pointers have been created exactly to do this. You can wrap a value into a StablePtr and give it to the foreign code (StablePtr is one of the marshallable foreign types).

You need to manually free stable pointers using `freeStablePtr` when they are not required anymore.
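The life cycle of a stable pointer can be sketched as follows. The helper names `toOpaque` and `fromOpaque` are hypothetical; the `Ptr ()` plays the role of the `void *` that the foreign code would store and hand back later.

```haskell
import Foreign.Ptr (Ptr)
import Foreign.StablePtr
  ( castPtrToStablePtr, castStablePtrToPtr
  , deRefStablePtr, freeStablePtr, newStablePtr )

-- Turn a Haskell value into an opaque pointer. The value is kept
-- alive until the stable pointer is freed.
toOpaque :: a -> IO (Ptr ())
toOpaque x = castStablePtrToPtr <$> newStablePtr x

-- Recover the value and release the stable pointer.
-- The pointer must have been produced by toOpaque for the same type,
-- and must not be used again afterwards.
fromOpaque :: Ptr () -> IO a
fromOpaque p = do
  let sp = castPtrToStablePtr p
  x <- deRefStablePtr sp
  freeStablePtr sp
  return x
```

The opaque pointer is only a handle: foreign code must not dereference it, only pass it back.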

5 Tools

There are several tools to help write bindings using the FFI, in particular by using C headers.

6.1 Dynamic linker template

The idea is that a library is like a record containing functions, hence it is easy to generate the code that loads symbols from a library and stores them into a Haskell record.

In the following code, the record matching library symbols is the data type MyLib. The generated code will apply "myModifier" to each field name of the record to find corresponding symbols in the library. myModifier will often be "id", but a custom modifier is useful when symbol names are awkward. Here in the foreign code "_v2" is appended at the end of each symbol to avoid symbol clashes with the first version of the library.

The package supports optional symbols: functions that may or may not be present in the library. These optional functions are represented by encapsulating the function type into Maybe.

The `libHandle` field is mandatory and contains a pointer to the loaded library. You can use it to unload the library.

A function called `loadMyLib` is generated to load symbols from a library, wrap them using "dynamic" wrappers and store them into a MyLib value that is returned.

7 Enhancing performance and advanced topics

To enhance the performance of a call to a foreign function, you first need to understand how the GHC runtime system works. GHC uses user-space threads, which it runs on a set of system threads (called "Tasks"). Each system thread can execute a "capability" (i.e. a user-space thread manager). User-space threads are distributed among capabilities. Each capability executes its associated user-space threads, one at a time, using cooperative scheduling or preemption if necessary.

All the capabilities have to synchronize to perform garbage collection.

When an FFI call is made:

the user-space thread is suspended (indicating it is waiting for the result of a foreign call)

the system thread currently running the capability that owns this user-space thread releases the capability

the capability can be picked up by another system thread

the user-space threads that are not suspended in the capability can be executed

garbage collection can occur

the system thread executes the FFI call

when the FFI call returns, the user-space thread is woken up

If there are too many blocked system threads, the runtime system can spawn new ones.

7.1 Unsafe calls

All the capability management before and after an FFI call adds some overhead. It is possible to avoid it in some cases by adding the "unsafe" keyword as in the following example:

foreign import ccall unsafe "exp" c_exp :: Double -> Double

By doing this, the foreign code will be directly called but the capability won't be released by the system thread during the call. Here are the drawbacks of this approach:

if the foreign function blocks indefinitely, the other user-space threads of the capability won't be executed anymore (deadlock)

if the foreign code calls back into Haskell code, a deadlock may occur: it may wait for a value produced by one of the blocked user-space threads on the capability, or there may not be enough capabilities to execute the code
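As a sketch of how the choice might be made in practice, the block below marks a short, non-blocking function as "unsafe" and keeps the default "safe" behaviour for one that can block. The choice of `sin` and `usleep`, and the helper name `napMicros`, are assumptions made for this example; `usleep` is a POSIX function and may not be available on every platform.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Data.Functor (void)
import Foreign.C.Types (CDouble, CInt, CUInt)

-- Short, never blocks, never calls back into Haskell:
-- a reasonable candidate for an unsafe call.
foreign import ccall unsafe "math.h sin" c_sin :: CDouble -> CDouble

-- Can block for a long time, so the capability should be released
-- during the call: keep it safe (which is also the default).
foreign import ccall safe "unistd.h usleep" c_usleep :: CUInt -> IO CInt

-- Hypothetical helper: sleep for the given number of microseconds,
-- ignoring the status code returned by usleep.
napMicros :: CUInt -> IO ()
napMicros us = void (c_usleep us)
```

While `napMicros` is sleeping, other Haskell threads can still be scheduled; had `usleep` been imported as unsafe, the capability would stay held for the whole duration of the call.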

7.2 Foreign PrimOps

If unsafe foreign calls are not fast enough for you, you can try the GHCForeignImportPrim extension.

7.3 Bound threads

Some foreign code uses (system) thread-local storage, and some is not thread-safe. In both cases, you have to be sure that the same system thread executes the FFI calls. To control how user-space threads are scheduled on system threads, GHC provides bound threads. Bound threads are user-space threads (Haskell threads) that can only be executed by a single system thread.

Note that bound threads are more expensive to schedule than normal threads. The first thread executing "main" is a bound thread.
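In Control.Concurrent, bound threads are created with `forkOS` or `runInBoundThread`. Both require a runtime system with bound-thread support (a program linked with -threaded), which can be tested with `rtsSupportsBoundThreads`. The following is a minimal sketch; the helper names `onOneSystemThread` and `boundHere` are hypothetical.

```haskell
import Control.Concurrent
  ( isCurrentThreadBound, rtsSupportsBoundThreads, runInBoundThread )

-- Run an action (typically a group of related FFI calls) in a bound
-- thread. Without bound-thread support the action is run directly:
-- the non-threaded runtime system executes all Haskell threads on a
-- single system thread anyway.
onOneSystemThread :: IO a -> IO a
onOneSystemThread act
  | rtsSupportsBoundThreads = runInBoundThread act
  | otherwise = act

-- Reports whether the action really runs in a bound thread.
boundHere :: IO Bool
boundHere = onOneSystemThread isCurrentThreadBound
```

All the FFI calls that rely on the same thread-local state should be made inside a single `onOneSystemThread` action.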

7.4 Inline FFI calls

If you want to make a one-shot FFI call without the hassle of writing the foreign import, you can use the following technique (using Template Haskell).

"C functions are normally declared using prototypes in a C header file. Earlier versions of GHC (6.8.3 and earlier) #included the header file in the C source file generated from the Haskell code, and the C compiler could therefore check that the C function being called via the FFI was being called at the right type.

GHC no longer includes external header files when compiling via C, so this checking is not performed. The change was made for compatibility with the native code backend (-fasm) and to comply strictly with the FFI specification, which requires that FFI calls are not subject to macro expansion and other CPP conversions that may be applied when using C header files. This approach also simplifies the inlining of foreign calls across module and package boundaries: there's no need for the header file to be available when compiling an inlined version of a foreign call, so the compiler is free to inline foreign calls in any context.

The -#include option is now deprecated, and the include-files field in a Cabal package specification is ignored."

10 TODO

Fix References section

Foreign language specific issues

C++ symbol mangling

Embedded Objective C

Precision

The Haskell report only guarantees that Int has 30 bits of signed precision, so converting CInt to Int is not guaranteed to be safe by the standard. On the other hand, many classes have instances for Int and Integer but not CInt, so it's generally more convenient to convert from the C types. To convert, you could write a 'checkedFromIntegral' function that fails when the value does not fit, use plain fromIntegral if you're sure the value is small, or just use Integer.
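One possible shape for such a function is sketched below. This is only a suggestion: `checkedFromIntegral` is not a library function, and the helpers `cIntToInt` and `cIntToByte` are hypothetical. The check converts the value, then compares the original and the result as Integers.

```haskell
import Foreign.C.Types (CInt, CUChar)

-- Convert between integral types, returning Nothing when the value
-- does not fit in the target type.
checkedFromIntegral :: (Integral a, Integral b) => a -> Maybe b
checkedFromIntegral x
  | toInteger y == toInteger x = Just y
  | otherwise = Nothing
  where
    y = fromIntegral x

-- C int to Haskell Int.
cIntToInt :: CInt -> Maybe Int
cIntToInt = checkedFromIntegral

-- C int to C unsigned char, which only holds 0 to 255.
cIntToByte :: CInt -> Maybe CUChar
cIntToByte = checkedFromIntegral
```

For example, `cIntToByte 300` and `cIntToByte (-1)` both give Nothing, while `cIntToByte 255` gives `Just 255`.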