Lesson 5: Modules and the C interface

Modules

With very few exceptions, our discussion of OCaml so far only
covered the "core language", as well as some of the standard
libraries. We can by now write and even compile small OCaml scripts,
but we cannot yet build our own libraries. It is now time to change
this. In this lesson, we will not always be as detailed with our
explanations as in the previous ones, but rather take the pragmatic
route and simply get going - even though this will inevitably mean
that we occasionally have to deal with exotic situations which we
cannot fully understand using only the material presented here. On
the other hand, some very basic and general things have to be covered
in great detail, as it is essential to develop an understanding of
the underlying machinery.

We first have to take a closer look at the compiler. One should
keep in mind that actually, OCaml is not one language, but two, which,
however, are syntactically almost the same (up to toplevel directives,
that is). For many compiler languages, the process of turning an idea
into working code will utilize files of many different types, some of
them source files, some of them intermediate files. This is just as
true for a C compiler (.c, .h,
.o, .so, .s) as it is for, say,
TeX (.tex, .dvi, .aux,
.log), and also holds for OCaml. For some systems, file
endings are just a convention, while others enforce certain
names. OCaml is of the latter type: a file named
something.ml will - for example - always be treated as an
OCaml source file. What other file types does the compiler know about?

OCaml file types (machine-code equivalent given in parentheses where
different):

  .ml            OCaml source file
  .mli           OCaml interface file
  .cmi           Compiled interface
  .cmo (.cmx)    Compiled object
  .cma (.cmxa)   Compiled object archive (library)
  .c             C source code
  .o             C object code
  .so            C shared object library

It is nice that the OCaml compiler knows about C source files as
well and will call the C compiler when necessary. One detail worth
knowing is that the ocamlc manpage is incomplete. One may, for
example, wish to specify explicitly which C compiler to use (say,
gcc or Intel's icc, or a C compiler wrapper such as
mpicc). This is possible: ocamlc --help as well as the
OCaml online documentation tell us that we can use the -cc
option for this, but it is not mentioned in the manpage. Also note that the
order of objects given to the compiler does matter.

What is this .mli interface thing about? A
.ml file provides a compilation unit. We may now choose
not to export all the definitions in that compilation unit to the
outside world, or make some type definitions opaque (that is, we just
tell the outside about the existence of a given type, but not its
complete realization). This is what the .mli file is
for. Furthermore, we may want to put extra documentation into that
file - which is provided in the form of comments that adhere to
certain conventions. ocamldoc will then allow us to
automatically generate HTML and LaTeX documentation for our module. A
.mli file is more or less just the list of variable types
and type definitions which the toplevel prints out if we load a
.ml file. It can be auto-generated from a
.ml file (for later editing) via ocamlc -i
code.ml. The fine print says that we can actually often go
without such an interface file, but as a matter of good practice, we
should always provide one.
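As a toy illustration (the module and function names here are invented, not taken from the original lesson), consider a compilation unit deg_rad.ml together with a hand-pruned interface:

```ocaml
(* deg_rad.ml - a tiny compilation unit *)
let pi = 4.0 *. atan 1.0
let deg_rad d = d *. pi /. 180.0
let rad_deg r = r *. 180.0 /. pi

(* A matching deg_rad.mli, hand-pruned from the output of
   `ocamlc -i deg_rad.ml`, might read:

     val deg_rad : float -> float
     val rad_deg : float -> float

   Omitting the line `val pi : float` keeps pi private to the module. *)
```

Compiling with `ocamlc -c deg_rad.mli deg_rad.ml` then hides pi from all client code.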

The behaviour of OCaml with respect to re-compilation of modules
often is somewhat picky. It will include cryptographic hash
fingerprints in compiled interface .cmi definitions and a
few other places, and as a consequence, if library/module B uses some
independent module A, which undergoes a change (and if this is even
only the addition of one more function), then B will complain that its
idea of the interface of A no longer matches reality, that is, B has
to be recompiled because A was. Such behaviour is somewhat unexpected,
in particular to C programmers, and there have been long discussions
about whether this makes sense and is a good thing or not. As one may
guess, it does not make life especially easy for module maintainers,
and in a sense, OCaml tries to be "holier than CVS version control" here.

Due to very similar issues (especially with component
dependencies), OCaml may feel quite a bit unnatural for seasoned C
developers when it comes to writing Makefiles. Indeed, many newcomers
seem to experience major difficulties here. Hence, one normally is
much better off using a pre-existing tool that deals with most of this
makefile complexity: OCamlMakefile. This is a Makefile
that is to be included in our own Makefiles and provides quite a lot
of intelligence that does most of the really dirty work. In Debian,
it's part of the ocaml-tools
package. OCamlMakefile cooperates quite well with
ocamlfind, which is a package and dependency management
system for OCaml. (Objective Caml provides a simple library location
and loading framework right out of the box, as can be seen by giving a
directive like #load "unix.cma" to the toplevel, but
findlib is much more flexible.) The
ocamlfind Debian package is called
ocaml-findlib.

When installing ocamlfind, one may want to make a few
adjustments, especially if multiple users in the Unix group
ocaml are supposed to be able to install libraries
system-wide. On the author's system, this looks as follows:
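The author's actual listing is not reproduced here; as a rough sketch (the paths and values are assumptions, not the original file), the relevant findlib settings in /etc/ocamlfind.conf would be along these lines:

```
destdir="/usr/local/lib/ocaml"
path="/usr/local/lib/ocaml:/usr/lib/ocaml/3.08.3"
```

With destdir pointing at a group-writable directory, any member of group ocaml can install packages system-wide.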

Further adjustments may have to be made to the file
/usr/lib/ocaml/3.08.3/ld.conf (this is unfortunate, as
configuration files should always reside under /etc in
Debian):

/usr/lib/ocaml/3.08.3/ld.conf

/usr/local/lib/ocaml/stublibs
/usr/lib/ocaml/3.08.3/stublibs

The /usr/local/lib/ocaml tree was then rebuilt so
that the stublibs directory is a direct
subdirectory of it. /usr/local/lib/ocaml is owned by
user root, group ocaml, and has mode 2775. Packages are installed as
direct subdirectories.

The structure of a simple module

In the project we are working on right now at the University of
Southampton, there is a catch-it-all module which collects small
useful snippets that are not present in the OCaml standard library and
do not justify creating an individual new module either. In this
module, one can find functions for degree to radian conversion just as
well as a function to generate a random number with gaussian
distribution. The directory looks like this:
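The actual directory contents are not reproduced here, but a minimal OCamlMakefile-based setup (the file names and the OCamlMakefile include path are assumptions) would consist of snippets.ml, snippets.mli, a META file, and a Makefile along these lines:

```makefile
# Makefile - sketch; the OCamlMakefile location varies between systems
RESULT  = snippets
SOURCES = snippets.mli snippets.ml

all: byte-code-library native-code-library

mrproper: clean
	rm -f *~

include /usr/share/ocaml-tools/OCamlMakefile
```

The first target defined before the include becomes the default target.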

Note that by default, the all: target will build both
a bytecode and a native-code library. Other interesting targets we
may want to include here are "doc" (auto-generate
documentation) and "top" (build a toplevel) - maybe even
"native-code" to build a fast compiled standalone
executable. These are the most common ones; see the OCamlMakefile
documentation for information on what else there is.

Note that we furthermore define a "mrproper" symbolic
target for complete cleanup. This is nice and convenient. The name, by
the way, was taken from the Linux kernel source makefiles. Now, if
this builds correctly, we can use the power of
OCamlMakefile and findlib to install it with
a simple make libinstall and remove it again with a
make libuninstall. If some other package now depended on
snippets, we would add snippets to the
PACKS= line in the Makefile, as well as to the
requires line of the META file, which may
then e.g. look like this: requires = "snippets qhull
mt19937". The META file is used by findlib.
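For a package without dependencies, the META file might look roughly like this (the version and description values are invented):

```
name = "snippets"
version = "0.1"
description = "assorted small utility functions"
requires = ""
archive(byte) = "snippets.cma"
archive(native) = "snippets.cmxa"
```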

This more or less tells us how to build and install simple OCaml
modules. "Simple" in the sense that we do not use sophisticated
foreign language interface techniques, but just basic plain OCaml.

If we want to use such an installed package from the toplevel, the
magical incantations are:

Loading findlib packages from the toplevel - example

#use "topfind";;
#require "snippets";;
open Snippets;;

...where the open just imports all the symbols into our
namespace so that we can refer to them directly instead of having to
use names such as Snippets.deg_rad. Note that the
compiler does not understand toplevel directives. So, whenever we have
the situation that some piece of OCaml code (a small script, say) is
to be fed both into the toplevel and the compiler, it makes sense to
add a small toplevel loader wrapper script which just contains the
package loading directives, plus a final #use "mycode.ml";;
directive which then loads the "interesting" code.
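Such a wrapper might look as follows (the file names are hypothetical); the compiler is given mycode.ml directly, while the toplevel is given the wrapper:

```ocaml
(* load_mycode.ml - for the toplevel only; the compiler would choke
   on the #-directives below. *)
#use "topfind";;
#require "snippets";;
#use "mycode.ml";;
```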

The C Interface

(Things you never ever wanted to know, but were forced to find out)

Even if we had the most elegant, most efficient, most effective
language available, its value would be reduced greatly if it did not
come with an interface to C. The very simple reason for this is that
nowadays, a lot of important functionality is available in the form of
C libraries - especially if speed matters for the task. Sooner or
later, we want to tap that resource, and hence, every language must
provide a C interface, just as every serious programmer should be
somewhat proficient with C (even if he does not use it often).

Some very general remarks

Concerning foreign language interfaces in general, there are
different levels of sophistication, and what can be achieved depends
as much on the abilities of the language as on those of the
programmers of both the interface code and the code to be
interfaced. In fact, many things can go wrong,
and should insurmountable problems arise, they are practically always
a consequence of bad design. So, it pays to spend quite some time
thinking about foreign language issues, no matter if one takes on the
role of library implementor, language designer, or interface code
writer.

One very important point to keep in mind is that it is very easy to
build large and complex systems by combining components which were
never intended to interoperate well. As a rule of thumb, the amount of
internal interface friction in a software project with N components
usually is proportional to N^a, with the exponent a being closer to 2
than to 1. So, the most problematic moments in the development of an
N-component application occur whenever there is a version update or
change of at least one component. From this perspective, especially the
philosophy underlying the Debian GNU/Linux system to provide a stable
platform as well as a large library of components whose behaviour is
and stays frozen for long times (up to important bugfixes) comes as a
blessing, and presumably such a concept of behavioural (version)
stability and reliability should find wider recognition as a vital
quality factor.

Rule: Keep it simple - do not overdo it!

Another important general observation: designing the interface to a
foreign language library is a task that often needs a lot of thought
and sometimes a certain amount of experience. One of the major
problems is that the fundamental philosophy underlying the two
languages which are to be bridged is different. (If it were not, at
least one of them would be completely unnecessary.) So, the big
question is how to catch the spirit - the key ideas - underlying the
piece of code to be interfaced and map this in the best possible way
to something that feels smooth from the perspective of the new
language. There is no patent recipe to that question, but there are a
few common observations one should know about. First of all, it is
often a good idea to keep a foreign language interface as direct and
as low level as possible. While it may be tempting to put more
intelligence into the interface and employ the power of the new
language to make things more convenient, this is a double-edged sword,
as it may easily lead to a violation of the principle of least
surprise. In particular if the library in question is well known
and frequently used in its natural environment, one must assume that
many users of the interface will have expectations that were shaped by
the original behaviour. In fact, the author of this lesson still
remembers quite well the shock of finding out that there is a subtle
difference in the behaviour of Perl's built-in fork() and
Unix fork(2): In case of a fork failure, the latter
returns -1 (and sets errno appropriately),
while the former returns undef! That may certainly have
been well meant, but such unexpected surprises may have disastrous
consequences. Another aspect is: the simpler the interface, the less
effort to adjust it to new versions.

If, however, the target language gives strong safety guarantees
(type safety, bounds checks, crash stability and such), one must
assume that the user of the interfaced library will expect those
safety belts to also work with that particular library. The same holds
for automatic dynamical resource management, such as garbage
collection. So, one usually would like to provide at least these
features in a manually written interface - but again, depending on the
situation, there may be exceptions (e.g. if this turns out to be
prohibitively complicated, or if it is important to stay at the lowest
level, or if it greatly simplifies things, or just if it's too small a
problem to be worth the effort).

Rule: Let the machine do it whenever possible

Writing interface code manually often is a tedious task with many
repetitive steps. True, there are situations where one should use as
much intelligence and wisdom as possible, but there also are
situations where one has to interface dozens of functions in a
uniform way. Then, it is often a good idea not to write all that code
by hand, but to use a tool that automatically generates that part of
the interface code.

Nowadays, there are tools such as the Simplified Wrapper and
Interface Generator (SWIG) that
can help a lot.

Rule: Know when not to do it

Just because something can be done in principle, it need not be a
good idea. If something looks challenging, this may be for a variety
of reasons, one of them being that the author whose work we decided to
build on had chosen an excessively inelegant and clumsy approach,
maybe because he did not fully understand the nature of the
problem. (This is not necessarily a fault of the author. If we only
did things we understand well all the time, there would be virtually
no progress at all! Hence, we desperately need brave souls that tackle
problems which nobody understands properly, and many a breakthrough
was achieved only after a lot of confusion.) Nevertheless, one should
always keep the question in mind: does it really have to be that
way? Can't we achieve the same or a better result with a more
elegant approach, maybe using some other piece of code?

In particular, whenever you have the impression that you have to
fight against the original author, as his intentions do not at all
match yours, and his code evolves in a different direction than the
one you are interested in, better look for an alternative.

Desirable features of a C interface

As mentioned above, there are different levels of sophistication in
the art of calling foreign functions - from just executing a C
function to print a value to stderr to writing code that
allows one to turn C callback functions into callback functions in the
higher-level language. Other exotic applications may include using the
C compiler at run time to map C code strings into dynamically loaded
shared objects, or telling a C library to use the dynamical memory
management of the higher-level language instead of
malloc(3)/free(3) in order to put its dynamical state into
serializable strings (cf. PCRE).

Quite in general, one may want a good foreign function interface to
provide the following capabilities:

- It is possible to call compiled C code from the language.

- It is possible to runtime-link and call C shared object libraries.

- It is possible to register callbacks from C into the language.

- Calling into C and from there back into the language recursively
  "just works as it should". (There are some call stack issues here.)

- It is possible to start the high-level language's runtime system
  from within one's own C main() function.

- It is possible to put code written in the high-level language into
  libraries that are callable as C shared object libraries...

  - ...and this works with any C compiler on your machine (including
    in particular compiler wrappers like MPI's mpicc).

  - ...and it is reasonable and easy to have multiple independent
    C-callable .so libraries (that are to be used in
    conjunction) utilize code written in the extension language at the
    same time.

- One can start the high-level language's
  read-eval-print loop / command-line interface from within C.

Fortunately, OCaml provides us with a quite powerful C interface
that allows us to do most of the things mentioned above - even if this
sometimes seems to be just by accident. (For example, it is possible
to turn OCaml code into a C-linkable .so shared library
that almost behaves like any other C library, which is great, but this
does not work for 64-bit x86 code.) One problem, however, is that at
present, the documentation is somewhat scattered, and there are
important things to be known which are not well documented at all.

OCaml <=> C: first examples

After these general remarks, it is perhaps appropriate to look at
something more practical and explain the details by means of a few
typical examples. What one should know about the C interface is:

- It is documented in Part III (Chapter 18) of the OCaml on-line
  documentation.

- There is quite close integration of the OCaml and C compilers: the
  OCaml compiler will recognize .c source file arguments
  and know what to do with them. (See above.)

- Basically, we have to provide C code that can take OCaml values as
  arguments and return OCaml values. So, it is up to us (or some foreign
  function interface generator) to provide the wrapper code. This is
  quite unlike, say, the C interface of CMU Common Lisp, where one can
  use alien-funcall and extern-alien to
  call code from a C library directly (i.e. all the wrapping information
  is provided in Lisp, not C).

- As a consequence of the previous point, we have to make sure
  that whenever we allocate new composite OCaml values (such as tuples)
  from within C, we live in harmony with the garbage collector.

- If we get something wrong here, this may easily lead to heap
  corruption, i.e. mess up the internal memory management data
  structures of OCaml. In most situations, this means that an explicit
  call to the OCaml garbage collector via "Gc.full_major ();;"
  will result in a program segfault. ("Garbage collection" customarily
  is abbreviated as "GC".)

But let us have a look at the perhaps simplest example of a C
function exported to OCaml. We will wrap this up in a dedicated
package, which we call "c_examples". We start out by
creating a corresponding directory, into which we put the following
files:
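The original files are not reproduced here, but a minimal sketch looks as follows (all names are invented). On the OCaml side, c_examples.mli would contain `val double_int : int -> int`, and c_examples.ml the declaration `external double_int : int -> int = "c_double_int"`; the C side then provides the stub:

```c
/* c_impl.c - sketch of a minimal stub (names are assumptions);
   requires the OCaml development headers to compile. */
#include <caml/mlvalues.h>
#include <caml/memory.h>

CAMLprim value c_double_int(value ml_n)
{
  CAMLparam1(ml_n);
  long n = Long_val(ml_n);      /* OCaml int -> C long */
  CAMLreturn(Val_long(2 * n));  /* C long -> OCaml int */
}
```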

Now that we have seen that this actually works, let us look in some
more detail at what is going on here. Clearly, the .mli
file just specifies what to export - as we only provide one interfaced
function anyway, this is very straightforward. In the .ml
file, we declare our function as external,
i.e. implemented by a piece of code adhering to C linking conventions,
whose linker name we give as well. The implementation of that function
takes as argument an OCaml value, and has to return an OCaml
value. Here, this is supposed to encode an integer, and we need
conversion functions to map OCaml values to C values and back. For
int, this is pretty straightforward, but we have to keep
in mind that the OCaml integer range is strictly smaller than the C
integer range!

Note the use of CAMLparamX() and
CAMLreturn macros to declare and handle entities of type
value that represent OCaml values. These are necessary to
live in harmony with the garbage collector. Indeed, there are more of
this type, the next most important ones being the
CAMLlocalX() macros. More about this later.

We will now proceed to extend our example with further definitions
that demonstrate a few basic techniques. First, let us see how to wrap
up higher order functions, how to pass floating-point numbers, and how
to add primitive debugging facilities to the C code: we add the
following definitions and then rebuild:
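Again a sketch rather than the original listing (the names are invented; recent OCaml releases spell the runtime functions caml_callback and caml_copy_double, while older ones used callback and copy_double). On the OCaml side we would declare `external apply_twice : (float -> float) -> float -> float = "c_apply_twice"`; the C side might then read:

```c
/* Sketch: a float argument, a callback into OCaml, and primitive
   stderr debugging. Requires the OCaml development headers. */
#include <stdio.h>
#include <caml/mlvalues.h>
#include <caml/memory.h>
#include <caml/alloc.h>
#include <caml/callback.h>

CAMLprim value c_apply_twice(value ml_f, value ml_x)
{
  CAMLparam2(ml_f, ml_x);
  CAMLlocal1(ml_y);
  double x = Double_val(ml_x);
  fprintf(stderr, "DDD c_apply_twice x=%f\n", x);   /* poor man's debugger */
  ml_y = caml_callback(ml_f, caml_copy_double(x));  /* f x */
  ml_y = caml_callback(ml_f, ml_y);                 /* f (f x) */
  CAMLreturn(ml_y);
}
```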

The process of wrapping and unwrapping values from one language
for another is sometimes called marshalling, but
nowadays this more often refers to "serialization", that is, mapping a
piece of data with potentially complex structure to a string in order
to store it and retrieve it later on. (Incidentally, OCaml provides a
Marshal library which is all about serialization.) As we
see, the names of the functions and macros we use to do the mapping are
somewhat non-uniform, but so are their internal mechanics:
Int_val is not much more than a very simple bit-shifting
macro, while copy_double will have to heap-allocate space
to hold a double-float value.

One should know that higher order functions can only be wrapped in
this way if they have up to five arguments. (Usually, this is enough.)
Other techniques have to be used for functions with more arguments.

Wrapping functions from a library

Let us try something slightly more challenging next: we want to
wrap some functions from a library other than libc or
libm and pass around strings. Let us use the low level
X11 library xlib here. In particular, we want to be able
to open and close a connection to an X display and obtain the X server
vendor identification string. We hence add the following to our example:
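The original listing is not reproduced here; the following sketch is merely consistent with the names referred to further below (c_ex_x_open_display_v1, Store_c_field, Abstract_tag) and should not be taken as the author's actual code:

```c
/* Sketch: wrapping XOpenDisplay/XCloseDisplay/XServerVendor.
   Requires the OCaml development headers and Xlib. */
#include <X11/Xlib.h>
#include <caml/mlvalues.h>
#include <caml/memory.h>
#include <caml/alloc.h>
#include <caml/fail.h>

/* Store a raw C pointer in field n of a block, bypassing Store_field
   (which would tell the GC to treat the slot as an ML value). */
#define Store_c_field(block, n, ptr) (Field((block), (n)) = (value)(ptr))

CAMLprim value c_ex_x_open_display_v1(value ml_name)
{
  CAMLparam1(ml_name);
  CAMLlocal1(block);
  Display *d = XOpenDisplay(String_val(ml_name));
  if (d == NULL) caml_failwith("XOpenDisplay failed");
  block = caml_alloc(1, Abstract_tag);  /* raw data, opaque to the GC */
  Store_c_field(block, 0, d);
  CAMLreturn(block);
}

CAMLprim value c_ex_x_close_display(value ml_d)
{
  CAMLparam1(ml_d);
  XCloseDisplay((Display *)Field(ml_d, 0));
  CAMLreturn(Val_unit);
}

CAMLprim value c_ex_x_server_vendor(value ml_d)
{
  CAMLparam1(ml_d);
  CAMLreturn(caml_copy_string(XServerVendor((Display *)Field(ml_d, 0))));
}
```

On the OCaml side, the display would be declared as an abstract type, e.g. `type display` with `external open_display : string -> display = "c_ex_x_open_display_v1"`.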

This is already quite nice, but it opens up new questions. If we
"lose" an Xlib pointer, it will be garbage collected, but the
connection stays open. We might instead prefer to have that particular
case handled in such a way that an incidentally forgotten active Xlib
connection that is garbage collected will be closed
automatically. Furthermore, we might even want to have Xlib functions
that are called on an inactive/invalid display raise an exception. All
this can indeed be implemented, and will be our next major
example. But before we consider this, let us make an excursion that
explains some more of the background mechanics underlying the low
level implementation and in particular the C interfaces of many
functional languages.

Some background on functional language implementations

If we look under the hood of all the fancy syntax and ignore code
generator issues for now, the relevant questions at the lowest level
are: how are the fundamental data types implemented and mapped to
machine data types, and what conventions are in place that have to be
respected? One important component in this game is the Garbage
Collector, which will from time to time scan the heap (= all the
memory managed by the language where values can reside) and recycle
pieces of data that have become un-reachable and hence ballast.

What type information has to be available at run time? At the very
least, the system has to be able to find out whether a certain OCaml
value, stored in a given region of memory, contains references to
other OCaml values or not. The Garbage Collector has to know this so
that it can scan all the memory that has been allocated in our running
program for "live" objects and declare all other data as "dead", that
is, unreachable. This evidently means that the memory representation
of an OCaml array (or tuple, say), which may reference (i.e. contain
pointers to) other OCaml values, must contain information about the
length of the array (or tuple). One could imagine that from the
perspective of the garbage collector, the world of hierarchically
constructed types is much simpler, and that indeed, arrays and tuples
might even have precisely the same representation in memory: Both
represent vectors of OCaml values, and even if they behave very
differently from the programmer's perspective, there is no reason why
they should not be just the same internally: after all, the question
what one can do with these data is resolved entirely at compile time.

So, we may imagine an internal data representation scheme where all
constant-time addressable vectors (tuples, arrays) appear as a region
of memory that contains a single header word (or at most a few words)
that provides length information, followed by pointers to the
contents. This actually would be quite similar to the way data are
represented internally in the GCL (Gnu Common Lisp) system (see object.h
in the GCL sources, especially the definition of "union
lispunion"), only that the structure of the header is a little
bit more complicated, and we retain enough information to derive the
actual concrete type at run time - which we have to, as LISP is
dynamically typed. Non-compiler scripting languages like Perl or
Python, which also are dynamically typed, use similar approaches, but
typically are way more verbose in their internal value data structures
(see e.g. the
corresponding definition of typedef struct _object (...)
PyObject and the corresponding comments in the Python
sources), and frequently include in particular a reference count, as
they usually do not have a proper garbage collection (which, by the
way, is a shame, given the existence of the very powerful Boehm-Demers-Weiser
garbage collector library).

Suppose we stick with such a scheme where every value is
represented by a pointer to a piece of memory that holds all the
data. Whenever we pass even the smallest piece of data - like an
ordinary machine integer number - into a function, the system first
has to do dynamic memory allocation to obtain space where to put the
number, adorn it with some header that says, basically, "there is only
this single one word of data, and it is not a pointer to further
values", and then pass a pointer to that piece of memory. The
recipient will then have to look up the number through that
pointer. Now, this "boxing" and "unboxing" is quite a lot of
time-consuming overhead, as it is ubiquitous and hence has to be done over
and over again. Therefore, it is evidently desirable to have a
compiler that is intelligent enough to avoid unnecessary boxing (maybe
via inlining) for purely internal functions that are not visible to
the outside. However, when calling a function from an independent
binary-code library, we presumably will have to go through this boxing
and unboxing.

Imagine creating something as simple as an array of one million
integers. If OCaml used the scheme suggested right above, we would
require two data words (32 bit on 32 bit machines) to represent every
integer, and have an array of pointers, so we would need three words
of memory to encode a single word of data! Clearly, this is a highly
unsatisfactory situation. (Indeed, this is just what happens with
GCL.) Can this be avoided?
Actually, one might think so, as we have all the type information available
at compile time that allows us to discern what's a pointer to a value
and what's just raw data. But consider the following example:
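The original listing is not reproduced here, but an example of the intended kind (a sketch, not the author's code) mixes immediate integers with pointers to earlier tuples, while the inferred type roughly doubles in size at every line:

```ocaml
(* Each step pairs the previous value with itself plus a raw integer;
   try `ocamlc -i` or the toplevel to watch the types grow. *)
let p0 = (1, 2)
let p1 = (p0, 3, p0)
let p2 = (p1, 4, p1)
let p3 = (p2, 5, p2)
```

At run time, each tuple holds both raw integer slots and pointer slots, and the garbage collector must be able to tell them apart.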

(Incidentally, this is also a nice example that shows that the
complexity of the type of an expression can grow at least
exponentially with the size of the expression.) What code would the
compiler have to generate so that the garbage collector can know which
entries of all the tuples in this example hold raw data, and which
hold pointers to tuple values? If you think about it long enough, you
will come to the conclusion that actually, we require one bit of
information for every tuple slot. We might conceive collecting these
in a bit-vector which we place right after the tuple header word. This
may indeed be possible, but would make the garbage collector somewhat
clumsy. The approach usually taken instead makes use of the
observation that pointers to values are aligned to divisible-by-four
addresses. That is, the two least significant bits of these pointers
are unused, and always zero. Suppose now we implement the following
scheme: value references will not take the form of ordinary memory
pointers to the address where the referenced value lies, but instead
be pointers to that address plus one. When we want to use this as a
pointer, we use CPU instructions with fixed-offset addressing that
cancel this off-by-one. (This is not a problem for CISC CPUs, which
have such addressing modes in their assembly language opcode set, and
also not a problem for superscalar RISC CPUs, which just have to do one
more offset calculation on one of their integer units - and actually,
speeding up offset calculations is just one of the major reasons why
they do have more than one integer unit (and usually just one memory
access unit) in parallel.) We now declare that every tuple entry whose
least significant bit is a "one" is such a special pointer, and
everything that has a zero as its least significant bit is an
"immediate value", that is, the word itself carries all the data.

In particular, we may encode true and
false as the binary values 0b00 and
0b10. The integer N we encode as
N*2. Addition and subtraction will still work as usual,
but when we multiply or divide, we have to do one additional
bit-shifting operation (which usually is quite little effort in
comparison to the multiplication). This means that we will not be able
to discern the memory representations of, say, 1 and
true, but this does not matter, as it is of no relevance
to the garbage collector, and all conflicts that may happen have been
prevented by the compile-time type checking. Likewise, we can encode
single characters as immediate values. Functions such as
Char.code then may be just eliminated by the compiler.

Such a pointer tagging scheme is what most functional compiler
systems use nowadays. There are, however, differences in the tagging
schemes implemented. CMUCL/SBCL for example align all memory cells to
8-byte boundaries and use the three least significant bits as type
tags to discern cons cells, characters, structures, arrays, etc. See
object.tex
in the CVS sources. OCaml chooses to use a least significant bit of 1
to denote integers, which is very unusual. Other systems may implement
other slight variations on the general subject, such as using high
bits instead of low bits. An interesting and very useful curiosity
that works without any extra pointer tag bits is the
Boehm-Demers-Weiser conservative Garbage Collector for C, which comes
as a drop-in malloc() replacement - indeed this is so efficient that
some quite reasonable functional languages (Bigloo Scheme, for
example) decided not to implement their own GC, but rely on this
library instead. How can this possibly work? Basically: if something
looks like a pointer, we just assume it could be a pointer and scan
the corresponding region.

Now, one might say that if a CPU were especially designed to support
functional languages, it should provide extra type tag bits for every
value. With modern 64-bit CPUs, there usually is little need for fast
full 64-bit integers, so we may well afford providing only 62-bit
arithmetics, and pointers will not use the full 64-bit address range
anyway due to MMU limitations. (Typically, a page will consist of 1024
8-byte entries, hence use up 13 address bits. The usual three-level
MMUs then only can use 10*3+13=43 address bits. Seen that way, going
to 48 instead of 64 bits may have been more reasonable.) What is
slightly special about OCaml is that its implementors have
deliberately chosen to use internal representations that do not allow
one to re-derive enough type information to print the value in a
meaningful way. Internally, there is no distinction between
"false" and "0", say. This is somewhat
unfortunate, as it means that there is no way to implement an ad-hoc
polymorphic debug-printing function of type 'a ->
string that just prints out some OCaml value in a meaningful
way - similar to Perl's Data::Dumper.

Even though we might not be meant to know what is going on in this
file, it is nevertheless worthwhile to have a look at
/usr/lib/ocaml/3.08.3/caml/mlvalues.h
to see how some of the low-level definitions work. Note that we
are not supposed to rely on that particular realization, as this may
change in the future!

What are the ML-specific macros such as CAMLprim,
CAMLparam1, CAMLlocal1,
CAMLreturn for? Roughly speaking, CAMLprim
has to do with exporting our functions properly for OCaml. The
CAMLparam/CAMLlocal macros are required for garbage
collection. We can imagine situations where some allocated piece of
memory "almost becomes garbage" in the sense that all references to it
are lost, except some that are passed into C. As a garbage collection
may be triggered at almost any point in time, we must make sure that
even if we are inside C code that holds the last references to a given
value, the Garbage Collection will know that this value still is in
use. In the c_ex_x_open_display_v1 C function in our
example, we first introduce a variable of type value named
block, which is made visible to the GC. Then, we allocate
a block which will have a header tag (Abstract_tag) that
tells the GC that this region of memory does not hold a string, an
array or tuple, or any other kind of value OCaml may want to deal with
in some special way: it will just contain "raw data", and it will
always be up to our code to interpret it in the proper way. We are
generally expected to use the Field and
Store_field macros to retrieve and store data from the
fields of a block, if these fields contain ML values. For C data (like
pointers) stored in a custom block, we must not use
Store_field, as this would tell the GC to keep track of
the value in that slot and regard it as an ML value that has to be
scanned - with disastrous consequences. Hence, we introduce our own
Store_c_field macro to make explicit that we want to
store a value without making the GC worry about it. This macro
is actually implemented in a somewhat hackish way and perhaps should
rather be part of official OCaml, but at the time of this writing, it
is not.

Every entry in a block will be large enough to hold one ML value, and
in our example, we implicitly use the slightly dangerous assumption
that a value is at least as large as a pointer. (However, this
actually seems to be true on all platforms.) Note that if we were to
construct a float array from within C on a 32-bit system, we would
have to allocate a block with twice as many value slots
as the number of entries of our floating-point array, and we should use
the Double_field and Store_double_field
macros to access them. Here, our payload data is either a C
Display pointer, or a null pointer, denoting an invalid
display. The other C implementations of functions operating on X
displays will extract that value and handle it in an appropriate
way. Note that the example code sometimes is a bit more verbose than
strictly necessary. This is just to clarify the general structure.

Finalization and Exceptions

Quite often, when we wrap C-controlled resources in such a way, we
may want to arrange for the resource to be freed automatically
should it become garbage. In general, it is a good idea not to
rely on such GC finalization as the sole mechanism for freeing
resources, but to at least provide explicit de-allocation
means. Depending on the resource, we may even want to consider it an
error that should be reported if it ever ends up being
GC-finalized. One way to implement finalization would be to use the
Gc.finalise function on x_display values and
register x_close_display as a finalizer and wrap up our
raw x_open_display function accordingly on the OCaml
side. We may also provide a finalizer written in C. This will be shown
in the next example. In addition, we will make sure that using
x_server_vendor on an invalid X display will raise a
special exception defined by us, which will provide both a
human-readable problem description and an OCaml tag telling us what
went wrong.

Let us briefly discuss what's new. First, we are now using
alloc_final() to allocate our custom-data blocks. We need
to allocate one extra value entry and use the second slot (having
index 1) for our data, as a pointer to the finalization function will
go into slot 0. Actually, this is not 100% true:
alloc_final() is just a legacy compatibility function for
the more general and also more flexible alloc_custom()
function that also allows us to specify other custom functions that
handle, say, serialization to strings, hashing, and comparison. None
of this makes much sense for X display pointers, so we
stick with the simpler approach. The other two
parameters control how frequently the GC is called after allocating
entities of this type. The first value is a measure of the relative
amount of resources used by this entity (we just use 1 here), the
second one is a measure for how many of these we allow the system to
allocate before GC has to be called in order to try to reclaim some
that may have become garbage. These "ten allocations per GC" can be
seen at the end of our transcript.

Concerning exception handling, we first have to introduce an
exception on the OCaml side, and register this with a special name, so
that we can locate it from within C by that name. Then, we build the
argument tuple - we could have used alloc with the special
tag denoting a tuple, but alloc_tuple is a more
convenient shorthand. Note that the parentheses in the exception
definition are mandatory! If we were to do some less
sophisticated exception handling, we might prefer using the much
simpler raise_with_string instead.

C Callbacks

As we have seen, the functional way to think about the
decomposition of an algorithmic problem into sub-tasks which are
realized by specialized little helper functions is very
natural. Consequently, we find such a style of programming also in
some C libraries. One typical application is the specification of
callback C handlers. Suppose that we have a C library that provides an
opaque C structure called - say - "animal", for which we
can register a callback C function that is called whenever this animal
has to make a sound. Typically, the C library implementor will have
thought about the problem that the library user might need more
flexibility than what can be provided by registering just a C
function. With our background in functional programming, we now may
see it that way: C "functions" are not functions, but just routines. A
proper function is a piece of code specifying what to do, plus maybe
some extra contextual information. So far, we have simply been
calling this a "proper function". If one wants to emphasize the role of the
contextual data grouped together with the function, this is sometimes
called a "closure". A parameter to a callback-setting function that
provides such context which is passed to the registered callback once
it is executed then is a "closure parameter". This sounds a bit
convoluted - but we will see an example soon.

When we wrap up such a library for OCaml, we will usually want to
match the spirit of the original callback approach as closely as
possible. On the OCaml side, we will not require a closure parameter
to the callback function, as OCaml already has proper functions. On
the other hand, we have to use a callback wrapper on the C side that
uses the closure parameter to pass around the OCaml function.

The details are best studied by looking at an example. Note that
one should pay very close attention here - making things smooth for
the user of our code will require some tricky magic under the hood.

SOURCES = c_examples.mli c_examples.ml c_examples_impl.c animal.c
# Note: if we had this in a proper installed shared object library,
# we would give linker option flags as in the libX11 example instead.

The key idea is: We have to provide a C data pointer when we
register our callback function. Actually, what we want to pass here is
the ML function, but this may be moved around in memory by the GC. So,
we have to pass a pointer to a C memory region holding the ML
function. But then, we have to make sure that the GC will recognize
this C-allocated memory as a position that holds a ML value, which
should be treated as a root for heap scanning, and modified if the
value is moved around. Therefore, we have to
register_global_root it - and unregister and free it once
we get rid of the object for which we registered the callback. The
reader should take some time to think this through.

Actually, this unfortunately means that we will encounter an ugly
problem if the callback function we register is a closure containing
the object for which we registered the callback. The reason is that
the callback-holding object will be responsible for removing the
global GC root in its finalizer - but if we make this object
accessible through that global GC root, it never will be finalized. In
other words, if we write code like the following, we are
asking for trouble:

XXX Actually, if I run this
e.g. as (test 1000), I get a segfault, but strangely, the address
reported for the callback function is always the same. This should not
be possible! Something is wrong with this discussion. Have to
investigate!

This brings us about as far as we want (or have to) go with our
discussion of the C interface. Let us conclude this lesson with this
final pearl: a module providing functionality that allows us to
specify a (Real d-dimensional Space -> Real k-dimensional Space)
function in the form of a string containing C code. This will then be
put into a C source code file, compiled, dynamically loaded, and
linked from within OCaml, so that we can c_register a
string and in the end obtain a very fast float array -> float
array OCaml function implementing this computation!
Documentation of the (very small) ML interface is still lacking,
especially concerning re-use of the output vector, and error checking
should be improved (catching compiler errors as well as making sure
the wrapped function comes with array length bounds
checks). Nevertheless, this is a self-contained example showing many of
the techniques we have discussed here, plus a few new ones: in
particular, using the module system to define a weak hash table, using
this to keep track of the still-in-use C-wrapped functions, and using
dynamic loading of C libraries.

One very simple way to demonstrate this is to use the shell command
"ulimit -v 200000" to artificially limit virtual memory
size to 200000 KB and then start GCL. If we try to define a vector of
20 million values, this would require about 80 MB of RAM. If we
provide the initial value, all entries will point to the same
entity. But look what happens if we start putting different numbers
into different places: