arm backend

> Is the ARM backend (ocamlopt) usable and actively maintained?
In brief: yes modulo ABI issues; yes.
In more details:
In OCaml 3.11 and earlier, the ARM port uses an old ABI (software
conventions on using registers, etc). This ABI corresponds to the
"arm" port of Debian; I don't know about other Linux distros.
OCaml/ARM works like a charm on platforms supported by Debian/arm.
I use it on a Linksys NSLU2.
However, most embedded Linux/ARM platforms use a more recent,
incompatible ABI called EABI. In Debian Lenny, it's available under
the name "armel".
I recently revised the OCaml/ARM port to adapt it to EABI and to
software floating-point emulation. You can find it in the CVS trunk,
and testing and feedback is most welcome. It works fine under Debian
Lenny "armel".
Floating-point performance is better than with the old port, because
the latter used floating-point instructions that are no longer
available on contemporary ARM processors and therefore had to be
trapped and emulated by the kernel. In contrast, with soft
floating-point, emulation is performed in user land by C library
functions, which can also take advantage of vector float
instructions if the processor supports them.
Concerning the iPhone, it is not supported out of the box by 3.11 nor
by the CVS trunk code. For 3.11, several patches have been mentioned
on this list; it would be great if someone with iPhone development
experience could combine them and publish a unified patch.
For the CVS trunk code, it seems we are getting close: as far as I
could see, MacOSX/ARM uses EABI plus Apple's "signature" approach to
dynamic linking. However, I haven't yet succeeded in running Apple's
iPhone SDK compilers from the command-line. (It looks like one of
those Microsoft SDK's that assume everyone is developing from the
vendor-supplied IDE...) Again, I welcome feedback and patches from
iPhone development experts.

memory profiling

> what is the best option to do memory profiling with ocaml?
> Is there a patch of ocaml-memprof for 3.11.0? What about
> objsize?
If you want to use objsize with ocaml 3.11, you should get
the new version of objsize -- 0.12:
http://forge.ocamlcore.org/projects/objsize/
OCaml has new heap since 3.11, and old versions won't work.
objsize has an unresolved make-related problem with building
on msvc/win32 (object/library file extensions, to be specific), so
one should build objsize manually on msvc (not a hard thing.
but I'll fix it in near future).

OCaml 3.10.2 on iPhone unified patch

I'm attaching a unified patch that contains all the changes
necessary to cross-compile OCaml for the iPhone.
The patch should create a file named Makefile.xarm, which
contains a build script to build the cross compiler. This script
essentially automates the instructions given by Toshiyuki Maeda
at the following URL:
http://web.yl.is.s.u-tokyo.ac.jp/~tosh/ocaml-on-iphone
This unified patch contains all his patches plus some fixes of
our own. The result works with Apple's official iPhone SDK, and
we have used it to build a working iPhone application.
To build the cross-compiler:
a. Must be on an Intel Mac with Apple's iPhone SDK installed.
b. Create two full copies of the OCaml 3.10.2 release, in two
sibling directories. One must be named OCamlBase (used to
hold a native build of OCaml). The other can be named
anything; I'll use the name OCamlXARM.
$ wget http://caml.inria.fr/pub/distrib/ocaml-3.10/ocaml-3.10.2.tar.gz
$ tar xzf ocaml-3.10.2.tar.gz
$ mv ocaml-3.10.2 OCamlBase
$ cp -R OCamlBase OCamlXARM
c. Patch the OCamlXARM tree:
$ cd OCamlXARM
$ patch -p0 < ocamlxarm0.1.patch
d. Run the build process:
$ make xarm-build
This will take a while. It builds two copies of the OCaml
release, using the native copy to plug native components into
the cross-compiler at a few critical spots. (For more
information, see the URL above.)
e. There is also a make rule for installing. The default target
is /usr/local/ocamlxarm. To change this, you'll need to edit
Makefile.xarm. When the target is set as desired:
# make install
I just verified that the build works on my system. Let me know of
any difficulties.
(Please follow the archive link to download the patch.)

Custom blocks and finalization

> > Let's have a look at byterun/globroots.c:
> [snip]
> > So... how could caml_remove_global_root() actually trigger a heap GC?
> It certainly doesn't, but it obviously manipulates runtime
> datastructures which may not necessarily be in a state during
> finalization where this is safe. I'd have to study the runtime code
> in detail to learn more, but maybe the OCaml team can clarify this
> issue more quickly?
I foresee absolutely no problems with registering/unregistering global
roots from a C finalizer. As the manual states, the big no-no in
C functions attached to custom blocks is allocating in the heap,
either directly or via a callback into Caml or by releasing the global
lock. Within a finalizer, you should also refrain from raising an
exception, as this would leave the GC is a bizarre state. But global
roots operations are OK.
As for not using the CAMLparam, etc macros in these functions: since
these functions must not allocate, the macros are certainly not
needed. I don't think that actually using them could cause a problem,
but that would be silly anyway, so don't use them in this context.

Ocamlopt x86-32 and SSE2

Dmitry Bely wrote:
> I see. Why I asked this: trying to improve floating-point performance
> on 32-bit x86 platform I have merged floating-point SSE2 code
> generator from amd64 ocamlopt back end to i386 one, making ia32sse2
> architecture. It also inlines sqrt() via -ffast-math flag and slightly
> optimizes emit_float_test (usually eliminates an extra jump) -
> features that are missed in the original amd64 code generator.
You just passed black belt in OCaml compiler hacking :-)
> Is this of any interest to anybody?
I'm definitely interested in the potential improvements to the amd64
code generator.
Concerning the i386 code generator (x86 in 32-bit mode), SSE2 float
arithmetic does improve performance and fit ocamlopt's compilation
model much better than the current x87 float arithmetic, which is a
bit of a hack. Several options can be considered:
1- Have an additional "ia32sse2" port of ocamlopt in parallel with the
current "i386" port.
2- Declare pre-SSE2 processors obsolete and convert the current
"i386" port to always use SSE2 float arithmetic.
3- Support both x87 and SSE2 float arithmetic within the same i386
port, with a command-line option to activate SSE2, like gcc does.
I'm really not keen on approach 1. We have too many ports (and
their variants for Windows/MSVC) already. Moreover, I suspect
packagers would stick to the i386 port for compatibility with old
hardware, and most casual users would, too, out of lazyness, so this
hypothetical "ia32sse2" port would receive little testing.
Approach 2 is tempting for me because it would simplify the x86-32
code generator and remove some historical cruft. The issue is that it
demands a processor that implements SSE2. For a list of processors, see
http://en.wikipedia.org/wiki/SSE2
As a rule of thumb, almost all desktop PC bought since 2004 has SSE2,
as well as almost all notebooks since 2006. That should be OK for
professional users (it's nearly impossible to purchase maintenance
beyond 3 years, anyway) and serious hobbyists. However, packagers are
going to be very unhappy: Debian still lists i486 as its bottom line;
for Fedora, it's Pentium or Pentium II; for Windows, it's "a 1GHz
processor", meaning Pentium III. All these processors lack SSE2
support. Only MacOS X is SSE2-compatible from scratch.
Approach 3 is probably the best from a user's point of view. But it's
going to complicate the code generator: the x87 cruft would still be
there, and new cruft would need to be added to support SSE2. Code
compiled with the SSE2 flag could link with code compiled without,
provided the SSE2 registers are not used for parameter and result
passing. But as Dmitry observed, this is already the case in the
current ocamlopt compiler.
Jean-Marc Eber:
> > But again, having better floating point performance (and
> > predictable behaviour, compared to the bytecode version) would be a
> > big plus for some applications.
Dmitry Bely:
> Don't quite understand what is "predictable behavior" - any generator
> should conform to specs. In my tests x87 and SSE2 backends show the
> same results (otherwise it would be called a bug).
You haven't tested enough :-). The x87 backend keeps some intermediate
results in 80-bit float format, while the SSE2 backend (as well as all
other backends and the bytecode interpreter) compute everything in
64-bit format. See David Monniaux's excellent tutorial:
http://hal.archives-ouvertes.fr/hal-00128124/en/
Computing intermediate results in extended precision has pros and
cons, but my understanding is that the cons slightly outweigh the pros.

Editor's note:

This thread contains many more messages, please follow the link archive if you
want to dig deeper in the subject.

Call for translators of the Unix system programming course

We are launching an effort to translate Xavier Leroy and Didier Rémy's course
on Unix system programming in Objective Caml [1].
We are looking for one translator and one proofreader for each of the seven
main chapters. To facilitate the task of proof readers and get the best
translation preference will be given to native english speaking contributors.
If you are interested in participating please email me privately with the
following information : chapter you'd like to translate/proofread and whether
your are a native english speaker or not.
The work of the translators will be published online as html and pdf documents
at this adress [2] under a Creative Commons BY-NC-SA french license [3]. Note
that the non commercial aspect of the license doesn't preclude a subsequent
publication of the result as a (non free as in wine) book. However in that
case an agreement would be sought between the authors, the translators and the
publisher.
Thanks for your help,
Daniel
[1] http://pauillac.inria.fr/~remy/poly/system/camlunix/index.html
[2] http://ocamlunix.forge.ocamlcore.org/
[3] http://creativecommons.org/licenses/by-nc-sa/2.0/fr/deed.en