Hello.
I've run into a rather weird issue. A binding of mine works fine when
bundled as a .cmxa (native code), but segfaults when bundled as a .cma
(bytecode). I'm running Debian Linux on amd64.
I've tracked down the issue to the following point: it seems that when
the BSS (uninitialised data section) of libmonetdb5.so is dynamically
loaded, it doesn't get initialised to 0. And the code in libmonetdb5.so
relies on the fact that BSS gets initialised to 0 when dynamically loaded.
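As a minimal sketch of the guarantee the library relies on: C requires an
object with static storage duration and no initialiser to start out zeroed,
and toolchains typically place such objects in .bss. The names below
(demo_box, demo_boxes, count_nonnull_slots) are hypothetical stand-ins, not
MonetDB code.

```c
#include <stddef.h>

#define MAXSPACES 64

/* Hypothetical stand-in for a MonetDB box record; only the idea matters. */
struct demo_box {
    const char *name;
    int dirty;
};

/* File scope, no initialiser: static storage duration, so every element
 * must start out as a null pointer. Typically this array lives in .bss. */
struct demo_box *demo_boxes[MAXSPACES];

/* Count the slots that are not null. Before anything has been stored in
 * the table, a correctly loaded image must report 0 here. */
size_t count_nonnull_slots(void)
{
    size_t n = 0;
    for (size_t i = 0; i < MAXSPACES; i++)
        if (demo_boxes[i] != NULL)
            n++;
    return n;
}
```

If the loader really left this memory uninitialised, count_nonnull_slots()
could return something other than 0 before any slot had been assigned.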
To reproduce the problem, you need the following Debian packages for
MonetDB installed:
libmonetdb-client-dev
libmonetdb-client1
libmonetdb-dev-dbg
libmonetdb1-dbg
libmonetdb5-server-dev-dbg
libmonetdb5-server5-dbg
libmonetdb5-sql-dev
libmonetdb5-sql2
monetdb-client
monetdb5-server-dbg
The *-dbg packages are packages I've changed and recompiled with the -g
option. They are available from my website:
http://yziquel.homelinux.org/debian/pool/main/m/
The key signing the repo is located at
http://yziquel.homelinux.org/debian/yziquel-debian-packages.asc
To import it, run
cat yziquel-debian-packages.asc | sudo apt-key add -
then add the following lines to /etc/apt/sources.list:
> deb http://yziquel.homelinux.org/debian stable main
> deb-src http://yziquel.homelinux.org/debian stable main
> deb http://yziquel.homelinux.org/debian testing main
> deb-src http://yziquel.homelinux.org/debian testing main
> deb http://yziquel.homelinux.org/debian unstable main
> deb-src http://yziquel.homelinux.org/debian unstable main
The rest of the MonetDB packages can be found here:
http://monetdb.cwi.nl/downloads/Debian/
and the monetdb5 binding is here:
http://yziquel.homelinux.org/gitweb/?p=ocaml-monetdb5.git;a=tree
(click on snapshot to download one).
Now here is why I believe that the BSS is not properly initialised. The
segfault happens in the function findBox, at line 330 of:
http://monetdb.cvs.sourceforge.net/viewvc/monetdb/MonetDB5/src/mal/mal_box.mx?revision=1.100&view=markup
There is this line:
> if (box[i] != NULL && idcmp(name, box[i]->name) == 0) {
I've followed machine code instructions step by step there, with ddd.
In native code, box[i] == NULL. Evaluation stops there (i.e. box[i] !=
NULL is false). Everything is perfect.
In bytecode, box[i] != NULL because BSS is not initialised to 0... And
it then tries to access box[i]->name, and segfaults.
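The behaviour described above can be sketched as follows. This is not the
MonetDB source: slot, slots, set_slot and find_slot are hypothetical names,
and strcmp stands in for idcmp. It only shows that the null check guards the
dereference, so the lookup is safe exactly as long as unused entries are
null.

```c
#include <stddef.h>
#include <string.h>

#define SLOTS 64

/* Hypothetical stand-in for a box record. */
struct slot {
    const char *name;
};

static struct slot storage[SLOTS];  /* backing records for the demo */
static struct slot *slots[SLOTS];   /* zero-initialised table of pointers */

/* Register a name in slot i (demo helper, assumes 0 <= i < SLOTS). */
void set_slot(int i, const char *name)
{
    storage[i].name = name;
    slots[i] = &storage[i];
}

/* Return the index of the slot called name, or -1 if there is none. */
int find_slot(const char *name)
{
    for (int i = 0; i < SLOTS; i++) {
        /* && short-circuits: slots[i]->name is read only when slots[i]
         * is non-null. A garbage non-null pointer defeats this guard. */
        if (slots[i] != NULL && strcmp(name, slots[i]->name) == 0)
            return i;
    }
    return -1;
}
```

With an all-null table, find_slot never dereferences anything and returns
-1. With stale non-zero bytes in the table, the same loop dereferences an
invalid pointer, which matches the segfault seen in bytecode.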
For the record, the same file has, at lines 211-217:
> typedef struct BOX {
>     MT_Lock lock;   /* provide exclusive access */
>     str name;
>     MalBlkPtr sym;
>     MalStkPtr val;
>     int dirty;      /* don't save if it hasn't been changed */
> } *Box, BoxRecord;
and, at lines 263-264:
> #define MAXSPACES 64 /* >MAXCLIENTS+ max modules !! */
> Box box[MAXSPACES];
For the disassembled code, you can have a look at:
http://sourceforge.net/mailarchive/message.php?msg_name=4B3ED073.3050203%40citycable.ch
I've also tried running ltrace to see how dynamic loading happens for
the bytecode monetdb5.cma:
http://yziquel.homelinux.org/monetdb_sql.byte.ltrace
But 95% of the output is OCaml-related lines, and the end is concerned
only with ml_monetdb_sql. I'd like to see how the 'box' symbol gets
loaded into the BSS, but I do not know how to do that.
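One possible way to check this from inside the process, rather than through
ltrace, is a small helper that scans a memory region for non-zero bytes. The
helper name first_nonzero_byte is hypothetical, and the sketch assumes that
null pointers are all-bits-zero, which holds on amd64.

```c
#include <stddef.h>

/* Return the offset of the first non-zero byte in the n bytes starting
 * at p, or -1 if all n bytes are zero. */
long first_nonzero_byte(const void *p, size_t n)
{
    const unsigned char *bytes = p;
    for (size_t i = 0; i < n; i++)
        if (bytes[i] != 0)
            return (long)i;
    return -1;
}
```

Assuming 'box' is exported by libmonetdb5.so and visible to the C stubs,
calling first_nonzero_byte(box, sizeof box) at the very start of the stub,
before any MonetDB function runs, would show whether the array is already
non-zero at that point or gets overwritten later.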
So: is OCaml failing to initialise memory to 0 when libmonetdb5.so is
dynamically loaded?
--
Guillaume Yziquel
http://yziquel.homelinux.org/