I'm pleased (and much relieved) to announce a new version of the yypkg
mingw-builds. Together with yypkg, this provides a package manager both
on Linux and on Windows with 70 packages of libraries and binaries built
both for i686 and x86_64. Everything is easily fully reproducible.
Release highlights include:
- New x64 toolchain (both as a cross- and native toolchain)
- Improved native toolchains
- OCaml cross-compiler, only targets i686-w64-mingw32 (see bug 5737)
- More packages
- Updated packages
- Installers for Windows
You can find details, documentationa and downloads on the new website:
http://yypkg.org/mingw-builds/
PS: you can also read/write comments and vote on reddit if you feel like
it's worth it:
http://www.reddit.com/r/programming/comments/1b8egi/yypkg_mingwbuilds_windowslinux_package_manager_70/

Master-slave architecture behind an ocsigen server.

I'm developping an ocsigen website doing some scientific calculations.
Up to now, the calculations were done in the same process that runs
the server. In order to gain in scalability (and maybe stability too),
I would like to run those calculations in a separate (pool of)
process(es). As this is a pretty typical setup, I guess quite a few
people have already done that. So I'd like to hear some suggestions on
what library to use in this particular context. It seems to me that
the release library [1] should do the job and is lwt-friendly, but
there are maybe other good options?
Thanks for any hint, cheers!
Philippe.
[1] https://github.com/andrenth/release

Martin Jambon suggested:

I wrote and used a library called Nproc about a year ago. It lets you
create (Nproc.create) a pool of N processes, to which you can submit
(Nproc.submit) computations of any type quasi-magically - just make
sure any big environment required for the computation is not copied
with each closure that you send to the workers. The submodule
Nproc.Full provides a more advanced interface that lets each worker
process have its own local environment.
https://github.com/MyLifeLabs/nproc
I haven't used Nproc in a while but it was working fine and should
still work.

Philippe Veber then said and Alain Frisch replied:

> nproc meets exactly my needs: a simple lwt-friendly interface to
> dispatch function calls on a pool of processes that run on the same
> machine. I have only one concern, that should probably be discussed on
> the ocsigen list, that is I wonder if it is okay to fork the process
> running the ocsigen server. I think I remember warnings on having parent
> and children processes sharing connections/channels but it's really not
> clear to me.
FWIW, LexiFi uses an architecture quite close to this for our
application. The main process manages the GUI and dispatches
computations tasks to external processes. Some points to be noted:
- Since this is a Windows application, we cannot rely on fork.
Instead, we restart the application (Sys.argv.(0)), with specific
command-line flag, captured by the library in charge of managing
computations. This is done by calling a special function in this
library; the function does nothing in the main process and in the
sub-processes, it starts the special mode and never returns. This
gives a chance to the main application to do some global
initialization common to the main and sub processes (for instance, we
dynlink external plugins in this initialization phase).
- Computation functions are registered as global values. Registration
returns an opaque handle which can be used to call such a function. We
don't rely on marshaling closures.
- The GUI process actually spawns a single sub-process (the
Scheduler), which itself manages more worker sub-sub-processes (with a
maximal number of workers). Currently, we don't do very clever
scheduling based on task priorities, but this could easily be added.
- An external computation can spawn sub-computations (by applying a
parallel "map" to a list) either synchronously (direct style) or
asynchronously (by providing a continuation function, which will be
applied to the list of results, maybe in a different process). In both
cases, this is done by sending those tasks to the Scheduler. The
Scheduler dispatches computation tasks to available workers. In the
synchronous parallel map, the caller runs an inner event loop to
communicate with the Scheduler (and it only accepts sub-tasks created
by itself or one of its descendants).
- Top-level external computations can be stopped by the main process
(e.g. on user request). Concretely, this kills all workers currently
working on that task or one of its sub-tasks.
- In addition to sending back the final results, computations can
report progress to their caller and more intermediate results. This is
useful to show a progress bar/status and partial results in the GUI
before the end of the entire computation.
- Communication between processes is done by exchanging marshaled
"variants" (a tagged representation of OCaml values, generated
automatically using our runtime types). Since we can attach special
variantizers/devariantizers to specific types, this gives a chance to
customize how some values have to be exchanged between processes (e.g.
values relying on internal hash-consing are treated specially to
recreate the maximal sharing in the sub-process).
- Concretely, the communication between processes is done through
queues of messages implemented with shared memory. (This component was
developed by Fabrice Le Fessant and OCamlPro.) Large computation
arguments or results (above a certain size) are stored on the file
system, to avoid having to keep them in RAM for too long (if all
workers are busy, the computation might wait for some time being
started).
- The API supports easily distributing computation tasks to several
machines. We have done some experiments with using our application's
database to dispatch computations, but we don't use it in production.

Anil Madhavapeddy then asked and Alain Frisch replied:

> Are all of the messages through these queues persistent, or just the larger
> ones that are too big to fit in the shared memory segment, and are they
> always point-to-point streams?
>
> We've got a similar need in Xen/Mirage for shared memory communication and
> queues, and have been breaking them out into standalone libs such as:
>
> https://github.com/djs55/shared-memory-ring
>
> ...which is ABI-compatible with the existing Xen shared memory interfaces,
> and also an OCaml version of the transport-agnostic API sketched out in:
> http://anil.recoil.org/papers/2012-resolve-fable.pdf
>
> The missing link currently is the persistent queuing service, but we're
> investigating the options here (ocamlmq looks rather nice).
The messages always go through shared memory queues (non persistent),
but their payload is offloaded to the file system (in temporary files)
when it is too large. There is no real persistence, though, because the
temporary file is not self-describing (it is not sufficient to restart a
computation after a process failure, for instance). (The distribution
of computations through a database is closer to real persistence.)
Queues are unidirectional point-to-point streams between two processes
(we use a pair of such queues for between the main process and the
scheduler, and between the scheduler and each worker process).
The relevant part of the API is:
========================================================
type t
(** The type of point-to-point shared memory queues. *)
val create: ?max_size:int -> unit -> t
(** A queue is created in one process with [create], passed by
name (using [id]) to a single another process which can call
[from_id]. *)
val from_id: ?max_size:int -> string -> t
(** Access a queue created by another process given its unique
name. *)
val id: t -> string
(** Globally (system-wide) unique name attached to the queue. *)
val close: t -> unit
exception CannotGrow
exception BrokenPipe
val send: t -> string -> unit
(** Non-blocking operation.
Raises [BrokenPipe] if the reader has disconnected.
Raises [CannotGrow] if the maximum size of the buffer has been
reached. *)
val read: t -> string option
(** Non-blocking operation.
Returns [None] if no message is available.
Raises [BrokenPipe] if the writer has disconnected. *)
========================================================
If there is some interest for it, maybe Fabrice could release the code
with an open source license?
One thing which is not supported by this library is a notification
mechanism to inform the other side of the queue that messages are
available. For now, we simulate that by pinging the processes
stdin/stdout descriptors. In the scheduler and the worker, we use
select to monitor them (there is probably a big cost of doing so,
especially considering how select is emulated under Windows). In the
GUI process, we use standard .Net process monitoring facilities to
inject callbacks in the main GUI thread. (The OCaml side of our
application in mono-threaded, and we use a few external native or .Net
threads to monitor system conditions.)

Gerd Stolpmann also replied:

Interesting that there are now other shared memory implementations for
OCaml. Note that there are a number of them in Ocamlnet, with some
specialities not yet mentioned. There is the Netcamlbox library
providing message boxes of limited size for exchanging OCaml values
directly. That means the value is copied to the shared memory block by
the sender, and the receiver can pick it up there without copying it
again. Sender and receiver can map the memory at different addresses
(the copy procedure invoked by the sender takes care of possible
offsets, so that that Netcamlbox also allows the communication between
processes that don't have a fork relation). There is no need for
marshalling the value.
http://projects.camlcity.org/projects/dl/ocamlnet-3.6.3/doc/html-main/Netcamlbox.html
Going even beyond that, Netmulticore implements an "ancient" heap in
shared memory (like Richard's Ancient lib, but with more options).
This heap is organized like OCaml's major heap, and there is even a GC
implementation for it. There are a number of data structures (arrays,
hash tables, queues, buffers) which are aware of residing in shared
memory. For synchronization there are mutexes, semaphores and
condition variables. So far the values to manipulate are already in
shared memory, programming with Netmulticore feels a lot like
programming with multi-threading. In practice, however, you need to
frequently copy values in and out, so it is not exactly as convenient.
For Netmulticore, all processes must map the shared memory to the same
address (easy with "fork").
http://projects.camlcity.org/projects/dl/ocamlnet-3.6.3/doc/html-main/Intro.html#netmulticorehttp://projects.camlcity.org/projects/dl/ocamlnet-3.6.3/doc/html-main/Netmcore_tut.html
> The missing link currently is the persistent queuing service, but
> we're investigating the options here (ocamlmq looks rather nice).
There is also Netamqp, which can be used together with RabbitMQ.
http://projects.camlcity.org/projects/netamqp.html

Gerd Stolpmann also replied to the initial question:

Well, I don't know whether this is an option for Ocsigen users, but
Ocamlnet includes fairly good multiprocessing support. You can run
servers that dynamically start subprocesses on demand. Look for Netplex:
http://projects.camlcity.org/projects/dl/ocamlnet-3.6.3/doc/html-main/Intro.html#netplex
I've no good recipe, though, how to plug in service processors that base
on lwt (well, there is an adaptor in Ocamlnet for lwt - Uw_lwt - but I
wouldn't know what to do on the Ocsigen side, but maybe worth
exploring).
Ocamlnet also includes other mechanisms that are generally interesting
for compute stuff, namely Netmulticore for exploiting several cores on
the same machine with fast shared memory architecture, and RPC for
distributing computations in a network. Both are extensions of Netplex,
so it is easy to integrate into a single program.

Philippe Veber then asked and Gerd Stolpmann replied:

> Thanks for your input Gerd! As I understand it, your suggestion is to have
> an RPC server (based on netplex) doing the actual calculations. That RPC
> server would be called by the ocsigen server when needed (the ocsigen
> server is the client in that scheme). So in that schema, only the RPC call
> should be lwt-friendly. Digging in ocamlnet documentation, it seems that
> could be achieved using [Rpc_simple_client.call] wrapped inside a
> [Lwt_preemptive.detach].
I guess RPC would be most convenient here - it supports a server mode
where the child processes accept the new connections (btw, if you
don't want to deal with the RPC encoding stuff (i.e. XDR), just
marshal the OCaml value as string, and use RPC functions that are
declared as string->string).
A sample program would the "finder" service here:
http://projects.camlcity.org/projects/dl/ocamlnet-3.6.3/examples/rpc/finder