The concern for efficient and easy-to-use inter-process communication is prevalent among microkernel-based operating systems. Genode has always taken an unorthodox stance on this subject by disregarding the time-tested standard solution of using an IDL compiler in favour of sticking to raw C++ mechanisms. The new version 11.05 of the OS framework takes another leap by introducing a brand new API for implementing procedure calls across process boundaries, facilitating type safety and ease of use, yet still not relying on external tools. Furthermore, the platform support for the Fiasco.OC kernel has been extended to the complete feature set of the framework. The most significant new features are L4Linux (on Fiasco.OC), an experimental integration of GDB, ARM RealView PBX device drivers, and device I/O support for the MicroBlaze platform.

"I don't see why messages have to go through the kernel. For me, the best approach for interprocess communication on the same machine is to have two processes share memory. When process A wants to send a message to process B, it simply allocates a buffer from the shared memory and then informs process B about the message via a semaphore. Process B then reads the message, copies it into private memory, and then checks it.

In this way, there is no need for context swapping; the kernel need not be invoked at all.

(1) A single memory shared by everything is a bottleneck in multiprocessor systems. Caches don't solve this problem, they only hide it behind the cache coherency protocol."

Sharing always has bottlenecks (fundamentally, from the speed of light). Sharing memory with caching-aware semantics is the fastest communication a standard processor can have; even pure message passing, like the basic QNX primitives, still uses the same shared-memory mechanism.
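For illustration, the shared-buffer scheme from the quoted post could be sketched roughly as below in C++. This is only a sketch under assumptions: two threads stand in for the two processes, an atomic flag stands in for the semaphore, and the names `Mailbox`, `send`, `receive` and `round_trip` are made up for this example. Real processes would have to map the same structure into both address spaces, and the atomic flag would need to be lock-free to work across that mapping.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <thread>

// Hypothetical layout of the region both sides can see.
struct Mailbox {
    static constexpr std::size_t kCapacity = 256;
    std::atomic<bool> full{false};  // stands in for the semaphore
    std::size_t length = 0;         // written by the sender, untrusted by the receiver
    char payload[kCapacity];
};

// Sender: fill the buffer first, then publish it with a release store.
bool send(Mailbox& box, const std::string& msg) {
    if (msg.size() > Mailbox::kCapacity) return false;
    if (box.full.load(std::memory_order_acquire)) return false;  // previous message not consumed yet
    std::memcpy(box.payload, msg.data(), msg.size());
    box.length = msg.size();
    box.full.store(true, std::memory_order_release);
    return true;
}

// Receiver: wait for the flag, copy into private memory, and only then
// hand the private copy on. The length is read once and clamped so a
// misbehaving sender cannot make the receiver read past the buffer.
std::string receive(Mailbox& box) {
    while (!box.full.load(std::memory_order_acquire))
        std::this_thread::yield();
    std::size_t n = box.length;
    if (n > Mailbox::kCapacity) n = Mailbox::kCapacity;
    std::string private_copy(box.payload, n);
    box.full.store(false, std::memory_order_release);  // slot is free again
    return private_copy;
}

// Sends one message from the calling thread to a consumer thread.
// Returns an empty string if the message does not fit.
std::string round_trip(const std::string& msg) {
    if (msg.size() > Mailbox::kCapacity) return std::string();
    Mailbox box;
    std::string received;
    std::thread consumer([&] { received = receive(box); });
    bool ok = send(box, msg);
    assert(ok);  // the slot of a fresh mailbox is always free
    consumer.join();
    return received;
}
```

Note that the receiver here never enters the kernel unless `yield()` does, which matches the claim in the quoted post. The cache-coherency traffic on `full` and `payload` is still there, which is the point of objection (1).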

(2) "Going through the kernel" is only slow if you make it slow.

Like on x86? Depending on the processor and the kernel/user design, a bare entry to and exit from kernel mode can take thousands of clock cycles (including stalls due to cache/TLB evictions). Add to that the overhead of the operation itself. (I am aware that pure null operations are considerably faster; however, real code has real overheads.)

This means that user-level communication over shared memory can in many cases use spin-locks with lower overhead than any kernel primitive. Spinning first and falling back to kernel synchronization only under contention is very effective.
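To make the spin-then-fallback idea concrete, here is a rough C++ sketch. It is an assumption-laden illustration, not how any particular kernel or library does it: the class name `SpinThenBlockLock`, the spin limit of 1000 and the helper `contended_count` are invented for this example, and a `std::mutex` plus `std::condition_variable` stand in for whatever blocking primitive the kernel offers.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class SpinThenBlockLock {
public:
    void lock() {
        // Fast path: a bounded number of attempts purely in user space.
        for (int i = 0; i < kSpinLimit; ++i) {
            if (try_lock()) return;
        }
        // Slow path: announce ourselves as a waiter, then sleep until woken.
        waiters_.fetch_add(1);
        {
            std::unique_lock<std::mutex> guard(sleep_mutex_);
            wakeup_.wait(guard, [this] { return try_lock(); });
        }
        waiters_.fetch_sub(1);
    }

    void unlock() {
        locked_.store(false);
        // Only touch the blocking primitive if somebody might be asleep.
        // All atomics use the default sequentially consistent ordering, so
        // either we see the waiter here or the waiter sees the lock free.
        if (waiters_.load() > 0) {
            std::lock_guard<std::mutex> guard(sleep_mutex_);
            wakeup_.notify_one();
        }
    }

    bool try_lock() { return !locked_.exchange(true); }

private:
    static constexpr int kSpinLimit = 1000;  // arbitrary value for illustration
    std::atomic<bool> locked_{false};
    std::atomic<int> waiters_{0};
    std::mutex sleep_mutex_;
    std::condition_variable wakeup_;
};

// Increments a plain counter from several threads under the lock and
// returns the final value.
long contended_count(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    SpinThenBlockLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

In the uncontended case both `lock()` and `unlock()` are a single atomic operation plus one load, with no kernel entry at all. How long to spin before sleeping is a tuning question that depends on how expensive the kernel transition is on the machine at hand.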