The conceptual problems with microkernels

You've come to this page because you've asked a question similar to the
following:

What are the conceptual problems with microkernels?

This is the Frequently Given Answer to such questions.

One of the problems with the microkernel idea in practice is that in
many ways it is employed to ignore problems by simply redefining the
software to exclude them. One defines "kernel" to be something smaller
than before, and considers problems to be gone simply because they no
longer occur within that portion of the overall system that one has
labelled "the kernel". "The system doesn't crash" because "the code that
crashes isn't executing in the kernel".

Anyone with in-depth experience of operating systems knows that certain
classes of problems in application-mode code can bring down a whole
system. This is as true of microkernel operating systems as it is of a
fatal X server bug causing the loss of all X clients, or of the CSRSS
Backspace Bug causing Windows NT 4 to fail. The system as a whole is
more than the kernel, and this is especially true of microkernel
operating systems, which ironically makes the problem that much more
acute on such operating systems.

On microkernel operating systems there are almost invariably one,
sometimes several, server processes whose failure will render the entire
system inoperable, even though "the kernel" is unaffected in its
operation. (The authentication server on the Hurd is one such process,
for example.) A fatal bug in system code that resides in an application
process cannot accidentally overwrite the kernel. But because the kernel
in a microkernel operating system does far less, the fact that it
continues to function is of little help: there's not much that can
actually be done with the system after the server process has crashed.
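
The point can be made concrete with a toy model. This is only a sketch:
the Microkernel and Server classes, the "auth" and "fs" server names,
and the login() dependency chain are all hypothetical, invented for
illustration, and do not describe how the Hurd or any real system is
structured. The "kernel" here does nothing but route messages, so it
keeps working after a server dies, and yet nothing useful can be done.

```python
class ServerCrashed(Exception):
    """Raised when a request is routed to a server process that has died."""


class Server:
    """A user-mode server process, modelled as a name plus a liveness flag."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def handle(self, request):
        if not self.alive:
            raise ServerCrashed(self.name)
        return f"{self.name} handled {request}"


class Microkernel:
    """Only routes messages; all real services live in the servers."""

    def __init__(self):
        self.servers = {}

    def register(self, server):
        self.servers[server.name] = server

    def send(self, server_name, request):
        # The kernel itself never fails here; it just delivers the message.
        return self.servers[server_name].handle(request)


def login(kernel, user):
    # Hypothetical dependency chain: every login needs the auth server first.
    kernel.send("auth", f"authenticate {user}")
    return kernel.send("fs", f"open home directory of {user}")


kernel = Microkernel()
kernel.register(Server("auth"))
kernel.register(Server("fs"))

print(login(kernel, "alice"))         # works while both servers are alive

kernel.servers["auth"].alive = False  # the auth server crashes
try:
    login(kernel, "alice")
except ServerCrashed as crashed:
    print(f"kernel still routing messages, but logins fail: {crashed} is down")
```

Note that the "fs" server and the kernel are both still perfectly
healthy at the end of the run. The system is nonetheless unusable for
anyone who needs to log in.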

It's often stated that message-passing is one of the problems with the
microkernel idea in practice. In fact, this isn't actually a problem that
is specific to microkernels. Plenty of monolithic operating systems pass
messages around, too. Windows NT's entire I/O subsystem, for example, is
based upon message passing. (I/O Request Packets are messages.) The
STREAMS mechanism in AT&T Unix is a message-passing mechanism, as is
the sockets layer in the various BSDs. Most if not all of the criticisms
levelled at message-passing in microkernels can be equally levelled at
monolithic kernels as well. The processing of "mbufs" in the BSDs has
well-known design issues with respect to the number of copy operations, for
example. Operating systems where GUI programs use the X protocol have
performance issues relating to message passing, data marshalling, and
context switching between client and server. Message passing is a general
design issue in many operating systems, not just in microkernel ones.
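
As a rough illustration of why copy counts are a property of the
message-passing design rather than of where the kernel boundary is
drawn, consider the following sketch. The function names and the figure
of four layers are assumptions made purely for the example; this is not
how mbufs, STREAMS, or I/O Request Packets are actually implemented.

```python
def pass_by_copy(payload, layers):
    """Each layer gets its own private copy of the message data."""
    copies = 0
    message = payload
    for _ in range(layers):
        message = bytearray(message)  # allocates a new buffer and copies into it
        copies += 1
    return bytes(message), copies


def pass_by_reference(payload, layers):
    """Each layer gets a view onto the same underlying buffer."""
    message = memoryview(payload)
    for _ in range(layers):
        message = message[:]  # a new view object; the data is not copied
    return message, 0


payload = b"x" * 4096

data, copies = pass_by_copy(payload, layers=4)
print(copies, data == payload)

view, copies_ref = pass_by_reference(payload, layers=4)
print(copies_ref, bytes(view) == payload)
```

Nothing in either function says whether the layers are modules inside
one monolithic kernel or separate server processes. The cost comes from
the choice to copy at each hand-off, and that choice is available, and
regularly made, in both kinds of system.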

What the development of mainstream operating systems over the past
decade and a half has shown is that the "microkernel/monolithic
kernel" distinction is in many ways a distraction from the real
engineering issues to address in operating system design, which
apply to microkernels and monolithic kernels alike. A small selection
of these issues, in no particular order, is:

separating mutually untrusting portions of code into separate protection
domains, such as separate processes running under the aegises of different
user accounts (This applies just as much to applications software as it
does to system software. Consider the design of qmail, for
example. Consider how Windows NT version 6 is finally learning
from the design of qmail in the area of resources used by
service processes. And consider the security ramifications of all of the
major servers in the Hurd having ports to each of the others.)

high cohesion and low coupling of modules

interface definitions, including future-proofing APIs (Consider the
implications of the GreXXXX() API for display drivers in the
move from Windows NT 3 to Windows NT 4, for example.)

choice and tradeoffs between client-server designs where resources
are local and hidden and designs that involve autonomous distributed
access to accessible global resources (Consider the evolution of the
graphics subsystem from Windows NT version 3.5, through version 4, to
version 6, for example. And consider why the Hurd has to have a
/hurd/magic server.)
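
The first of these issues, separating untrusting code into separate
protection domains, is the easiest to sketch. What follows is a minimal
illustration only, and not the design of qmail or of any Windows NT
service: WORKER_SOURCE and run_in_separate_process() are hypothetical
names, and the "crash" input is a stand-in for a fatal bug. The code
that handles untrusted input runs in its own process, so its failure is
reported to the caller as an exit status instead of taking the caller
down with it. A real design would also run the worker under a different
user account; that step is omitted here because it needs privileges.

```python
import subprocess
import sys

# Hypothetical worker: doubles an untrusted decimal number read from stdin.
WORKER_SOURCE = r"""
import sys
text = sys.stdin.read()
if text.startswith("crash"):
    raise SystemExit(70)   # stand-in for a fatal bug in the worker
print(int(text) * 2)
"""


def run_in_separate_process(untrusted_text):
    """Run the worker in its own process and return (exit_status, output)."""
    result = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input=untrusted_text,
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.returncode, result.stdout.strip()


print(run_in_separate_process("21"))     # (0, '42')
print(run_in_separate_process("crash"))  # (70, '')
```

The same structure applies whether the worker is an application
component or a system server, which is the sense in which this issue
cuts across the microkernel/monolithic kernel distinction.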