Mac OS X Q&A

Four articles tracking Mac OS X's progress have come and gone in this series, …

Filenames and Kernels

Q: Does Mac OS X support long filenames? Are filenames case-sensitive?

A: Mac OS X's default volume format, HFS+, supports 255 character Unicode filenames, but does not support case-sensitive filename comparisons. Both of these features have some interrelated caveats.

HFS+'s Unicode filename support is somewhat spotty because the Unicode specification was still evolving when the HFS+ specification was finalized. In particular, HFS+'s rules for Unicode string comparison differ from those of more modern Unicode implementations. Compounding the problem, HFS+ compares filenames case-insensitively, but the Unicode standard does not strictly define upper and lower case equivalence. Thus, HFS+ must formulate its own rules for case-folding, rules that may not be compatible with other Unicode implementations or with a user's expectations. See the relevant documentation for the gory details.

The upshot is that an HFS+ volume accessed via Mac OS X does indeed support "long filenames", but you can't have two files named "Readme" and "README" in the same directory, and there may be some odd sorting behavior when you start using Unicode characters outside the usual ASCII subset.
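To make the case-folding problem concrete, here is a minimal Python sketch. It is not HFS+'s actual comparison algorithm: the function names are made up for illustration, and Python's built-in Unicode normalization and case-folding rules stand in for whatever tables HFS+ really uses. It only shows that two reasonable sets of rules can disagree about whether two names are "the same."

```python
import unicodedata


def folded_key(name: str) -> str:
    """Illustrative comparison key: decompose, then apply full case folding.

    Assumption: this uses Python's Unicode tables, not the HFS+ rules.
    """
    return unicodedata.normalize("NFD", name).casefold()


def simple_key(name: str) -> str:
    """A second, simpler rule: decompose, then plain lowercasing."""
    return unicodedata.normalize("NFD", name).lower()


def names_collide(a: str, b: str, key=folded_key) -> bool:
    """True if the two names would count as the same name under `key`."""
    return key(a) == key(b)


# The ASCII case from the text: these cannot coexist in one directory.
print(names_collide("Readme", "README"))            # True

# Precomposed e-acute vs. "e" plus a combining accent: same name once decomposed.
print(names_collide("caf\u00e9", "cafe\u0301"))     # True

# Two folding rules disagreeing about the German sharp s.
print(names_collide("stra\u00dfe", "STRASSE", key=folded_key))  # True
print(names_collide("stra\u00dfe", "STRASSE", key=simple_key))  # False
```

The last two lines are the interesting ones. Neither answer is "wrong," which is exactly why a file system that folds case has to pick its own rules and stick with them.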

Of course, Mac OS X supports more than just the HFS+ volume format. UFS is another Apple-supported installation option. UFS is based on the 4.4BSD Fast File System (FFS) and supports 255 character filenames and case-sensitive filename comparisons.

The native Mac OS X UI (open/save dialog boxes, the Finder, etc.) supports filenames up to the maximum length allowed by the volume being manipulated at the time. Classic Mac OS applications continue to be limited to 31 character filenames, even when working with HFS+ volumes, since they're still (essentially) running in plain old Mac OS 9.
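The two limits are easy to compare in a few lines of Python. This is only a sketch: the constant and function names are hypothetical, and it assumes length is counted in UTF-16 code units for both limits, which glosses over the legacy encoding that Classic applications actually use.

```python
HFS_PLUS_LIMIT = 255  # limit quoted above for HFS+ (and UFS) filenames
CLASSIC_LIMIT = 31    # limit quoted above for Classic Mac OS applications


def name_length(name: str) -> int:
    """Length in UTF-16 code units (an assumption made for this sketch)."""
    return len(name.encode("utf-16-le")) // 2


def fits(name: str, limit: int) -> bool:
    """True if `name` is no longer than `limit` units."""
    return name_length(name) <= limit


long_name = "Quarterly Budget Projections (Final Draft).txt"
print(fits(long_name, HFS_PLUS_LIMIT))  # True
print(fits(long_name, CLASSIC_LIMIT))   # False
print(fits("Readme", CLASSIC_LIMIT))    # True
```

A name like the one above is perfectly legal on the volume, but a Classic application has no way to express it in full.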

The Kernel

Q: I have read that Mac OS X uses the BSD kernel, but I have also read that it uses the Mach kernel. Which is it?

A: The Mach microkernel is the foundation of Mac OS X. (A brief introduction to Mach is available at Stepwise.) Mach provides basic hardware abstraction, memory allocation, process management (including threads), and interprocess communication. But Mach by itself is not a complete kernel. It does not provide device I/O, networking, file system support, high-level APIs suitable for application development, or many other services associated with a full-fledged operating system kernel.

Mach is designed to host these missing services on top of its platform-independent base functionality. The most common source for these services has historically been BSD Unix. The BSD subsystem implements the full set of APIs and services provided by BSD Unix, but it leverages Mach to perform memory allocation, process management, and so on. In an operating system with a real BSD kernel (e.g. FreeBSD, NetBSD, OpenBSD), the BSD kernel does all this heavy lifting itself.

When the BSD subsystem is implemented as a user-level process running on top of Mach, Mach is said to be a "pure microkernel." If the BSD subsystem crashes, Mach will not be affected. Many embedded systems use Mach (or some other microkernel) in this fashion to ensure maximum stability in even the most extreme situations. But pure microkernels have several drawbacks, the biggest of which is the performance hit incurred by the necessary (but computationally expensive) message passing between Mach and the user-level subsystem process.

Most modern desktop and server operating systems (including Windows 2000) use what is often called a "modified microkernel" architecture. Mac OS X does this as well. Instead of running as a user-level process on top of Mach, Mac OS X's BSD subsystem runs in kernel mode in the same address space as Mach itself. Most message passing between Mach and BSD is eliminated in this situation; the BSD subsystem can interact with Mach via normal function calls.
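The difference between the two arrangements can be sketched as a toy model in Python. This is an analogy only: threads and queues inside one Python process do not model Mach ports or separate address spaces, and every name here (bsd_service, call_via_messages, and so on) is invented for illustration. The point is the shape of the call path, not the real cost.

```python
import queue
import threading


def bsd_service(request: str) -> str:
    """Stand-in for some service the BSD subsystem provides (hypothetical)."""
    return f"handled:{request}"


def start_server(inbox: queue.Queue) -> threading.Thread:
    """'Pure microkernel' style: the service lives in a separate server
    that is reachable only by sending it messages."""
    def loop() -> None:
        while True:
            message = inbox.get()
            if message is None:  # shutdown sentinel
                break
            request, reply_box = message
            reply_box.put(bsd_service(request))

    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    return worker


def call_via_messages(inbox: queue.Queue, request: str) -> str:
    """Every request costs a full message round trip to the server."""
    reply_box: queue.Queue = queue.Queue(maxsize=1)
    inbox.put((request, reply_box))
    return reply_box.get(timeout=5)


def call_directly(request: str) -> str:
    """'Modified microkernel' style: same address space, plain function call."""
    return bsd_service(request)


inbox: queue.Queue = queue.Queue()
server = start_server(inbox)

via_messages = call_via_messages(inbox, "open")
direct = call_directly("open")

inbox.put(None)          # ask the server to exit
server.join(timeout=5)

print(via_messages == direct)  # True: same answer, very different path
```

Both calls return the same result. The message-passing version simply does more work to get there (packaging the request, handing it off, waiting for a reply), and that extra work is what the modified microkernel design avoids.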

It's important to note that Mach's native kernel interfaces have not been broken by this "incorporation" of the BSD subsystem. They remain just as accessible to other subsystems as they would be in a pure microkernel implementation. This is important in Mac OS X because of the wide variety of subsystems implemented on top of Mach (and, by extension, on top of BSD): Cocoa, Carbon, the Java Virtual Machine, and even Classic.

Yes, this deviation from a pure microkernel means that Mach is vulnerable to BSD subsystem crashes. But if BSD goes down, Mac OS X is basically hosed anyway. There's no point in gloating that "Mach is still running just fine" when the machine is totally unusable (no device I/O, no networking == dead). This type of stability is important in embedded systems, but personal computers are useless without higher level services. And the performance boost from this arrangement is substantial.

As should be clear by this point, although Mach is the "real" kernel doing the low-level heavy-lifting, the overall "flavor" of the Mac OS X kernel is BSD. In Mac OS X, BSD provides the process model (process IDs, signals, and so on), basic security policies such as user IDs and permissions, threading support (POSIX threads), BSD sockets networking, and BSD kernel APIs.

Read that last sentence again, and note that this has nothing to do with a command line, a Unix-like directory structure, programs like "ls", "tar", or "gzip", or any of the other things often associated with BSD. Those things can be removed from Mac OS X without damage, and indeed may not be part of the default install. But "BSD" cannot be removed from Mac OS X any more than Mach can. Every process in Mac OS X is a BSD process. Every file has BSD-style permissions (yes, even on HFS+ volumes). Every thread is, at its core, a POSIX thread.
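These BSD-level facts are visible from any program, with no command line involved. The sketch below uses Python's standard os and stat modules as a convenient wrapper around the underlying POSIX calls; it should run on any Unix-like system and is not specific to Mac OS X. The describe_file helper is a made-up name for this example.

```python
import os
import stat
import tempfile


def describe_file(path: str) -> dict:
    """Return the BSD-style ownership and permission bits for a file."""
    info = os.stat(path)
    return {
        "mode": stat.filemode(info.st_mode),  # e.g. "-rw-r-----"
        "owner_uid": info.st_uid,
        "group_gid": info.st_gid,
    }


# Every process has a process ID and runs as some user ID.
pid = os.getpid()
uid = os.geteuid()

# Every file carries an owner, a group, and permission bits.
with tempfile.NamedTemporaryFile() as tmp:
    os.chmod(tmp.name, 0o640)  # owner read/write, group read, others nothing
    summary = describe_file(tmp.name)

print("process id:", pid)
print("effective user id:", uid)
print("file summary:", summary)
```

Nothing here touches "ls" or a shell. The process ID, the user ID, and the permission bits come from the operating system itself, which is the sense in which BSD cannot be removed.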

Summary:

This pervasive BSD flavor may be why there is confusion about Mac OS X being "based on the BSD kernel." It's not; it's based on Mach. But Mach by itself does not a kernel make. The "officially blessed subsystem" running on top of Mach is BSD, but it is not "the BSD kernel." The BSD kernel manages its own memory and processes and does its own hardware abstraction. The BSD subsystem in Mac OS X uses Mach's implementation of these services. Finally, the BSD subsystem exists in the same address space as Mach for performance reasons, but this incorporation does not break any of Mach's modular interfaces.

It's a complex answer to a simple question, and it requires some knowledge of operating system design to fully comprehend. That's probably why there's so much confusion on this topic.

John Siracusa has a B.S. in Computer Engineering from Boston University. He has been a Mac user since 1984, a Unix geek since 1993, and is a professional web developer and freelance technology writer.