Book review: The Design and Implementation of the 4.3BSD UNIX Operating System

According to my photographs, I picked up this book in February of this year. With a 105 sections spread over 13 chapters I’ve been working through the book slowly at a section a day. Despite being a technical subject, the book does a very good job of explaining the operation system at a high level without becoming a study of the source code. There are snippets of source code & pseudo code to compliment the text and an extensive list of papers for reference at end of each chapter for those that wish to dig deeper.

Finished the design and implementation of 4.3BSD UNIX operating system book, now available for UNIBUS based multiuser system consultancy

I had previously attempted to complete the Minix book, Operating Systems: Design And Implementation but struggled with the extensive source reference. switching back and fourth between chapters or the requirement for a computer to view the source code was not a viable option. I took a chance on this book as used copies are available on Amazon for the cost of a postage which is less than a couple of pounds. The book is well written and enjoyable to read, while implementation details may not be completely applicable to modern BSD variants The fundamental details may still hold true in most cases if not providing a historical background around the technical challenges they faced at the time. What I liked with the Minix was that it provided lots of background to accommodated a beginner and get a reader up to speed though I much preferred the ability to read this book by itself without requiring access to the source code.

I found some of the details in the interprocess communication part a little unclear at times but enjoyed the filesystem and memory management chapters the most and the terminal handling chapter the least though I did learn of Berknet there, aswell as many other historical artefacts throughout the book, some of which I tweeted under the hashtag di43bsd.

Berknet is an obsolete batch-oriented network that was used to connect PDP-11 and VAX UNIX systems using 9600-baud serial lines. Due to the overhead of input processing in the standard line discipline, a special reduced-function network discipline was devised.

The 4.3BSD kernel is not partitioned into multiple processes. This was a basic design decision in the earliest versions of UNIX. The first two implementations by Ken Thompson had no memory mapping at all, and thus made no hardware-enforced distinction between user and kernel space. A message-passing system could have been implemented as readily as the actually implemented model of kernel and user processes. The latter was chosen for simplicity. And the early kernels were small. It has been largely the introduction of more and larger facilities (such as networking) into the kernel that has made their separation into user processes an attractive prospect — one that is being pursued in, for example, Mach.

The book breaks down the percentage of components in each category (such as headers) which are platform independent and platform specific. With a total of 48270 lines of platform independent code versus 68200 lines of platform specific code, the 4.3BSD kernel was largely targeted at the VAX.

From the details on the implementation of mmap() in the BSD memory management design decisions section it was interesting to read about virtual memory subsystems of old

The original virtual memory design was based on the assumption that computer memories were small and expensive, whereas disk were locally connected, fast, large, and inexpensive. Thus, the virtual-memory system was designed to be frugal with its use of memory at the expense of generating extra disk traffic.

It made me think of Mac OS X 10.4 (Tiger) as that still struggled with the same issue many years on which I have to suffer when building from pkgsrc. Despite having a system with 2GB of RAM, memory utilisation rarely goes above 512MB.

The idea of having to compile the system timezone in the kernel amused me though it was stated that with 4.3BSD Tahoe, support for the Olson timezone database that we are now familiar with was first added, allowing individual processes to select a set of rules.

I enjoyed the filesystem chapter as I learnt about the old berkley filesystem and the “new” which evolved into what we use today, the performance issues with the old filesystem due to the free list becoming scrambled with the age of the filesystem (in weeks), resulting in longer seek times and the amount of space wasted as a function of block size.

Although the old filesystem provided transfer rates of up to 175 Kbyte per second when it was first created, the scrambling of the free list caused this rate to deteriorate to an average of 30 Kbyte per second after a few weeks of moderate use.

The idea of being rotationally optimal to reduce seek times and implementing mechanisms to account for that was very interesting to read about.

To simplify the task of locating rotationally optimal blocks, the summary information for each cylinder group includes a count of the available blocks at different rotational positions. Eight rotational positions are distinguished, so the resolution of the summary information is 2 milliseconds for a 3600 revolution-per-minute-drive.

Though this is not so valid today with traditional spindle disks as there is not a 1:1 mapping between the physical location & logical representation of the blocks on disk.

The book is a bargain second hand and worth it for the BSD archeology.

Two months after the beginning of the first implementation of the UNIX operating system, there were two processes, one for each of the terminals of the PDP-7. At age 10 months, and still on the PDP-7, UNIX had many processes, the fork operation, and something like the wait system call. A process executed a new program by reading a new program in on top of itself. The PDP-11 system (first edition UNIX) saw the introduction of exec. All these systems allowed only one process in memory at a time. When PDP-11 with memory management (a KS-11) was obtained, the system was modified to permit several processes to remain in memory simultaneously, in order to reduce swapping. But this modification did not apply to multiprogramming, because disk I/O was synchronous. This state of affairs persisted into 1972 and the first PDP-11/45 system. True multiprogramming was finally introduced when the system was rewritten in C. Disk I/O for one process could then proceed while another process ran. The basic structure of process management in UNIX has not changed since that time.