Monday, October 31, 2011

The next major release of FreeBSD, version 8, was intended to be an "evolutional" release with few exciting changes. Of course, by now it is obvious that this will be another in a series of releases with groundbreaking changes.

This page will document changes that will be included in FreeBSD 8, including those that might end up being committed to earlier branches. In other words, it describes differences between 7.0 and 8.0, no matter what happens to the versions in between.

Everyone is encouraged to download a snapshot CD image and try all the new features (as well as the old ones). Developers are very interested in bug reports. Note that FreeBSD 8.0 is not released yet, and both the snapshots and the default source trees have debugging enabled by default. This results in dramatic slowdowns, so don't benchmark them without removing the debugging options.

Overall system / architectural changes

INET-less / IPv6-only kernel

As IPv6 development and deployment are progressing, at their own pace, there is interest in making it possible to run a FreeBSD system as IPv6-only (instead of the default configuration, which is dual-stack IPv4+IPv6).

Historically, BSD is the progenitor of all TCP/IP implementations, and the IPv4 code in FreeBSD has sprawled across the network layers of the kernel, from device drivers to the higher socket layers. A recent initiative aims at fixing the layering violations in preparation for, at first, building a kernel without INET (i.e. IPv4) support, and then building an IPv6-only kernel. This change involves large kernel subsystems such as the firewalls, bridging, NFS and others.

CLANG / LLVM compiler

As the GCC compiler suite was relicensed under GPLv3 after the 4.2 release, and the GPLv3 is a big disappointment for some users of BSD systems (mostly commercial users who have a no-GPLv3-beyond-company-doors policy), having an alternative, non-GPLv3 compiler for the base system has become highly desirable. Currently, the overall consensus is that GCC 4.3 will not be imported into the base system (the same goes for other GPLv3 code).

The LLVM and CLANG projects together offer a full BSD-licensed C/C++ compiler infrastructure that is, performance- and feature-wise, close to or better than GCC. LLVM is the back-end and CLANG is the front-end part of the infrastructure.

Recent development has shown that not only is it possible to start using LLVM+CLANG right away, it is also very stable. The probability of replacing GCC for the base system in the near future is high, though it probably won't happen by default for the 8.x series.

Note that this mostly affects the base system. There is too much third party software that depends on GCC to completely replace it.

Parallel port builds

The ports infrastructure is the part of the FreeBSD operating system that's responsible for making thousands (actually close to 20,000) of third party packages available to FreeBSD users. It enables everyone to install custom software from either source code (the traditional and preferred way) or from analogous binary packages.

The port infrastructure for source builds has been enhanced to allow parallel builds of individual ports. In the age of multi-core CPUs this means package build times will be drastically decreased. By default, all available logical CPUs will be used.

This enhancement is not tied to the 8.0 release and is available now on all recent versions of FreeBSD. Port dependency graphs will still be built serially (i.e. only one port at a time will be built, but each individual port will be built in parallel).
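
The split between serial dependency ordering and parallel per-port compilation can be sketched as a toy model. This is not the ports framework itself: the port names, the DEPENDS table and the build_plan helper are all hypothetical, and the "jobs" value merely stands in for the number of logical CPUs a real build would use.

```python
import os
from graphlib import TopologicalSorter

# Hypothetical dependency graph: port -> set of ports it depends on.
DEPENDS = {
    "editors/vim": {"devel/ncurses", "lang/perl"},
    "devel/ncurses": set(),
    "lang/perl": set(),
}


def build_plan(depends, jobs=None):
    """Return (port, jobs) pairs in an order that respects dependencies.

    Ports are visited one at a time (serially); each port is assumed to be
    compiled with `jobs` parallel jobs, defaulting to all logical CPUs.
    """
    jobs = jobs or os.cpu_count() or 1
    order = TopologicalSorter(depends).static_order()
    return [(port, jobs) for port in order]


plan = build_plan(DEPENDS, jobs=4)
for port, jobs in plan:
    print(f"build {port} using {jobs} parallel jobs")
```

In this model the dependencies are always scheduled before the port that needs them, so "editors/vim" comes last, while the speed-up comes only from the parallel jobs inside each individual build.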

Kernel & low level improvements

Better handling of mounted device removals

Panics on "hot" removal of devices with file systems mounted from them (the canonical example is the removal of a USB flash memory key while its file system is mounted) were the problem most commonly reported by end users. New development, funded by the FreeBSD Foundation, has solved this issue.

Jails v2

The jails subsystem has been greatly enhanced and updated to support modern FreeBSD features. In addition to the support for multiple IP addresses per jail (or none), support for IPv6 and SCTP has been implemented, jails can be nested hierarchically, and jails can now be restricted to certain CPUs. Jails are especially powerful when combined with ZFS, where system administrators can be allowed to create and manage their own file systems within the jails.

Xen dom-U support

Xen support has been integrated into FreeBSD, allowing it to be used as a 32-bit guest operating system on recent versions of Xen dom0 (not as a host!). A target for 8.0 is to make FreeBSD ready to be used on Amazon EC2. The project needs testing and sponsorship.

New USB stack

The USB stack received a significant overhaul and the new code fixes many long-standing problems. Some of the new features are full support for split transactions and isochronous transactions, removal of the dependency on Giant (the stack is now MPSAFE), a new API and many more. See the SVN message for details.

The new USB stack will use old drivers' and kernel modules' names to increase backward compatibility.

MPSAFE TTY

The TTY layer is the traditional Unix interface to system users, providing them with interactive sessions to run shells, etc. The current TTY layer in FreeBSD is for the most part the traditional BSD TTY, which is integrated with the drivers and other layers in a way that, though efficient, makes it hard to maintain and extend. The initiative to rewrite the TTY layer aims to make it a true abstraction layer, operating on behalf of both sides of the TTY. In addition, it will remove the TTY from under the Giant lock, which will eliminate problems with lag and jerky user interface behaviour in the console and X.Org.

Kernel memory limit on AMD64 increased

Some modern features (of which the most notable currently is ZFS) require a large amount of kernel memory (this has nothing to do with traditional disk caches or the amount of memory visible to the system). Up to now, it was only possible to allocate up to 2 GB for kmem_max, which is becoming a bit cramped. This limit has recently been increased to 512 GB. Together with backpressure improvements for the ARC, this will make the users of ZFS happy.

procstat(1): A process inspection utility

Status: Committed to -CURRENT
Will appear in 8.0: sure
Author: Robert Watson
Web: announcement

procstat combines functionality from the now-deprecated procfs(4) and adds several new features. Some of the data procstat can provide are: a process's command line arguments, file descriptor information, stacks of the kernel threads in the process, security credentials of the process, thread information and virtual memory mappings. This utility is mostly useful for debugging.

TextDumps: gathering information after kernel panic

Status: Committed to -CURRENT, MFCed
Will appear in 8.0: sure
Author: Robert Watson
Web: Q&A on textdumps

The usual thing that happens after a kernel panic is a kernel memory dump, either full or (in 7.0 and later) a "minidump". The new "textdump" feature doesn't store the actual kernel memory dump, but extracts commonly needed information from it, stores it into a tar archive of text files, and deletes the dump file. This significantly reduces the size requirements of collecting such information, speeds up development, and enables people to collect debugging information after a crash without kernel developer experience.
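
The idea of "a tar archive of text files" can be illustrated with a small sketch. This is not the kernel's textdump code: the make_textdump and read_textdump helpers are hypothetical, and the file names and contents in the example are illustrative placeholders only.

```python
import io
import tarfile


def make_textdump(sections):
    """Pack {filename: text} into an uncompressed, in-memory tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, text in sections.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_textdump(blob):
    """Unpack an archive produced by make_textdump back into a dict."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        return {
            member.name: tar.extractfile(member).read().decode("utf-8")
            for member in tar.getmembers()
        }


# Illustrative file names and contents (assumptions, not real crash data).
sections = {
    "panic.txt": "example panic message\n",
    "msgbuf.txt": "example kernel message buffer\n",
    "version.txt": "example kernel version string\n",
}
archive = make_textdump(sections)
print(sorted(read_textdump(archive)))
```

Because the archive holds only a few short text files, it is far smaller than a memory dump and can be read with ordinary tools on any system.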

ULE 3.0: New version of the SMP-optimized scheduler

Evolution of the ULE scheduler resulted in support for fine-grained CPU affinity calculations, taking into account the physical topology of the CPUs (caches, cores, sockets) and much improved support for binding threads to CPUs. This results in additional functionalities (opens up the possibility of assigning individual CPUs to jails) and noticeable performance improvements.

Superpages

Most general-purpose processors provide support for memory pages of large sizes, called superpages. Superpages enable each entry in the translation lookaside buffer (TLB) to map a large physical memory region into a virtual address space. This dramatically increases TLB coverage, reduces TLB misses, and promises performance improvements for many applications. However, supporting superpages poses several challenges to the operating system, in terms of superpage allocation and promotion tradeoffs, fragmentation control, etc. The performance benefits are substantial, often exceeding 30%; these benefits are sustained even under stressful workload scenarios.

While they can be used on most x86 CPUs, benchmarking has shown that their greatest benefits are visible on quad-core and newer CPUs.
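
The TLB coverage claim above is simple arithmetic: coverage is the number of TLB entries multiplied by the page size. The sketch below assumes a hypothetical TLB with 64 entries (real TLB sizes vary by CPU and are not given in this article) and compares the 4 KiB base pages and 2 MiB large pages found on amd64.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_coverage(entries, page_size):
    """Bytes of address space reachable without a TLB miss."""
    return entries * page_size


ENTRIES = 64  # assumed TLB size, for illustration only

base = tlb_coverage(ENTRIES, 4 * KIB)   # ordinary 4 KiB pages
large = tlb_coverage(ENTRIES, 2 * MIB)  # 2 MiB superpages

print(f"4 KiB pages: {base // KIB} KiB covered")   # 256 KiB
print(f"2 MiB pages: {large // MIB} MiB covered")  # 128 MiB
print(f"improvement: {large // base}x")            # 512x
```

Under these assumptions the same number of TLB entries covers 512 times more memory, which is why workloads with large working sets see fewer TLB misses.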

DTrace

DTrace is a tool and a language developed by Sun Microsystems to help debug and profile operating systems. It can aggregate information from different parts of the kernel (userland tracing is not yet implemented) and analyze it in a way that's meaningful to the user.

Networking improvements

802.11s D3.03 wireless mesh networking

A wireless mesh network, sometimes called a WMN, is a wireless network using a mesh topology instead of the more typical AP-client topology. These networks are often seen as a special type of ad-hoc network, since there's no central node whose failure will break connectivity (in contrast with common wireless networks, where there's a central Access Point). 802.11s is an amendment to the 802.11-2007 wireless standard that describes how a mesh network should operate on top of the existing 802.11 MAC.

VirtNet / VIMAGE / Imunes / Network stack virtualization

The network stack virtualization project aims at extending the FreeBSD kernel to maintain multiple independent instances of networking state. This will allow for complete networking independence between jails on a system, including giving each jail its own firewall, virtual network interfaces, rate limiting, routing tables, and IPSEC configuration.

VIMAGE+Jails will be experimental in 8.0; the system might not work as advertised, especially with regard to security.

Zero-copy BPF

BPF is the Berkeley Packet Filter, a facility used to capture raw network packets from the lower layers of the network stack according to user-defined filters and forward them to an application, as well as to insert raw packets into the network stack.

This improvement to BPF reduces the number of memory copy operations between the kernel and the application, which improves performance in some cases.
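
The difference between copying data and sharing a buffer can be shown with a loose analogy. The sketch below is not the BPF API and does not touch the network stack: the bytearray merely plays the role of a capture buffer written by a producer, and the variable names are hypothetical.

```python
# Hypothetical capture buffer, standing in for memory filled by the producer.
capture_buffer = bytearray(b"pkt1pkt2pkt3")

# Traditional approach: the consumer receives its own copy of the data.
copied = bytes(capture_buffer[0:4])

# Zero-copy approach: the consumer looks at the producer's memory directly.
shared = memoryview(capture_buffer)[0:4]

# The producer overwrites the first packet in place.
capture_buffer[0:4] = b"PKT1"

print(copied)         # b'pkt1'  (a stale private copy)
print(bytes(shared))  # b'PKT1'  (same memory, no copy was made)
```

The shared view costs no extra memory traffic, but the consumer has to coordinate with the producer about when a buffer may be reused, which is the kind of trade-off a zero-copy design accepts.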

Kernel NFS locking support

The in-kernel NFS lock manager improves the performance and behaviour of NFS locking (used to synchronize file access on remote machines). New features include multithreaded operation, deadlock detection, and transparent interaction with local file locks on the server.
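
This article does not describe how the lock manager detects deadlocks. As a general illustration only, one textbook technique is to look for a cycle in a "waits-for" graph, where an edge from A to B means that lock owner A is blocked on a lock held by B. The has_deadlock function and the owner names below are hypothetical.

```python
def has_deadlock(waits_for):
    """Return True if the waits-for graph contains a cycle.

    waits_for maps each lock owner to the set of owners it is blocked on.
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {}

    def visit(owner):
        state[owner] = IN_PROGRESS
        for blocker in waits_for.get(owner, ()):
            seen = state.get(blocker, UNSEEN)
            if seen == IN_PROGRESS:
                return True  # reached an owner already on the current path
            if seen == UNSEEN and visit(blocker):
                return True
        state[owner] = DONE
        return False

    return any(
        state.get(owner, UNSEEN) == UNSEEN and visit(owner)
        for owner in waits_for
    )


# client-a waits for client-b while client-b waits for client-a.
print(has_deadlock({"client-a": {"client-b"}, "client-b": {"client-a"}}))  # True
print(has_deadlock({"client-a": {"client-b"}, "client-b": set()}))         # False
```

A lock request that would close such a cycle can be refused immediately instead of leaving both clients blocked forever.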

NFSv4 support

NFSv4 is a major overhaul of the NFS protocol and brings many new features like a stateful protocol, performance improvements and stronger security (ACLs, strong authentication). Until recently, NFSv4 support in FreeBSD was partial (client-only) and somewhat unstable. New development aims to complete this support.

The introduced NFSv4 infrastructure also replaces the old NFSv2 and NFSv3 servers and clients with the new ones.

Storage subsystems' improvements

Experimental new driver for AHCI

The new driver, present but not enabled by default in 8.0, supports native AHCI via the CAM (Common Access Method for storage) system. AHCI drives are manipulated with camcontrol, and support for new features like NCQ has been integrated.

gvinum 2

gvinum is a logical volume manager based on and compatible with vinum, FreeBSD's long-standing and practically traditional volume manager. Its features include JBOD, RAID 0, RAID 1 and RAID 5 modes of combining storage devices into higher level volumes, and due to the new version's integration with GEOM it can use and be used by other GEOM devices and classes.

Gvinum 2 is a significantly restructured version of gvinum and fixes many long-standing problems. The work done on gvinum makes it more usable and production ready, while maintaining compatibility with older versions. Gvinum exists in parallel with other GEOM classes like gmirror, gstripe and others.
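
Of the modes listed above, RAID 5 is the least obvious: it protects data with a parity block computed as the XOR of the data blocks in a stripe. The sketch below shows only that general principle, not gvinum's implementation; the xor_blocks helper and the sample blocks are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe spread over three data disks, plus a parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the second disk: XOR of the survivors and the parity
# block reproduces the missing data.
recovered = xor_blocks([data[0], data[2], parity])
print(recovered == data[1])  # True
```

Any single missing block in a stripe can be rebuilt this way, which is why RAID 5 survives the loss of one disk at the cost of one disk's worth of capacity.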

Boot support for GPT partitions

Support for booting from GPT partitions has been committed to -CURRENT. This support includes the boot sector and loader that enable common i386 machines with a generic BIOS to boot from GPT-partitioned drives.

bsdlabel gets extended to 20 partitions

bsdlabel is (finally!) extended to support more than 8 partitions. The new limit of 20 partitions comes from the number of entries that fit in a single sector.

To make use of this change, GEOM_PART needs to be used instead of GEOM_BSD (this is the default in 8.0 but will not work with older kernels). Old utilities like "bsdlabel" will not work with GEOM_PART; the new gpart utility must be used instead.

Security

ProPolice SSP (stack-smashing protection)

ProPolice helps prevent exploits that use stack-based buffer overflows by placing a random integer (called the "canary") on the stack right before the return address. It is set in the function's prologue and verified during the epilogue; if it has changed, then a buffer overflow has occurred and the program terminates itself with SIGABRT (or calls panic() if it is the kernel). Both userland and kernel may be protected.
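
The prologue and epilogue checks can be simulated in a few lines. This is a model of the idea only, not compiler-generated code: the frame layout, the sizes, the call_function helper and the StackSmashingDetected exception (which stands in for SIGABRT) are all hypothetical.

```python
import secrets

BUF_SIZE = 8     # size of the local buffer (illustrative)
CANARY_SIZE = 4  # size of the canary (illustrative)


class StackSmashingDetected(Exception):
    """Stands in for the SIGABRT (or kernel panic()) described above."""


def call_function(user_input: bytes) -> bytes:
    # Prologue: pick a random canary and place it between the local
    # buffer and the saved return address.
    canary = secrets.token_bytes(CANARY_SIZE)
    return_address = b"RET_ADDR"
    frame = bytearray(BUF_SIZE) + canary + return_address

    # Function body: an unchecked copy, in the spirit of strcpy().
    frame[0:len(user_input)] = user_input

    # Epilogue: verify the canary before "returning".
    if bytes(frame[BUF_SIZE:BUF_SIZE + CANARY_SIZE]) != canary:
        raise StackSmashingDetected("canary changed, aborting")
    return bytes(frame[BUF_SIZE + CANARY_SIZE:])


print(call_function(b"hello"))  # fits in the buffer, returns normally

try:
    call_function(b"A" * 20)    # overruns the buffer and the canary
except StackSmashingDetected as exc:
    print("aborted:", exc)
```

An overflow that reaches the return address must first overwrite the canary, and since the attacker cannot predict its random value, the corruption is noticed before the function returns.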

Other changes

The following is a list of smaller and / or more obscure changes that nevertheless deserve a special mention since they will be of interest to certain users:

User-controllable CPU/IRQ binding (jhb)

User-controllable CPU-thread binding with support for CPU sets (jeffr)