Linux 2.6.29

The Linux 2.6.29 kernel was released on 23 March 2009.

Summary: Linux 2.6.29 adds kernel-based graphics mode setting, WiMAX support, Access Point support in the wifi stack, the btrfs and squashfs filesystems, eCryptfs filename encryption, an ext4 no-journal mode, ocfs2 metadata checksums, a more scalable RCU implementation, filesystem freeze support, swap management in the memory controller, many new drivers and many other improvements.

1. Prominent features (the cool stuff)

1.1. Kernel Modesetting

When we talk about "mode setting", we mean setting up things like the screen resolution and depth, in other words, configuring whatever is necessary in the graphics card to get it ready to display things on the screen. This may look easy, but the graphics developers say it is harder than it seems to design and implement correctly (multihead setups, hotplug, etc.), which is why it has taken so long. To start with, mode setting implies allocating memory from the graphics card, so the GEM memory manager had to be ready and merged in the main tree before modesetting could be done, and that did not happen until the previous release, 2.6.28.

Doing modesetting right is much harder if you consider how typical Linux graphics setups work today. Several drivers share the same piece of hardware (your graphics card): the kernel VGA driver, the kernel framebuffer drivers, the kernel DRM drivers, the userspace X drivers, the userspace DRI drivers, and other userspace drivers (e.g. svgalib). This is a very bad idea. For example, the X.org driver is a full implementation of everything needed to make your graphics card work in 2D, and when you start up X.org all is fine. But when you switch from X.org to a VT console (using Ctrl+Alt+F1), the X.org driver has to stop handling the graphics card, because control needs to be passed to another driver: the fb console driver. So the X.org driver saves the current state of the display, the graphics card is reset, and then it is handed to the fb driver, which has to reinitialize the card completely (that's why there's a noticeable pause in the switch), even if the resolution in X and the console is the same. When you switch back to X.org, the card needs to be reconfigured again. This is a hairy, bug-prone mess, and it's difficult to make it work right in some cases. For example, resuming suspended/hibernated systems is more difficult with this design: the X.org driver lives in userspace, and suspend/hibernation is (must be) transparent to it, so it is not aware that it needs to reset the hardware before it continues drawing after a resume. It needs a userspace/firmware helper that sometimes does not work correctly (black screen, hung system).

Enter modesetting, and those problems go away. Modesetting centralizes the mode setting code in the kernel drivers. While this may look like a lot of code to put in a kernel, it's actually the contrary: a single piece of code is in charge of modesetting, which means that the whole graphics stack gets smaller. And because there's only one implementation, and more code sharing, quality and robustness increase. Besides, mode setting is a task that really belongs in the kernel drivers (it's how every other operating system, including some proprietary Unixes, has always done it). But all that is just a small part of the benefits. Suspend and resume are more likely to work, because all the work is done by the kernel driver, just as it is for any other hardware device. X.org doesn't need root privileges anymore (they were needed before only because the old drivers needed direct hardware access), which makes it possible to run X.org as non-root. That is a huge security improvement, and it moves X.org back to what it is supposed to do: drawing things, and not messing with hardware. It is also possible to print kernel oopses (or even a Windows-like BSOD) to the screen if the kernel crashes while running under X. And if all this wasn't enough, you also get a FB console that runs on top of modesetting and GEM, making the old FB drivers completely obsolete while preserving their functionality. Finally, with modesetting the hardware does not need to be reset when switching from X.org to a VT, and if the resolution is the same nothing needs to change at all, which makes flicker-free graphical boots and fast user switching possible.

However, trying modesetting in this release is not easy. On the kernel side, only the Intel driver gets modesetting support in this release (support for other drivers is being worked on and will be merged in future releases). On the X.org side, it's even more difficult, because when kernel modesetting support is enabled, it is REQUIRED that the X.org driver also has modesetting support. If you enable kernel modesetting and you don't have a modesetting-enabled driver, X.org will NOT be able to work in any way, and it may even crash your machine. There's no way to work around this except disabling kernel modesetting (running a modesetting-enabled X.org driver on a kernel with modesetting disabled is allowed). Right now, only the Intel driver seems to have a stable release with modesetting support; alpha support is being worked on for other drivers.

1.2. Btrfs

Btrfs is a new filesystem developed from scratch following the design principles of filesystems like ZFS, WAFL, etc. It was created by Chris Mason, an Oracle engineer who worked for several years on Reiserfs v3. It is expected to replace the Ext filesystems as the most used Linux filesystem. More information about btrfs can be found in the btrfs wiki page.

Btrfs is under HEAVY development, which means that it is UNSTABLE, and while it has become more stable in recent months you should not assume it's safe to use. It is being included, in the same way ext4dev was merged, to help its development. So it's strongly recommended not to use it for anything other than testing, benchmarking and development. The plan of the btrfs team right now is to make a 1.0 release. The disk format is not expected to change anymore (but it will if a critical bug is found).

1.3. Squashfs

Squashfs is a highly compressed read-only filesystem that is well known for being used in the Live-CDs of the most common Linux distributions and in embedded distributions for some routers. It uses zlib compression (lzma will be added in the future) to compress files, inodes and directories. Inodes in the system are very small and all blocks are packed to minimise data overhead. Block sizes greater than 4K are supported, up to a maximum of 1 Mbyte (the default block size is 128K).

Squashfs 4.0 supports 64-bit filesystems and files (larger than 4GB), full uid/gid information, hard links and timestamps. It is intended for general read-only filesystem use, for archival use (i.e. in cases where a .tar.gz file may be used), and in embedded systems where low overhead is needed. Further information and userspace tools (needed to create the filesystem) are available from http://squashfs.sourceforge.net.
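The effect of the block size can be explored with a small sketch. It only illustrates the idea of compressing data in independent fixed-size blocks with zlib; it is not the Squashfs on-disk format, and `compress_blocks` is a hypothetical helper name. The size limits mirror the 4K to 1 Mbyte range and the 128K default mentioned above.

```python
import zlib

MIN_BLOCK = 4 * 1024          # smallest block size mentioned above
MAX_BLOCK = 1024 * 1024       # largest block size mentioned above
DEFAULT_BLOCK = 128 * 1024    # default block size mentioned above


def compress_blocks(data, block_size=DEFAULT_BLOCK):
    """Split data into fixed-size blocks and zlib-compress each one separately.

    Illustration only: the real filesystem also packs metadata and uses
    its own layout, none of which is modelled here.
    """
    if not MIN_BLOCK <= block_size <= MAX_BLOCK:
        raise ValueError("block size must be between 4K and 1M")
    return [
        zlib.compress(data[offset:offset + block_size])
        for offset in range(0, len(data), block_size)
    ]


def compressed_size(data, block_size=DEFAULT_BLOCK):
    """Total number of compressed bytes for a given block size."""
    return sum(len(block) for block in compress_blocks(data, block_size))
```

Calling `compressed_size` on the same data with different block sizes gives a rough feel for the trade-off: larger blocks tend to compress better, while smaller blocks mean less data has to be decompressed to read a small piece of a file.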

1.4. Support of 4096 CPUs

Many parts of the Linux core code already support that many CPUs, but there were problems with the cpumask code, which is the part of the kernel used to represent the CPUs in the system. With 4096 CPUs that structure became too big, causing stack overflows, and it had some performance problems, which made it impossible to make distro kernels default to 4096 CPUs, because systems with fewer CPUs would have run slower. The goal in this release has been to support as many CPUs as possible with no disadvantages for smaller machines. The code has been changed to use pointers to that structure instead of placing it on the stack, and some scalability problems have been resolved, making it possible to run machines with 4096 CPUs and easier to support even more in the future.
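The stack problem is easy to quantify. The sketch below assumes a CPU mask is a plain bitmap with one bit per possible CPU, rounded up to whole machine words (64-bit words are assumed here); `cpumask_bytes` is a hypothetical helper, not a kernel function.

```python
def cpumask_bytes(nr_cpus, bits_per_long=64):
    """Size in bytes of a bitmap holding one bit per CPU.

    The bitmap is rounded up to a whole number of machine words,
    assumed to be 64 bits wide unless stated otherwise.
    """
    longs = (nr_cpus + bits_per_long - 1) // bits_per_long  # ceiling division
    return longs * (bits_per_long // 8)


for nr_cpus in (8, 64, 4096):
    print(nr_cpus, "CPUs ->", cpumask_bytes(nr_cpus), "bytes per mask")
```

Under these assumptions a 4096-CPU mask takes 512 bytes, so a few masks declared as local variables in nested function calls use up a large share of a kernel stack that is only a few kilobytes in size.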

1.5. "Tree RCU": scalable classic RCU

This feature fixes a long-standing performance bug in classic RCU that results in massive internal-to-RCU lock contention on systems with more than a few hundred CPUs; classic RCU was designed for machines with 16-32 CPUs. "Tree RCU" applies a hierarchy, greatly reducing the contention on the top-level lock for large machines. Although this feature creates a separate flavor of RCU for ease of review and patch maintenance, it is intended to replace classic RCU.
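Why a hierarchy helps can be seen by counting how many contenders share each lock. The following sketch is only a model of the idea, not the kernel's data structure: CPUs are grouped under leaf nodes, leaf nodes under parent nodes, and so on up to a single root. The fanout of 64 is an assumption chosen for illustration, and `tree_levels` is a hypothetical name.

```python
from typing import List


def tree_levels(nr_cpus: int, fanout: int = 64) -> List[int]:
    """Return the number of nodes at each level, leaves first, root last.

    Every node groups at most `fanout` children (CPUs at the leaf level,
    nodes above it), so no lock is shared by more than `fanout` contenders.
    """
    if nr_cpus < 1 or fanout < 2:
        raise ValueError("need at least one CPU and a fanout of two or more")
    levels = []
    groups = nr_cpus
    while True:
        groups = (groups + fanout - 1) // fanout  # ceiling division
        levels.append(groups)
        if groups == 1:
            return levels


print(tree_levels(32))     # a small machine needs only the root node
print(tree_levels(4096))   # 64 leaf nodes below one root
```

In this model a flat scheme has all 4096 CPUs contending for one lock, while the two-level tree limits every lock, including the root, to at most 64 contenders.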

1.6. WiMAX

2.6.29 includes support for WiMAX, a telecommunication technology that provides wireless transmission of data using a variety of transmission modes. It provides up to 75 Mbit/s symmetric broadband speed without the need for cables. The technology is based on the IEEE 802.16 standard (also called Broadband Wireless Access). The stack has been provided by Intel, and it includes a driver for the Intel Wireless WiMAX/Wi-Fi Link 5x50 USB/SDIO devices.

1.8. eCryptfs filename encryption

eCryptfs implements transparent encryption of the contents of a file. In this release, it can also encrypt the file name via a passphrase-derived mount-wide Filename Encryption Key (FNEK) specified as a mount parameter. Each encrypted filename has a fixed prefix indicating that eCryptfs should try to decrypt the filename. When eCryptfs encounters this prefix, it decodes the filename into a tag 70 packet and then decrypts the packet contents using the FNEK, setting the filename to the decrypted filename. Both unencrypted and encrypted filenames can reside in the same lower filesystem. Because filename encryption expands the length of the filename during the encoding stage, eCryptfs will not properly handle filenames that are already near the maximum filename length.

1.9. Filesystem freeze

Until now, Linux did not have a freeze feature that suspends write requests. As a result, it was not possible to use the storage device's own features (snapshot and replication) to take a backup that keeps the filesystem consistent while it is mounted. Many commercial filesystems (e.g. VxFS) have a freeze feature, and it is used to get consistent backups. This release implements the ioctls of the freeze feature.
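A minimal sketch of how a backup tool might use the freeze ioctls follows. The names FIFREEZE and FITHAW and their numbers come from the kernel's `<linux/fs.h>` header, not from the text above; the request numbers are built with the common ioctl bit layout used on x86 and similar architectures, which is an assumption that does not hold everywhere. `frozen` and `take_snapshot` are hypothetical names, and running the freeze itself requires administrator privileges.

```python
import fcntl
import os
import struct


def _iowr(type_char, nr, size):
    """Build a read/write ioctl request number (x86-style bit layout assumed)."""
    ioc_write, ioc_read = 1, 2
    return ((ioc_read | ioc_write) << 30) | (size << 16) | (ord(type_char) << 8) | nr


# Defined in <linux/fs.h> as _IOWR('X', 119, int) and _IOWR('X', 120, int).
FIFREEZE = _iowr("X", 119, struct.calcsize("i"))
FITHAW = _iowr("X", 120, struct.calcsize("i"))


def frozen(mountpoint, take_snapshot):
    """Freeze the filesystem at mountpoint, run take_snapshot(), then thaw it.

    take_snapshot is a caller-supplied callable (hypothetical), for example
    one that triggers a snapshot on the storage device.
    """
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, FIFREEZE, 0)    # block new writes, flush pending ones
        try:
            take_snapshot()
        finally:
            fcntl.ioctl(fd, FITHAW, 0)  # always thaw, even if the snapshot fails
    finally:
        os.close(fd)
```

The nested `try`/`finally` is deliberate: a filesystem left frozen blocks every writer, so the thaw has to run even when the snapshot step raises an exception.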

1.10. Memory controller swap management and other improvements

This feature adds swap management to the memory resource controller. Previously, the memory controller couldn't control the swap used by the tasks in a cgroup, allowing a single process to exhaust all of the swap. Now you can limit mem+swap usage per cgroup. However, it adds some overhead and memory consumption, so it's configurable.

Other features added to the memory controller in this release are hierarchy support, per-cgroup swappiness, improved per-cgroup reclaim stats and better OOM handling.
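As a rough sketch of how the mem+swap limit might be set, the memory controller exposes one control file per limit in each cgroup directory. The file names below (`memory.limit_in_bytes` and `memory.memsw.limit_in_bytes`) are taken from the kernel's memory controller documentation rather than from the text above; the mount location of the cgroup directory and the helper name `set_mem_and_swap_limits` are assumptions.

```python
import os


def set_mem_and_swap_limits(cgroup_dir, mem_bytes, memsw_bytes):
    """Write a memory limit and a mem+swap limit into a cgroup directory.

    cgroup_dir is the directory of an existing memory cgroup (its location
    depends on where the controller is mounted, which is assumed here).
    The mem+swap limit covers memory and swap together, so it cannot be
    smaller than the memory limit.
    """
    if memsw_bytes < mem_bytes:
        raise ValueError("mem+swap limit must not be below the memory limit")
    # Starting from the unlimited defaults, the memory limit goes first so
    # that the mem+swap limit is never lower than it at any point.
    limits = (
        ("memory.limit_in_bytes", mem_bytes),
        ("memory.memsw.limit_in_bytes", memsw_bytes),
    )
    for name, value in limits:
        with open(os.path.join(cgroup_dir, name), "w") as control_file:
            control_file.write(str(value))
```

For example, `set_mem_and_swap_limits(path, 256 * 1024 * 1024, 512 * 1024 * 1024)` would allow the tasks in that cgroup 256 MB of memory and at most another 256 MB of swap.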

1.11. Ext4 no journal mode

Ever since Ext3 was born, there have been people who never wanted to use journaling, for various reasons. In this release, Ext4 adds support for a mode that doesn't use journaling. The result is a small performance increase (see the commit link for benchmark data) compared with Ext4 with journaling, but it is also a noticeable improvement over Ext2.

1.14. Tuz replaces Tux for this release

The emblematic Tux mascot is replaced, for the 2.6.29 release, by Tuz, a disguised Tasmanian Devil. The logo change is meant to support this endangered species. (commit)

2. Various core changes

Scheduler

Idle cputime accounting: The CPU time spent by the idle process actually doing something is currently accounted as idle time. This is plain wrong, and the architectures that support VIRT_CPU_ACCOUNTING=y can do better: they can distinguish between the time spent doing nothing and the time spent by idle doing work (commit), (commit)

Extend the range of /sys/devices/system/cpu/sched_mc_power_savings. Currently the sched_mc/smt_power_savings variable is a boolean, which either enables or disables topology-based power savings. This patch extends the variable from a boolean to a multivalued setting, so that its value decides how aggressively the power-savings balance is performed at the appropriate sched domain, based on topology (commit)
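A small sketch of setting the extended tunable is shown below. The sysfs path is the one named above; the three levels (0, 1 and 2) and their short descriptions are taken from the kernel's scheduler sources as best recalled and should be treated as an assumption, and `set_sched_mc_level` is a hypothetical helper. Writing to the real file requires administrator privileges.

```python
SCHED_MC_PATH = "/sys/devices/system/cpu/sched_mc_power_savings"

# Assumed meaning of each level; check the documentation of the running kernel.
LEVELS = {
    0: "no power-savings load balancing",
    1: "basic power-savings balance",
    2: "also bias task wakeups for power savings",
}


def set_sched_mc_level(level, path=SCHED_MC_PATH):
    """Write a power-savings level to the sched_mc tunable.

    The path argument exists so the function can be pointed at another
    file for testing; by default it targets the sysfs file named above.
    """
    if level not in LEVELS:
        raise ValueError("level must be one of %s" % sorted(LEVELS))
    with open(path, "w") as tunable:
        tunable.write(str(level))
```

Validating the value before writing keeps the error in the calling program, with a readable message, rather than relying on the kernel to reject the write.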