
Kernel Log: Coming in 2.6.37 (Part 2) - File systems

by Thorsten Leemhuis

With the next kernel version, Ext4 will reach new levels of performance and use a trick to increase its storage media formatting speed. Other new features include a discard function that is interesting for slow-trimming SSDs, the "Rados Block Device" for cluster devices, and bug fixes and optimisations to Btrfs.

The Kernel Log takes the recent release of RC4 of kernel version 2.6.37, whose final release is expected around the end of the year, as an opportunity to continue its "Coming in 2.6.37" mini series and describe the file system improvements. Part 1 of the series described the changes in the graphics hardware area; in the coming weeks, further articles will discuss the changes to the kernel's architecture code, drivers and surrounding infrastructure.

Ext family

Together with new userland tools, the "Lazy Inode Table Initialization" feature should considerably speed up the creation of Ext4 file systems, because the areas required for the inode tables will no longer explicitly be wiped during formatting.

Due to the changes incorporated in kernel 2.6.37, Ext4 now scales better and comes closer in throughput to XFS.
Source: http://thunk.org/tytso/blog/2010/11/01/

The Ext4 code now co-operates more closely with the Block Layer, which is intended to improve file system speed, improve scalability and reduce CPU load. In his main Git-Pull request, and in greater detail in a blog posting, main Ext4 developer Ted "tytso" Ts'o writes that, on a test system with 48 processor cores, these and several previously implemented changes tripled the throughput while reducing the CPU load by a factor of 3 to 4. Ts'o said that the performance of Ext4 is now close to that of XFS, and that further performance improvements to Ext4 are being planned.

Occasional cleaning

The VFS now offers an ioctl called "FITRIM". Programs such as fstrim, which is soon to be integrated into the util-linux-ng tool collection, can use it to instruct the kernel's file system code to search for free areas and report them to the storage device, for instance via ATA_TRIM. Currently, however, only Ext4 has implemented the new interface for this feature, which is also called "Batched Discard". Notifications about free areas are important for Thin Provisioning and improve the performance and life span of SSDs.
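To make the FITRIM interface more concrete, here is a minimal Python sketch of how a program in the style of fstrim might issue the ioctl. The request number and the three-field `struct fstrim_range` (start, len, minlen) are taken from the kernel's `<linux/fs.h>`; the helper names `_iowr`, `pack_range` and `batched_discard` are hypothetical and exist only for this illustration. The number is derived with the generic ioctl encoding used on x86 and ARM, which is an assumption that does not hold on every architecture. Actually calling `batched_discard` requires root privileges and a mounted file system that implements FITRIM.

```python
import fcntl
import os
import struct

# struct fstrim_range { __u64 start; __u64 len; __u64 minlen; } from <linux/fs.h>
FSTRIM_RANGE = struct.Struct("=QQQ")


def _iowr(type_char, nr, size):
    """Build a read/write ioctl number the way _IOWR does on x86 and ARM.

    Assumption: the generic encoding (2 direction bits, 14 size bits,
    8 type bits, 8 number bits). Some architectures differ.
    """
    ioc_write, ioc_read = 1, 2
    return ((ioc_read | ioc_write) << 30) | (size << 16) | (ord(type_char) << 8) | nr


# FITRIM is defined in the kernel as _IOWR('X', 121, struct fstrim_range).
FITRIM = _iowr("X", 121, FSTRIM_RANGE.size)


def pack_range(start=0, length=2**64 - 1, minlen=0):
    """Serialise the byte range to search; the defaults cover the whole file system."""
    return FSTRIM_RANGE.pack(start, length, minlen)


def batched_discard(mountpoint, start=0, length=2**64 - 1, minlen=0):
    """Ask the file system mounted at `mountpoint` to discard its free areas.

    Returns the `len` field as written back by the kernel, which is
    expected to hold the number of bytes discarded.
    """
    buf = bytearray(pack_range(start, length, minlen))
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        # A mutable buffer lets the kernel write its result back into buf.
        fcntl.ioctl(fd, FITRIM, buf)
    finally:
        os.close(fd)
    _, trimmed, _ = FSTRIM_RANGE.unpack(buf)
    return trimmed
```

A call such as `batched_discard("/")` would then trigger one pass over the whole file system. The `minlen` parameter lets a caller skip free extents below a given size, which may help with devices that handle many small discard requests poorly.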

As the whole file system must be searched for free areas in the process, a Batched Discard can take some time to complete. In return, how long the storage device subsequently needs to process the discard commands matters much less. As the developer of the changes explained when introducing the patches a few months ago, some SSDs tend to take a long time to do so. As a result, data throughput can be slowed down considerably if the kernel informs the storage device about newly available areas every time individual files are deleted, a feature the kernel has been capable of since version 2.6.33.

On his home page the developer has made available various programs for testing the discard speed. The page also offers test results for a number of products, although there are no manufacturer details or model names.