How I Got Started with the Btrfs File System for Oracle Linux

What Margaret Bierman discovered about the Btrfs file system in Oracle Linux, including an introduction to its basic administrative tasks and commands.

Published July 2012

Introduction

This article describes the basic capabilities that I discovered while becoming familiar with the Btrfs file system in Oracle Linux, plus the instructions I used to create a file system, verify its size, create subdirectories, and perform other basic administrative tasks. A second article describes how I use the advanced capabilities of the Btrfs file system.

About the Btrfs File System

The Oracle Linux operating system provides advanced methods for storing and organizing data on disk storage systems, such as the ext3, ext4, and XFS file systems, the Oracle Cluster File System 2 (OCFS2) clustered file system, and the next-generation Btrfs file system. Each has its own characteristics and feature sets, allowing administrators to select the one that best fits data storage needs and requirements.

Our research is based on the version of Btrfs available in Oracle Linux 6 with the Unbreakable Enterprise Kernel Release 2 (Version 2.6.39).

File System Design Goals and History

Created by Chris Mason at Oracle, the initial design for Btrfs has its roots in a presentation by Ohad Rodeh about copy-on-write friendly B-tree implementations at the USENIX FAST '07 conference. Mason based the Btrfs design on his experience developing the ReiserFS file system (extent-based storage, packing of small files) and the idea to store data and metadata in B-tree structures. After several months of internal development, Btrfs was presented to the Linux community in June 2007. Since then, Oracle engineers have continued to maintain and advance its development. They work in close collaboration with many contributors from the Linux community, including engineers from Linux distributors, such as Red Hat and SUSE, and other companies, such as Dreamhost, Fujitsu, HP, IBM, and Intel. Today, Btrfs is included in the mainline Linux kernel and is gaining popularity through several Linux distributions, including Oracle Linux.

Getting to Know Btrfs

When researching Btrfs, I discovered it has a wealth of functionality built into it.

Scalability and volume management. First and foremost, Btrfs is a scalable, 64-bit file system that can span large volumes to provide files and file systems as large as 16 exabytes. Included is functionality to manage multiple underlying storage devices. This functionality is similar to that traditionally provided by logical volume management tools. For example, Btrfs allows a file system to span multiple devices and present a single logical address space. Unlike most file systems, Btrfs even makes it easy to shrink the size of a single logical volume. In addition, devices can be added or removed while file systems remain online. When a device is removed, the extents stored on it are redistributed to other devices in the file system. Because these features are built into the file system, we think they have better insight into underlying storage and can optimize access patterns and data distribution.
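The device operations described above can be sketched as follows. This is a minimal illustration, not output from my test system: the device name /dev/sdd and the 1 GB shrink amount are assumptions, and newer btrfs-progs releases spell the balance command as btrfs balance start. The script only prints the commands unless RUN is set.

```shell
# Dry run by default: each command is printed, not executed.
# Set RUN=sudo to run for real (this changes the disk layout).
RUN=${RUN:-echo}

# grow_and_rebalance MOUNTPOINT NEW_DEVICE
grow_and_rebalance() {
    $RUN btrfs device add "$2" "$1"       # add a device while the file system is mounted
    $RUN btrfs filesystem balance "$1"    # redistribute existing extents across all devices
}

# shrink_and_remove MOUNTPOINT DEVICE
shrink_and_remove() {
    $RUN btrfs filesystem resize -1g "$1" # shrink the file system by 1 GB
    $RUN btrfs device delete "$2" "$1"    # move extents off the device, then drop it
}

grow_and_rebalance /mnt /dev/sdd          # /dev/sdd is a hypothetical spare disk
shrink_and_remove /mnt /dev/sdd
```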

Write methodology and access. Btrfs uses a B-tree structure to store data types and point to information stored on disk. Unlike other file systems, it does not journal transactions. As a result, writes are performed once, removing the limitations that result from journal size and reducing wear caused by repetitive writing of the same section of hard disk or solid-state disk (SSD). A copy-on-write technique ensures that blocks and extents are not overwritten in place: modified data is always written to a new location first. Extended attributes and POSIX-compliant Access Control Lists (ACLs) limit the access and manipulation of file system contents by users and applications.

Tunables. Btrfs provides minimal user tuning to guard against misuse. One interesting option is the -o autodefrag mount option, which enables auto-defragmentation. Another is the ability to disable copy-on-write for file data via the nodatacow mount option, which can help to minimize fragmentation, particularly for files with sequential access requirements, such as databases and streaming media. In this mode, file blocks are overwritten in place, similar to traditional file systems.
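As a rough sketch, both tunables are passed to mount with -o. Copy-on-write for file data is turned off with the mount option spelled nodatacow. The devices and mount points below are assumptions chosen for illustration, and the script prints the commands unless RUN is set.

```shell
# Dry run by default: commands are printed, not executed.
# Set RUN=sudo to actually mount.
RUN=${RUN:-echo}

# mount_tuned DEVICE MOUNTPOINT OPTION
mount_tuned() {
    $RUN mount -t btrfs -o "$3" "$1" "$2"
}

mount_tuned /dev/sdb /mnt autodefrag    # background defragmentation
mount_tuned /dev/sdd /data nodatacow    # overwrite file data in place (hypothetical second file system)
```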

Data Integrity

Btrfs uses a number of built-in features to ensure data integrity.

Redundant configurations. Btrfs supports device mirroring and RAID configurations to improve data survivability and ease data reconstruction. By default, Btrfs mirrors metadata across two devices and stripes data across all devices underlying the file system. Even on a single device, metadata is duplicated and maintained in two locations for redundancy.
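To override the defaults, the metadata and data profiles can be chosen when the file system is created. The sketch below mirrors both across two disks and then reports which profiles are in use; the device names are assumptions, and nothing is executed unless RUN is set.

```shell
# Dry run by default: commands are printed, not executed.
# Set RUN=sudo to run for real (mkfs destroys existing data).
RUN=${RUN:-echo}

# make_mirrored DEVICE1 DEVICE2
make_mirrored() {
    # -m selects the metadata profile, -d the data profile
    $RUN mkfs.btrfs -m raid1 -d raid1 "$1" "$2"
}

# show_profiles MOUNTPOINT
show_profiles() {
    $RUN btrfs filesystem df "$1"   # lists Data/Metadata/System usage with their RAID profiles
}

make_mirrored /dev/sdb /dev/sdc
show_profiles /mnt
```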

Checksums. Btrfs generates checksums for data and metadata blocks to preserve the integrity of data against corruption. Checksums are verified each time a data block is read from disk. If the file system detects a checksum mismatch while reading a block, it first tries to obtain (or create) a good copy of this block from another device, provided that mirroring or RAID techniques are in use. If a good copy is found, it is returned instead and the bad block is corrected.

Fault isolation and checksum algorithms. Btrfs provides fault isolation by storing metadata separately from user data, and it provides additional protection through CRCs. The CRCs are stored in a B-tree that is separate from the data to provide fault isolation.

Rebuild times. As aptly noted by Mason, Btrfs rebuilds involve only the blocks actively used by the file system. As drive capacities increase, this is a considerable advantage over traditional file system and RAID protection mechanisms. In traditional approaches, the time to rebuild high-capacity drives can be measured in days, during which time there is no protection.

Encryption. Btrfs does not provide built-in encryption functionality yet. An encrypted Btrfs file system can be created on top of the dm_crypt disk encryption subsystem and Linux Unified Key Setup (LUKS) layer, which support a variety of encryption standards. However, this approach disables some of the capabilities and advantages of using Btrfs on raw block devices, such as automatic solid-state disk support and detection.
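A layered setup of this kind might look like the following sketch. The device /dev/sdd, the mapping name securefs, and the mount point /secure are all hypothetical, and the commands are only printed unless RUN is set.

```shell
# Dry run by default: commands are printed, not executed.
# Set RUN=sudo to run for real (luksFormat destroys existing data).
RUN=${RUN:-echo}

# encrypted_btrfs DEVICE MAPPING_NAME MOUNTPOINT
encrypted_btrfs() {
    $RUN cryptsetup luksFormat "$1"       # initialize the LUKS container
    $RUN cryptsetup luksOpen "$1" "$2"    # expose it as /dev/mapper/MAPPING_NAME
    $RUN mkfs.btrfs "/dev/mapper/$2"      # Btrfs sits on the decrypted mapping
    $RUN mount "/dev/mapper/$2" "$3"
}

encrypted_btrfs /dev/sdd securefs /secure
```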

Space Savings

Btrfs supports compression on a per-mount basis. It can be enabled at any time after the subvolume is created. Once enabled, Btrfs automatically tries to compress files using LZO or zlib compression. (Other compression algorithms, such as Snappy and LZ4, are in development.) If a file does not compress well, it is marked as not compressible and written to disk uncompressed. In this case, Btrfs does not make additional compression attempts. A forced-compression mount option (compress-force) is available that tries to compress new writes anyway, in case newly added file content can be compressed.
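Compression is selected with a mount option, as in the sketch below. The helper name and the device and mount point are assumptions for illustration; compress= and compress-force= are the option spellings, and the commands are only printed unless RUN is set.

```shell
# Dry run by default: commands are printed, not executed.
# Set RUN=sudo to actually mount.
RUN=${RUN:-echo}

# mount_compressed DEVICE MOUNTPOINT ALGORITHM [force]
mount_compressed() {
    if [ "${4:-}" = "force" ]; then
        opt="compress-force=$3"   # keep trying even for files that compressed poorly
    else
        opt="compress=$3"         # skip files already marked as not compressible
    fi
    $RUN mount -o "$opt" "$1" "$2"
}

mount_compressed /dev/sdb /mnt lzo
```

Passing force as a fourth argument, for example mount_compressed /dev/sdb /mnt zlib force, selects the forced variant instead.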

Performance Enhancements

Btrfs provides functionality and device support designed to improve file system performance characteristics.

Solid-state disk support. Flash memory, such as the memory cards we put in our digital cameras and the removable USB drives we use to back up and transport data from one machine to another, is low-cost, nonvolatile computer memory that can be electrically erased and reprogrammed. In the enterprise, Flash technology is used in solid-state disk drives (SSDs) to increase application performance. Btrfs is SSD-aware, avoids unnecessary seek optimization, and aggressively sends writes in clusters, even if they are from unrelated files. This results in larger write operations and faster write throughput.

Online defragmentation. Over the years, we have noticed that file systems that experience a great deal of churn, which fragments available capacity, tend to deliver lower performance. Btrfs provides a mount option (-o autodefrag) that enables an auto-defragmentation helper. When a block is copied and written to disk, the auto-defragmentation helper marks that portion of the file for defragmentation and hands it off to another thread, enabling fragmentation to be reduced automatically in the background. This capability can provide significant benefit to small database workloads, browser caches, and similar workloads.

Subvolumes, Snapshots, and Seed Devices

The copy-on-write nature of Btrfs makes it easy for the file system to provide several features that facilitate the replication, migration, backup, and restoration of information.

Subvolumes. The linchpin of Btrfs, subvolumes are essentially named B-trees that hold files and directories. Subvolumes can optionally have quotas and are mounted as if they were disks.
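Creating a subvolume and mounting it on its own might look like the sketch below. The subvolume name subbasefoo matches the one used later in this article, while the mount point /srv/foo is an assumption. Commands are only printed unless RUN is set.

```shell
# Dry run by default: commands are printed, not executed.
# Set RUN=sudo to run for real.
RUN=${RUN:-echo}

# make_subvolume TOP_MOUNTPOINT NAME
make_subvolume() {
    $RUN btrfs subvolume create "$1/$2"   # the subvolume appears as a directory
    $RUN btrfs subvolume list "$1"        # confirm that it exists
}

# mount_subvolume DEVICE NAME MOUNTPOINT
mount_subvolume() {
    $RUN mount -o "subvol=$2" "$1" "$3"   # mount only this subvolume
}

make_subvolume /mnt subbasefoo
mount_subvolume /dev/sdb subbasefoo /srv/foo
```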

Snapshots. In Btrfs, a snapshot starts as a copy of a subvolume taken at a given point in time. In essence, snapshots are clones of a subvolume. When left unchanged, snapshots faithfully record the state of the subvolume at the time of the snapshot. Because snapshots are writable, they can be used as evolving clones of the original subvolume. You can create snapshots almost instantly, and initially they consume virtually no additional disk space. (The modest exception is a small amount of additional metadata.) This capability is useful when it is important to keep copies of older versions of a file hierarchy or move them to other systems for backup or restore operations. Individual files can be cloned using the cp --reflink command, which does for files what snapshots do for volumes.

Seed devices. Btrfs seed devices provide a read-only foundation to which multiple read/write file systems can point. All local updates go to these descendants. When the bulk of the data remains unchanged from the original seed device, there is considerable space savings. This can be considered another form of cloning.
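One way to set this up is sketched below, assuming /dev/sdb already holds the file system to be used as the seed and /dev/sdc is an empty disk that will receive the updates. Both names are assumptions, and the commands are only printed unless RUN is set.

```shell
# Dry run by default: commands are printed, not executed.
# Set RUN=sudo to run for real.
RUN=${RUN:-echo}

# seed_and_extend SEED_DEVICE WRITABLE_DEVICE MOUNTPOINT
seed_and_extend() {
    $RUN btrfstune -S 1 "$1"          # flag the unmounted file system as a seed
    $RUN mount "$1" "$3"              # a seed device mounts read-only
    $RUN btrfs device add "$2" "$3"   # attach the device that will hold local updates
    $RUN mount -o remount,rw "$3"     # writes now go to the added device
}

seed_and_extend /dev/sdb /dev/sdc /mnt
```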

Backup and restore. Btrfs does not provide built-in support for creating backups. A best practice is to create a snapshot of a volume and use traditional backup utilities to copy data off the file system. To help, a Btrfs feature is available (btrfs subvolume find-new) that identifies the files that have changed on a given subvolume. We find using this feature to be faster than traversing the entire file system with the find -mtime command to locate changed files.
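A backup pass along these lines could be scripted as in the sketch below. The archive path and the generation number 42 are placeholders; the generation would normally be the value recorded at the previous backup. Commands are only printed unless RUN is set.

```shell
# Dry run by default: commands are printed, not executed.
# Set RUN=sudo to run for real.
RUN=${RUN:-echo}

# archive_snapshot SNAPSHOT_PATH ARCHIVE
archive_snapshot() {
    $RUN tar -czf "$2" -C "$1" .             # copy the frozen snapshot off the file system
}

# list_changed SUBVOLUME LAST_GENERATION
list_changed() {
    $RUN btrfs subvolume find-new "$1" "$2"  # files changed since the given generation
}

archive_snapshot /mnt/subbasefoo-20120501 /backup/subbasefoo-20120501.tar.gz
list_changed /mnt/subbasefoo 42              # 42 is a placeholder generation number
```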

Ext file system conversion. Btrfs supports the in-place conversion of existing ext3 and ext4 file systems. The original ext3 or ext4 file system metadata is kept in a snapshot, so the conversion can be reversed if necessary. Obviously, if a converted file system is modified heavily or over a protracted period of time, the ability to go back could have limited practical value. If the file system is reverted after only a short time, this can be a very useful feature. Once you have determined that you do not intend to revert, deleting the snapshot frees disk space.
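The conversion workflow can be sketched as three small steps. The partition /dev/sdd1 is an assumption; the saved image is kept in a subvolume named ext2_saved at the top of the converted file system. Commands are only printed unless RUN is set.

```shell
# Dry run by default: commands are printed, not executed.
# Set RUN=sudo to run for real. Back up the data first.
RUN=${RUN:-echo}

# convert_ext DEVICE   (the ext3/ext4 file system must be unmounted)
convert_ext() {
    $RUN e2fsck -f "$1"          # make sure the source file system is clean
    $RUN btrfs-convert "$1"      # convert in place
}

# rollback_ext DEVICE  (restores the original ext file system)
rollback_ext() {
    $RUN btrfs-convert -r "$1"
}

# commit_conversion MOUNTPOINT  (gives up the ability to roll back)
commit_conversion() {
    $RUN btrfs subvolume delete "$1/ext2_saved"
}

convert_ext /dev/sdd1
```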

Administrative Interface

Btrfs is managed primarily using command-line utilities. The only dedicated GUI tools available focus on operating system installation and basic support capabilities. Access to the advanced features of the file system generally is not provided. Table 1 lists the key Btrfs administrative commands.

Table 1. Btrfs Administrative Commands

Task                                  Btrfs
Initialize a file system              mkfs.btrfs
Administer an existing file system    btrfs options

How to Create and Set Up a Btrfs File System

I found getting started with Btrfs to be very simple. To create file systems, you need to use the sudo command (or otherwise become the root user) and have unused disk devices attached to the system. The first step is to create a Btrfs file system using the mkfs.btrfs command. For example, I created a 10 GB file system that spans two physical 5 GB disks (/dev/sdb and /dev/sdc), using default file system configuration parameters.
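The creation step amounts to something like the following sketch. The original listing and its output are not reproduced here, so treat this as an approximation; the commands are only printed unless RUN is set.

```shell
# Dry run by default: commands are printed, not executed.
# Set RUN=sudo to run for real (mkfs destroys existing data).
RUN=${RUN:-echo}

# create_fs DEVICE...   (default data and metadata profiles)
create_fs() {
    $RUN mkfs.btrfs "$@"
}

create_fs /dev/sdb /dev/sdc
$RUN btrfs filesystem show    # list Btrfs file systems and their member devices
```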

The next step is to make the file system visible to the operating system so that it can be used. I used the standard Oracle Linux mount command to mount the file system on /mnt. Note that only one of the devices that make up the file system (here, the first) needs to be named in the mount command.
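Mounting and checking the size might look like this sketch, which assumes the two-disk file system on /dev/sdb and /dev/sdc. Commands are only printed unless RUN is set.

```shell
# Dry run by default: commands are printed, not executed.
# Set RUN=sudo to actually mount.
RUN=${RUN:-echo}

# mount_and_check DEVICE MOUNTPOINT
mount_and_check() {
    $RUN mount "$1" "$2"    # naming one member device is enough
    $RUN df -h "$2"         # verify the size of the mounted file system
}

mount_and_check /dev/sdb /mnt
```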

I wanted to determine how best to keep data safe. First, I created a snapshot of the subvolume using the btrfs subvolume snapshot command and verified its existence and contents using the standard Oracle Linux ls command. I named the snapshot subbasefoo-20120501.
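The snapshot step can be sketched as follows. I am assuming here that the subbasefoo subvolume lives directly under /mnt; the helper name is hypothetical, and the commands are only printed unless RUN is set.

```shell
# Dry run by default: commands are printed, not executed.
# Set RUN=sudo to run for real.
RUN=${RUN:-echo}

# snapshot_dated SUBVOLUME_PATH DATESTAMP
snapshot_dated() {
    $RUN btrfs subvolume snapshot "$1" "$1-$2"   # e.g. subbasefoo -> subbasefoo-20120501
    $RUN ls "$1-$2"                              # the snapshot shows the same contents
}

snapshot_dated /mnt/subbasefoo 20120501
```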

Since snapshots persist until removed, deleting a file in the subbasefoo subvolume does not release any storage space. Keep in mind that disk space cannot be freed until all snapshots that reference the files in question are removed.

Snapshots are just subvolumes. As a result, all of the same commands apply. To emulate an "undo" facility, always create a new snapshot for experimentation. If you like the result, simply delete the previous generation snapshot. If you do not like the result, just delete the experimental version. It's a handy feature.
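The "undo" pattern can be expressed as three small helpers, sketched below. The function names and the snapshot paths are hypothetical; the commands are only printed unless RUN is set.

```shell
# Dry run by default: commands are printed, not executed.
# Set RUN=sudo to run for real.
RUN=${RUN:-echo}

# start_experiment CURRENT_SNAPSHOT EXPERIMENT_SNAPSHOT
start_experiment() {
    $RUN btrfs subvolume snapshot "$1" "$2"   # work in the new snapshot only
}

# accept_experiment PREVIOUS_SNAPSHOT   (keep the experiment, drop the old generation)
accept_experiment() {
    $RUN btrfs subvolume delete "$1"
}

# reject_experiment EXPERIMENT_SNAPSHOT (undo: drop the experiment)
reject_experiment() {
    $RUN btrfs subvolume delete "$1"
}

start_experiment /mnt/subbasefoo-20120501 /mnt/subbasefoo-trial
reject_experiment /mnt/subbasefoo-trial
```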

When it is necessary to have a zero-space copy of a single file, use the reflink option of the cp command. For example, the following commands verify the size of the subbasefoo subvolume and clone the file named rantest.tst (creating the file clonetest.tst). Subsequent use of the df command shows that the file clone does not consume additional disk space.
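A sketch of the cloning step follows. The location of rantest.tst inside /mnt/subbasefoo is an assumption, and the original command output is not reproduced here. Commands are only printed unless RUN is set.

```shell
# Dry run by default: commands are printed, not executed.
# Set RUN=sudo to run for real.
RUN=${RUN:-echo}

# clone_file SOURCE DESTINATION
clone_file() {
    # --reflink=always fails rather than silently falling back to a full copy
    $RUN cp --reflink=always "$1" "$2"
}

$RUN df -h /mnt    # usage before the clone
clone_file /mnt/subbasefoo/rantest.tst /mnt/subbasefoo/clonetest.tst
$RUN df -h /mnt    # usage after the clone, for comparison
```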

Final Thoughts

My research into Btrfs revealed that it addresses long-standing deficiencies found in conventional file systems. Better yet, setting up and using a Btrfs file system is quick and easy, particularly if default configuration parameters are used. These defaults provide a reasonable amount of data protection and improved functionality—and little or no effort is required compared to the default file system. Many advanced features are in place to help improve data integrity and reliability, unify volume management, increase device utilization, and more. In our view, it is the best file system to use when deploying Oracle Linux platforms. As always, the choice is up to you.

See Also

The following resources provide more information on the capabilities of Btrfs:

About the Authors

Lenz Grimmer is a member of the Oracle Linux product management team. He has been involved in Linux and Open Source Software since 1995.

Margaret Bierman is a senior writer and trainer specializing in the research and development of technical marketing collateral for high-tech companies. Prior to writing, she worked as a software engineer on optical storage systems, specializing in the development of cross-platform file systems, hierarchical storage management systems, device drivers, and controller firmware. Margaret was also heavily involved in related standards committees, as well as training ISVs and helping them implement solutions. She received a B.S. in computational mathematics from Rensselaer Polytechnic Institute.