
Reference Counted Links in ocfs2

Introduction

This series of design documents describes the generic REFLINK operation and the ocfs2-specific implementation thereof. The REFLINK operation creates a new inode that shares the data extents of a source inode in a Copy-on-Write (CoW) fashion.
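The sharing behavior described above can be illustrated with a small toy model. This is only a sketch of the general idea of reference-counted, copy-on-write extents; the names `Extent`, `Inode`, `reflink`, and `write_extent` are hypothetical and do not correspond to ocfs2 structures or to any real kernel interface.

```python
# Toy model of a reference-counted, copy-on-write link.
# All names here are illustrative assumptions, not ocfs2 code.

class Extent:
    """A run of data that one or more inodes may point to."""

    def __init__(self, data: bytes):
        self.data = data
        self.refcount = 1  # number of inodes referencing this extent


class Inode:
    """An inode modeled as an ordered list of extents."""

    def __init__(self, extents=None):
        self.extents = list(extents or [])

    def read(self) -> bytes:
        return b"".join(e.data for e in self.extents)

    def write_extent(self, index: int, data: bytes) -> None:
        old = self.extents[index]
        if old.refcount > 1:
            # The extent is shared: drop our reference and point this
            # inode at a private extent, leaving other inodes untouched.
            old.refcount -= 1
            self.extents[index] = Extent(data)
        else:
            # Sole owner: safe to modify in place.
            old.data = data


def reflink(source: Inode) -> Inode:
    """Create a new inode that shares every extent of the source."""
    for extent in source.extents:
        extent.refcount += 1
    return Inode(source.extents)


original = Inode([Extent(b"hello "), Extent(b"world")])
snapshot = reflink(original)

# Writing to the original breaks sharing for the touched extent only.
original.write_extent(1, b"ocfs2")
```

After the write, `snapshot` still reads the old contents, the first extent remains shared by both inodes, and only the modified extent has been given a private copy.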

The design is the outcome of an exploration into inode snapshots started in early October 2008. In the end, the REFLINK operation is not limited to snapshots and enables a number of use cases.

Snapshots on ocfs2

ocfs2 is a general purpose extent-based shared-disk cluster filesystem. Some filesystems, like ZFS, btrfs, and WAFL, have a single tree that describes the entire filesystem. This makes snapshotting the volume, or even a subtree, pretty easy. Because ocfs2 uses block-based addressing, it does not have a single starting point to describe the entire filesystem. Implementing a snapshot system for the entire volume or a directory subtree is impractical in the ocfs2 code.

This isn't so bad, though, because ocfs2 does just fine with storage-assisted snapshots, in which the underlying storage snapshots the LUN underneath the filesystem. High-end storage can already do this, and it works just fine with ocfs2. On the low end, LVM2 provides a snapshot capability. Once ocfs2 support for clvmd reaches production, LVM2 snapshots will be usable as well.

Single File Snapshots

LUN-based snapshots necessarily snapshot the entire LUN. This is impractical when one wants to save a single file or a small group of files. For a filesystem, saving individual files means snapshotting inodes.

The REFLINK Operation

The design of inode snapshots turned into the generic REFLINK operation. These design documents describe the generic operation, the ocfs2 structures needed to support it, and some use cases.