Ceph: a scalable distributed storage system for Linux

Ceph is a scalable distributed storage system for Linux consisting of two main components. An object storage layer provides reliable, scalable, and high-performance parallel access to gigabytes to petabytes of data objects. A distributed file system is constructed on top of this object store, providing high-performance cache-coherent parallel access to a single shared file system namespace with POSIX semantics. This talk will focus, in turn, on both parts of the system: their architecture, implementation, and deployment. The intended audience is a mix of developers and system administrators.

The object store provides a generic, scalable cloud storage platform (much like Amazon S3) with advanced features like snapshots and distributed computation. The storage cluster is designed to be relatively self-managing: data replication, failure recovery, and data migration (during cluster expansion or contraction) are handled semi-autonomously by the storage nodes comprising the cluster. A well-known data distribution function allows clients to calculate object locations within the cluster without consulting any central directory or index servers, providing fast, direct parallel access to data.
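To make the idea of computing object locations concrete, here is a minimal Python sketch. It uses rendezvous (highest-random-weight) hashing as a stand-in; it is not Ceph's actual distribution function, and the node names, the `place` function, and the `replicas` parameter are all hypothetical. The point is only that every client holding the same node list computes the same answer without asking a directory server.

```python
import hashlib


def place(object_name, nodes, replicas=2):
    """Return the `replicas` nodes chosen to hold `object_name`.

    Illustrative stand-in (rendezvous hashing), not Ceph's real
    placement function.
    """
    def score(node):
        # Hash the (node, object) pair; the digest acts as a pseudo-random rank.
        digest = hashlib.sha256(f"{node}:{object_name}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    # Every client ranks the nodes identically, so no lookup service is needed.
    return sorted(nodes, key=score, reverse=True)[:replicas]


# Hypothetical storage node names.
nodes = ["osd0", "osd1", "osd2", "osd3"]
print(place("image-0001", nodes))
```

With this scheme, removing a node that does not hold a given object leaves that object's placement unchanged, which gives a rough feel for why cluster expansion or contraction only needs to migrate part of the data.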

The store logically consists of some number of independent object pools, each providing an independent object namespace. Each pool has some associated level of (n-way) replication and placement constraints (e.g., affinity for a given class of storage nodes), which can be adjusted at any time. A simple computation infrastructure allows an administrator to dynamically load object "methods" into the cluster, extending the basic set of supported operations (read, write, truncate, remove, get/set xattr, etc.). For example, a large application hosting image content may load an image manipulation library, allowing applications to rotate, resize, or crop image objects on the storage nodes themselves without an over-the-net read/modify/write cycle.
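The following toy Python model sketches how pools and loadable object methods fit together. It is an in-memory illustration only: the `ObjectPool` class, its method names, and the `reverse` method (standing in for something like an image rotation) are hypothetical and do not reflect Ceph's real interfaces.

```python
class ObjectPool:
    """Toy in-memory stand-in for one pool with its own object namespace."""

    def __init__(self, replication=2):
        self.replication = replication  # n-way replication level (recorded, not enforced)
        self.objects = {}               # object name -> bytes
        self.methods = {}               # method name -> callable

    def write(self, name, data):
        self.objects[name] = bytes(data)

    def read(self, name):
        return self.objects[name]

    def load_method(self, method_name, func):
        # An administrator extends the set of supported operations at run time.
        self.methods[method_name] = func

    def call(self, name, method_name, *args):
        # The method runs next to the data, so the client does not have to
        # read the object, modify it, and write it back over the network.
        self.objects[name] = self.methods[method_name](self.objects[name], *args)
        return len(self.objects[name])


images = ObjectPool(replication=3)
logs = ObjectPool(replication=2)

images.write("photo", b"abc")
images.load_method("reverse", lambda data: data[::-1])
images.call("photo", "reverse")
print(images.read("photo"))
```

Because each pool keeps its own dictionary, an object named `photo` in `images` is invisible in `logs`, mirroring the independent namespaces described above.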

The file system incorporates two interesting features. First, by maintaining "recursive accounting" information within the directory hierarchy, clients can trivially see how many files and how much data any directory (and its children) contains. A recursive "mtime" similarly allows applications like backup software to quickly identify which portions of the hierarchy contain recent changes. Second, Ceph implements snapshots on arbitrary directories, without requiring the file system to be divided a priori into separate subvolumes. A simple interface (mkdir .snap/foo, rmdir .snap/foo) makes snapshots usable by individual, non-privileged users.
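The recursive accounting idea can be sketched in a few lines of Python. This is an illustration of the concept under simple assumptions, not Ceph's implementation: the `Dir` class and the `rfiles`, `rbytes`, and `rmtime` field names are hypothetical. Each directory keeps running totals for its whole subtree, and every new file updates the totals along the path to the root, so a query never has to walk the tree.

```python
class Dir:
    """Directory node that tracks totals for its entire subtree."""

    def __init__(self, parent=None):
        self.parent = parent
        self.rfiles = 0    # files in this directory and all descendants
        self.rbytes = 0    # bytes in this directory and all descendants
        self.rmtime = 0.0  # most recent modification time anywhere below

    def add_file(self, size, mtime):
        # Propagate the change to every ancestor up to the root.
        node = self
        while node is not None:
            node.rfiles += 1
            node.rbytes += size
            node.rmtime = max(node.rmtime, mtime)
            node = node.parent


root = Dir()
home = Dir(root)
alice = Dir(home)
bob = Dir(home)

alice.add_file(size=100, mtime=10.0)
bob.add_file(size=50, mtime=20.0)

print(root.rfiles, root.rbytes, root.rmtime)
```

A backup tool using such totals could skip `alice` entirely if its recursive mtime is older than the last backup, while still descending into `bob`.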

Ceph is licensed under a combination of the GPL and LGPL (version 2), and its developers are actively working to merge the file system client into the mainline Linux kernel.

Sage Weil

Sage Weil designed Ceph as part of his PhD research in storage systems at the University of California, Santa Cruz. Since graduating, he has continued to refine the system with the goal of providing a stable next generation distributed file system for Linux. Prior to his graduate work, Sage helped found New Dream Network, the company behind Dreamhost web hosting (dreamhost.com), which now supports a small team of Ceph developers.