LFS: A Log Structured File System for Linux that Supports Snapshots

Introduction

A Log Structured File System (LFS) writes all file system data sequentially
in a log-like structure. The log consists of a series of segments, where each
segment contains both data and inode blocks. Traditional file systems like
ext2 usually write inode blocks at a fixed place on the disk, causing overhead
due to disk seeks. A log-structured file system instead gathers a segment's
worth of data in memory and appends the segment to the end of the log. This
dramatically improves write performance while maintaining the same read
performance. The sequential nature of the log also helps in crash recovery,
as less checkpointing information needs to be stored. As the file system
grows and files are deleted, holes are created in the log. A cleaner is
required to fill the holes and compact the file system so that large extents
of free blocks can be found. The novel aspect of this work is the addition of
snapshotting capability to log-structured file systems. Currently, no Linux
file system offers this capability.
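
The write path and the cleaner described above can be illustrated with a
small in-memory model. This is only a sketch under stated assumptions, not
the LFS implementation or its on-disk format: the names ToyLog,
SEGMENT_BLOCKS, and latest are hypothetical, each file is reduced to a single
block, and inode blocks are replaced by a plain dictionary.

```python
SEGMENT_BLOCKS = 4  # hypothetical segment size, in blocks


class ToyLog:
    """Toy in-memory model of a log of segments (illustration only)."""

    def __init__(self):
        self.segments = {}     # segment number -> list of (file_id, data)
        self.next_segment = 0  # number the next flushed segment will get
        self.buffer = []       # blocks gathered in memory, not yet in the log
        self.latest = {}       # file_id -> (segment, slot) of its live block

    def write(self, file_id, data):
        """Buffer one block; flush once a full segment has been gathered."""
        self.buffer.append((file_id, data))
        if len(self.buffer) == SEGMENT_BLOCKS:
            self.flush()

    def flush(self):
        """Append the buffered blocks to the end of the log as one segment."""
        if not self.buffer:
            return
        seg = self.next_segment
        self.next_segment += 1
        self.segments[seg] = self.buffer
        for slot, (file_id, _) in enumerate(self.buffer):
            self.latest[file_id] = (seg, slot)  # newer copy supersedes older
        self.buffer = []

    def delete(self, file_id):
        """Drop the reference only; the old block becomes a hole in the log."""
        self.latest.pop(file_id, None)

    def read(self, file_id):
        """Return the live block of a file that has been flushed to the log."""
        seg, slot = self.latest[file_id]
        return self.segments[seg][slot][1]

    def live_blocks(self, seg):
        """Blocks in a segment that are still the latest copy of their file."""
        return [(f, d) for slot, (f, d) in enumerate(self.segments[seg])
                if self.latest.get(f) == (seg, slot)]

    def clean(self, seg):
        """Copy the live blocks of one segment to the log's end and free it."""
        self.flush()  # make sure pending writes are not mistaken for old data
        live = self.live_blocks(seg)
        del self.segments[seg]
        for file_id, data in live:
            self.write(file_id, data)
        self.flush()
```

For example, writing eight blocks fills segments 0 and 1. After deleting two
of the files stored in segment 0, clean(0) copies the two surviving blocks
into a new segment at the end of the log and frees segment 0 as a whole,
which is the compaction that makes large free extents available again.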

The primary objective of this work is to create a log-structured file system
for Linux that supports snapshots. A snapshot is a copy of the files as they
existed at a particular time. It is very similar to a backup of the file
system taken at that time, except that it is maintained within the same file
system without wasting space on duplicate copies. We believe that LFS is the
ideal system for maintaining snapshots, because its design lends itself
naturally to them.
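
One way to see why a log fits snapshots well: data is appended rather than
overwritten, so an older version of a file is still present in the log after
the file changes. The sketch below illustrates that idea only. It is an
assumption about how a snapshot could be represented, not a description of
the LFS snapshot framework, and the names SnapshotLog, current, and snapshots
are hypothetical.

```python
class SnapshotLog:
    """Toy append-only store with snapshots (illustration only)."""

    def __init__(self):
        self.log = []        # append-only list of data blocks
        self.current = {}    # file name -> index of its latest block
        self.snapshots = {}  # snapshot label -> frozen copy of the file map

    def write(self, name, data):
        """Append a new version; earlier versions stay where they are."""
        self.log.append(data)
        self.current[name] = len(self.log) - 1

    def delete(self, name):
        """Remove the file from the live view; its block stays in the log."""
        del self.current[name]

    def snapshot(self, label):
        """Freeze the file map; data blocks are shared, not copied."""
        self.snapshots[label] = dict(self.current)

    def read(self, name, snapshot=None):
        """Read from the live view, or from a named snapshot if given."""
        view = self.current if snapshot is None else self.snapshots[snapshot]
        return self.log[view[name]]
```

In this model, taking a snapshot copies only the small map from names to
block positions. A file that is later modified or deleted can still be read
through the snapshot, because its old block was never overwritten. A real
cleaner would presumably have to treat blocks referenced by any snapshot as
live; the sketch leaves cleaning out entirely.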

Motivation

Why do we need yet another file system for Linux? When LFS was originally
proposed, the idea of appending to the end of a log to improve write
performance was novel and produced great results on various
microbenchmarks. However, later studies have shown that in transaction
processing environments LFS performs poorly due to the cleaner overhead. We
believe that advances in disk and memory technologies will help
log-structured file systems. In the past decade, the disk and memory sizes of
a typical machine have grown enormously. More memory allows LFS to gather
more segments before writing them out, and with more disk space the cleaner
need not be run as often.

Currently, no Linux file system supports snapshots. Snapshots are usually
considered a special capability supported by network attached storage devices
(NASD) developed by companies like NetApp. The cost of these NASDs is
prohibitive for small businesses, and we believe that we can develop an open
source file system that supports snapshots. Since LFS lends itself naturally
to supporting snapshots, we propose to implement an LFS for Linux.

Status

An experimental version of LFS can be downloaded from the SourceForge
website. The code can also be obtained from CVS, and instructions on
compiling and using LFS are available here.

In its current state, one can perform various normal file system operations
such as mkdir, rmdir, link, and unlink. A working cleaner and a basic
snapshotting framework are available as well.
The code compiles cleanly on the 2.6.11 kernel and may or
may not compile on other 2.6 kernels. Contact me if you are interested in
testing it.

Disclaimer: The file system is still experimental and may eat up your
disk/memory and/or lock up your machine. I am not responsible for any damage
you might incur. That said, it probably would only cause damage to the LFS
partition.

Mailing List

Subscribe to the mailing list if you are interested in following LFS
development. This is also the right place for feature requests, bug reports,
etc.

People

Documents

FAQ

Have you checked other implementations? Why are you reinventing the wheel?

Yes. The project takes its inspiration and data structures from the NetBSD LFS
implementation. There have been various attempts to implement a logfs for
Linux.

LinLogFS: Originally developed for 2.2.x kernels as a modification to ext2's
lower layers. A lot has changed since 2.2 (for example, the merging of the
buffer and page caches), and a new file system that directly manipulates the
buffer cache is required. The original author lists various cool additions to
LFS, including snapshots, and remarks:
It's probably best to implement them from scratch (or starting with ext2 or so)
rather than trying to port LinLogFS forward to Linux 2.6 and then add these
ideas.

The project originally did not include a cleaner (see below).

LinLogFS Cleaner: This was developed as part of a Master's thesis project by
David Gatwood. The cleaner is fairly limited, and I wanted a modular cleaner
so that new cleaning algorithms could be implemented. Also, the code is not
available online, and my e-mails to David have gone unanswered.

The Swarm Scalable Storage System: This project uses logfs concepts to
implement a storage solution for clusters. A lot of interesting ideas are
discussed in their paper, but where is the code?

Neil Brown submitted a paper to LCA 2003 discussing various
aspects of developing a log structured file system. No code has been released
yet. I contacted him in 2004, and he mentioned that he was working on a
user-space prototype.

No current Linux file system supports snapshots, and a file system that
inherently supports snapshots will be a great addition to Linux.

What are these snapshots and why do I need them?

Some people call them versions, but I would like to call them snapshots, as
they represent snapshots of a whole file system rather than of a single file.
NetApp's WAFL file system provides snapshots of the file system over time.
For example, if you have accidentally deleted your home directory, you can
go to the .snapshot directory and see snapshots of the directory from various
points in time. This is an invaluable feature, as it provides backups within
the file system without wasting space.