1. Introduction

Size-changing algorithms (SCAs) are those that take as input a stream
of data bits and produce output of a different number of bits. These SCAs
share one quality in common: they are generally intended to work on whole
streams of input data, from the beginning to the end of the stream. Some of
the applications of such algorithms fall into several possible categories:

Compression:

Algorithms that reduce the overall data size to save on
storage space or transmission bandwidths.

Encoding:

Algorithms that encode the data such that it has a better
chance of being transferred, often via email, to its intended recipients.
For example, Uuencode is an algorithm that uses only the simplest printable
ASCII characters and no more than 72 characters per line.
In this category we also consider transformations to support
internationalization of text as well as unicoding.

Encryption:

These are algorithms that transform the data so it is
more difficult to decode it without an authorization--a decryption key.
Encryption algorithms can work in various modes, some of which change the
data size while some modes do not [23]. Typically,
encryption modes that increase data size are also more secure.

There are many useful user-level tools that use SCAs, such as uuencode, compress, and pgp. These tools work on whole files
and are often used manually by users. This poses additional inconvenience
to users. When you encrypt or decompress a data file, even if you wish to
access just a small part of that file, you still have to encode or decode
all of it until you reach the portion of interest--an action that consumes
many resources. SCAs do not provide information that can help to decode or
encode only the portion of interest. Furthermore, running user-level SCA
tools repeatedly costs in additional overhead as data must be copied between
the user process and the kernel several times. User-level SCA tools are
therefore neither transparent to users nor do they perform well.

Instead, it would be useful for a file system to support SCAs. File systems
are (1) transparent to users since they do not have to run special tools to
use files, and (2) perform well since they often run in the kernel. File
systems have proven to be a useful abstraction for extending system
functionality. Several SCAs (often compression) have been implemented
as extensions to existing disk-based file systems
[2,3,18]. Their disadvantages are that
they only work with specific operating systems and file systems, and they
only support those specific SCAs. Supporting general-purpose SCAs on a wide
range of platforms was not possible.

Stackable file systems are an
effective infrastructure for creating new file system functionality with
minimal performance overhead and development cost
[10,12,22,24,28,29,25].
Stackable file systems can be developed independently and then stacked on top
of each other to provide new functionality. Also, they are more portable
and are easier to develop [29]. For example, an
encryption file system can be mounted on top of a native file system to
provide secure and transparent data storage [27].
Unfortunately, general-purpose SCAs have never been implemented in stackable
file systems. The problem we set out to solve was how to support
general-purpose SCAs in a way that is easy to use, performs well, and is
available for many file systems.

We propose fast indexing as a solution for supporting SCAs in
stackable file systems. Fast indexing provide a way to map file offsets
between upper and lower layers in stackable file systems. Since the fast
indexing is just a mapping, a lower-layer file system does not have to know
anything about the details of the SCA used by an upper-level file system.
We store this fast indexing information in index files. Each encoded
file has a corresponding index file which is simply stored in a separate
file in the lower-layer file system. The index file is much smaller than
the original data file, resulting in negligible storage requirements. The
index file is designed to be recoverable if it is somehow lost so that it
does not compromise the reliability of the file system. Finally, fast
indexing is designed to deliver good file system performance with low
stacking overhead, especially for common file operations.

We have implemented fast indexing using stackable templates
[28,29,25]. This allows us
to provide transparent support for SCAs in a portable way. To demonstrate
the effectiveness of our approach, we built and tested several
size-changing file systems, including a compression file system. Our
performance results show (1) that fast index files have low overhead for
typical file system workloads, only 2.3% over other null-layer stackable
file systems, and (2) that such file systems can deliver as much as five
times better performance than user-level SCA applications.

This paper describes fast index files and is organized as follows. Section
2 reviews the stacking file-system infrastructure used for this
work and discusses related work in SCA support in file systems. Section
3 details the design of the index file. Section
4 describes the usage of the index file in relation to common
file operations and discusses several optimizations. Section
5 details our design for a consistent and recoverable
index file. Section 6 summarizes important implementation
issues. Section 7 describes the file systems we built using
this work and evaluates our system. Finally, we present conclusions and
discuss directions for future work.