Lab 3 - A Simple File System

Due date of preliminary design: April 11 Due date of Stage 1 (disk library): April 11 Due date of Stage 2 (intermediate system): April 25 Due date of Stage 3 (final system): May 6 Due date of design paper: May 8

The goal of this project is to obtain experience with file systems. A file
system is a core component of most operating systems and the implementation of
this system will provide experience designing complex systems, one of the major
topics in 6.033. File systems, by their nature, incorporate aspects of naming,
fault-tolerance, concurrency control, and storage management. In addition,
issues of security and networking appear in more ambitious designs.

To help you in designing your file system, this project is organized
into a couple stages. In stage one, you are asked to build a disk
emulation library that will emulate a disk on top of a single file.
Its interface includes functions to read, write, and flush disk
blocks. In stage two, you will build a simple file system on top of
the disk interface you built in stage one. This file
system should provide efficient storage. In particular, access to file
blocks should be efficient and the amount of metadata needed to
implement the file system should be small. In stage three, you are
asked to make your file system robust in the face of failure (i.e., it
should not corrupt data if it crashes). To test robustness, we
will crash your file system at arbitrary points in its
code: your file system should be able to recover to an internally consistent
state irrespective of where these crashes occur.

The final result of your project is a complete document detailing the design
and implementation of your simple file system (SFS), the code that implements
the design, and a short evaluation of your file system in terms of performance
and functionality. This document can be used as the paper for the second
design project in 6.033. The paper is due on May 8.

Unlike the RPC project, you are strongly encouraged to execute this
project in teams of three. Find someone else to work with and send mail
to Constantine Sapuntzakis (csapuntz@lcs.mit.edu), telling him who your
teammate is. Make sure to do this right away. The first part of the project is
due very quickly (April 11).

The rest of this handout touches on some of the design and implementation
issues, and provides some suggestions on a possible approach. We suggest you
read the whole document before starting to work on Stage 1 so that you have a
good idea how the whole project fits together and can plan your time carefully.

A file system must be able to store its data in a persistent manner.
The persistent medium in your assignment will be disk. Your task is to
design a simple disk library that reads and writes 4-Kbyte disk blocks.
Your file system will be built on top of this interface.

Since we do not have a raw disk for each of you, your disk library will use a
single (large) Unix file to emulate a disk. Your disk library will divide
the file into 4-Kbyte blocks. Blocks will be written using a ``block
number.'' You will use this number to compute the offset into the Unix file.
For instance, disk block 10 will be at byte offset 10 * 4096 = 40960.

The disk library's interface (which we define) is similar to that of
raw hardware provides: as a result we should be able to
incorporate your file system code into a real OS with a minimal of
fuss. OS software is frequently developed at user-level using
emulation. It allows ease of debugging and use. Importantly, it also
removes the need for us to buy each student a disk.

openDisk opens a file ( filename) for reading and writing
and returns a descriptor used to access the disk.
The file size is fixed at nbytes. If filename does not
already exist, openDisk creates it.

readBlock reads disk block blocknr from the disk pointed
to by disk into a buffer pointed to by block.

writeBlock writes the data in block to disk block blocknr
pointed to by disk.

syncDisk forces all outstanding writes to disk.

Your task is to implement this interface. While this task should be
simple, you have only a single week to do it. The implementation will
require that you understand how to open, read, write, and seek files
under UNIX. Section 7.4 in Tanenbaum contains a short description of
the UNIX system calls. The ones on Athena are mostly identical but
verify their semantics in the man pages (``man 2 open'').

The due date of this part of the project is April 11. Since the rest of the
project builds forth on the disk library, make sure you make the deadline.
Send us a pathname to a directory in your Athena locker that contains the
source code (including Makefile) and documentation. Also, make sure you have
tested your disk library.

The next assignment is to implement a file system using your disk library.

There are three main challenges in implementing this local filesystem: (1) the
main-memory and on-disk data structures to map files names onto disk blocks and
to keep track of free blocks; (2) the disk layout; and (3) the recovery
strategy. To keep the task manageable, you may assume that there is a single
client using the file system; the single client may have multiple files open,
though. We discuss each challenge in turn.

This assignment revolves around three of the functions provided
by a file system:

Naming. Human-readable file names must be mapped to the
disk blocks that they are associated with.

Organization. File systems (typically) have mechanisms
humans can use to organize information. Computer scientists
like hierarchies, as a result, the tree directory structure
has become the dominant mechanism for organization.

Persistence. Data is entrusted to the file system and
is expected to remain there (uncorrupted and undeleted) until
an explicit request is made to delete it. This aspect includes
support for allocation and deletion of files; as syntactic sugar,
file systems also provide mechanisms to modify existing files.

Important issues that we will treat lightly include protection
(ensuring only authorized programs can read/write/delete/create
files) and performance (disks are slow: most research on file
systems concentrates solely on performance).

To make the assignment concrete, we discuss a common file system
organization below: you are encouraged to construct your own ideas, and
to simply use this discussion to understand the issues involved.

There are three main data structures file systems manage:

A disk block freelist. This is a simple data structure
(for instance, a bitmap) that tracks what blocks on the disk have been
allocated. This data structure is, for obvious reasons, persistent (i.e.,
on disk). You will have to establish some convention for locating it
across reboots.

Inodes. An inode is persistent memory that contains
pointers to disk blocks or to ``indirect blocks'' (you can think of
these as other inodes). Inodes are used to map a file to the
disk representation of a file: each file has its own inode, the inode
has pointers to all disk blocks that make up that file.

Directories. Directories are persistent mappings of names
to inode numbers for all the files and directories that the directory
is responsible for.

Given this class of objects, the following algorithms are used
to create, grow, and delete a file.

The actions to create file:

Allocate and initialize an inode.

Write the mapping between the inode and file name in the
file's directory (usually the current working directory).

Write this information to disk.

To grow a file:

Load the file's associated inode into memory.

Allocate disk blocks (mark them as allocated in your freelist).

Modify the file's inode to point to these blocks.

Write the data the user gives to these blocks.

Flush all modifications to disk.

To shrink a file:

Load the file's associated inode into memory.

Remove pointers to the disk blocks from the inode.

Mark the disk blocks as free.

Flush all modifications to disk.

Modifying directories is similar.

Note that we are careful about when modifications to file system state
are made persistent. For instance, in the case of creating a file, if
we flush modifications to disk after stage (1), a crash that happened
before (2) would cause us to have a disk block allocated that was not
used. If we did not flush modifications to disk at all, then a crash
would cause the file to be lost. You will have to pay particular
attention to this and similar problems in the third part of the
assignment.

While your file system does not have to survive random crashes at this
stage, it does
have to maintain state persistently. In particular, it should be able
to run, be powered down, and then be ``rebooted'' in the state that it
was in when it was shutdown (i.e., all files that were written at
shutdown time should still be there, and no garbage files should
suddenly appear).

In addition to correctness, you should consider space efficiency in this part
of the assignment: your file system should have a low overhead per byte
of file data.

We evaluate your implementation on how efficient it is, the storage overhead on
disk, and how well it recovers from failures. You do not have to implement
block caching. If you implement caching correctly, however, we will give you
bonus points.

How will you implement this file system? In many modern operating systems, the
file system is implemented in the kernel, which is difficult to modify
and debug (especially when you don't have the source code!). Another option
would be to have you implement a library of filesystem functions (similar idea
to the RPC assignment) and then implement some test applications against
that library. The advantage of this approach is that is significantly easier to
debug. The disadvantage is that you cannot run unmodified applications against
it. A third option, the one we chose, is to have you implement your file
system as an NFS server on your local machine. You then mount that NFS server
and the file system becomes part of your file system name space.

This approach requires you to understand some of intricacies of NFS. You
will find RFC 1094 (http://web.mit.edu/6.033/lab/handouts/rfc1094.txt)
useful.

The source code for the NFS server skeleton (without a complete filesystem)
will be posted in /mit/6.033/lab/lab2/src. It should run under NetBSD,
Linux, and Solaris.

A follow-on handout will talk about the NFS server issues in more detail.

The due date of this part of the project is April 25. Send us a pathname to a
directory in your Athena locker that contains the source code (including
Makefile) and documentation. Also, make sure you have tested your file system
extensively.

File systems are different from most other parts of the computer system
in that they are persistent: modifications to disk survive power
cycles. In contrast, most other state (e.g., main memory, registers,
TLB's, and, as a result, data structures) is not persistent. Much
reasoning about robustness in a file system stem from the problem of
moving state from the non-persistent realm to the persistent realm in a
series of discrete steps in such a way that a failure can occur at any
point in time and still leave the system in a usable state. For instance, SFS
will commonly write data to a file. This will involve a number of
steps: allocating space for the file on disk, modifying the file's
inode, modifying the disk block freelist, and finally, writing the data
to memory. SFS must guarantee that a failure after any of these will
not corrupt the system (e.g., will not lose disk blocks, delete existing
disk blocks, lose inodes, etc.). This struggle involves mechanical
operations (such as writing transaction software by hand) and semantic
struggles (such as defining exactly what ``usable'' means). In talking
about failure, more so than other domains, precise definitions are not
semantic quibbles.

The first part of this assignment will involve defining the
semantics of failure: at what point is the modified state of a file made
persistent? For instance, if I write a single byte to disk, am I
guaranteed that when the library call returns the byte has been written?
Or do I have to perform a sync by hand? If I delete a file and the
system crashes, will the file remain deleted? You should define a
non-trivial (i.e., useful) recovery model that balances the utility
of consistent state on disk with the expense of synchronous disk
updates.

Operationally, this part of the assignment will involve working through
your code so that a failure at any point in time will not corrupt the
system. You may require that a program be run at boot time to
reconstruct parts of SFS, but it should always be able to be
recovered to a usable state.

Failures to guard against:

Use of allocated disk blocks. Blocks may have been
allocated to a particular file but the freelist was only
modified in memory, not on disk.

Loss of disk blocks. Blocks may have been removed
from the freelist but not yet had an inode persistently
modified to point to them.

Loss of data. Data that has been written to disk
should stay there.

There are, of course, many other failures possible. This list
should give their flavor.

We will test this part of the assignment by crashing your program at
random points in time and then rerunning it (repeatedly).

The due date of this part of the project is May 6. Send us a pathname to a
directory in your Athena locker that contains the source code (including
Makefile) and documentation. In this part, we will be crashing your file system
and check whether it recovers correctly.

The due date is set on May 6 so that your design paper and your software are
not due on the same date.