Disk Space Case Study - Apr 2004

As of April 2004, we have begun the process of purchasing additional
space for user projects. The following explains what we were looking for
and how we made the decisions that were required. (Note that the final
decisions have not yet been made.)

Disks

For optimal performance and reliability, we use U160 10k rpm (or
better) SCSI disks for all of our shared drives. We also go for the
largest available drives at the rack-mount density, which is at this time
146GB per drive. Seagate drives are preferred, based on our past
experience with their reliability and performance.

Refurbished drives were considered and rejected for reliability
reasons.

The disks are connected to the file server with a SCSI JBOD device.
We have historically used the Adaptec DuraStor 412R, but Adaptec no
longer sells it; Adaptec now sells the SC4100 instead. These
units use proprietary drive rails which are sold only with the disks
themselves; this was worrisome, but the prices available on the drives were
low enough that we were willing to purchase them anyway. Each of
these units requires two SCSI channels for best performance.

Backups

Backing up the data to protect it against both disk failure and human
error is the most expensive part of running a disk system, both in
initial investment and in ongoing maintenance. We have tried to
minimize both.

One third of all disk space goes to backing up the other two thirds; this
lets us run daily incremental backups to disk, which makes data restores
extremely fast and convenient more than 90% of the time. These backups
should be spread across multiple systems, so that a disk and its backup
are less likely to fail simultaneously.
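
The one-third rule above can be sketched as simple arithmetic. This is an
illustration only: the function name and the 12-drive pool are assumptions
made for the example, not figures from our purchase. The 146GB drive size
is the one quoted in the Disks section.

```python
from fractions import Fraction

# Share of raw disk space reserved for disk-based backups (from the text).
BACKUP_FRACTION = Fraction(1, 3)


def split_capacity(drive_count, drive_gb=146):
    """Return (primary_gb, backup_gb) for a pool of identical drives.

    drive_count is hypothetical; drive_gb defaults to the 146GB drives
    described in the Disks section.
    """
    raw_gb = drive_count * drive_gb
    backup_gb = raw_gb * BACKUP_FRACTION
    primary_gb = raw_gb - backup_gb
    return primary_gb, backup_gb


# Hypothetical pool of 12 drives.
primary, backup = split_capacity(12)
print(f"primary: {primary} GB, backup: {backup} GB")
```

The exact fraction is used so that the split stays exact for any drive
count; the result ignores filesystem overhead.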

SDLT tapes are used for both monthly full backups and twice-yearly
archival backups. Each SDLT320 tape can hold 160GB of uncompressed data
(most of our data does not compress well); that translates to roughly one
tape per backed-up disk. To do this efficiently requires an SDLT changer
with at least one slot per backed-up drive; the Sun StorEdge L8 matches
this well. This unit requires a single SCSI channel.

We store the last six months' worth of monthly backups, and the last
two years' worth of archival backups. This requires 10 tapes per backed-up
drive: six monthly tapes plus four archival tapes.
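
The tape count follows directly from the retention periods. The sketch
below reproduces that arithmetic, assuming one tape per backed-up drive per
backup run (the "roughly one tape per backed-up disk" estimate above). The
constant and function names are made up for this example, and the
eight-drive figure is hypothetical.

```python
MONTHLY_RETENTION_MONTHS = 6   # monthly full backups kept for six months
ARCHIVES_PER_YEAR = 2          # archival backups are made twice a year
ARCHIVE_RETENTION_YEARS = 2    # archival backups kept for two years


def tapes_per_drive():
    """Tapes held at any time for one backed-up drive (one tape per run)."""
    monthly_tapes = MONTHLY_RETENTION_MONTHS
    archival_tapes = ARCHIVES_PER_YEAR * ARCHIVE_RETENTION_YEARS
    return monthly_tapes + archival_tapes


def tapes_needed(backed_up_drives):
    """Total tapes in rotation for a given number of backed-up drives."""
    return backed_up_drives * tapes_per_drive()


# Hypothetical example: eight backed-up drives.
print(tapes_per_drive(), tapes_needed(8))
```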

File Servers

We use Sun NFS servers to share the disks to our local network; they
offer superb reliability and maintainability at a reasonable cost. The
SunFire V210 server, currently available at a 50% discount from Sun due to
their Matching Grant program, is a good server at an excellent price; it
offers dual
processors, gigabit networking, 2 gigabytes of RAM, a PCI slot to connect
a dual-channel SCSI card for the JBOD, and an additional SCSI port to
connect the tape changer.
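
As a quick check, the SCSI channels this configuration provides can be
compared with what the attached devices need. The numbers come from the
text (two channels per JBOD, one per tape changer); the dictionary and
function names are assumptions made for this sketch.

```python
# Channels each attached device needs, as stated in the text.
REQUIRED_CHANNELS = {"JBOD": 2, "tape changer": 1}

# Channels one server provides in the configuration described above.
AVAILABLE_CHANNELS = {
    "dual-channel PCI SCSI card": 2,
    "additional SCSI connection": 1,
}


def spare_channels(required, available):
    """Channels left over after every device is attached.

    A negative result means the server cannot attach everything.
    """
    return sum(available.values()) - sum(required.values())


print(spare_channels(REQUIRED_CHANNELS, AVAILABLE_CHANNELS))
```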

(We actually plan to use these servers to replace our existing mail
and web servers, which currently run on SunFire 280Rs. The 280Rs are
better suited for file service, primarily because they can take
significantly more SCSI cards, but they are also much more expensive.)

Redundancy

To keep our network running even as hardware fails, the remaining systems
need enough spare capacity to take over the work of a failed one.
We normally achieve this through redundancy: we always purchase at least
two identical servers at a time, with enough extra SCSI ports to
temporarily take over another server's disks if that server fails. We also
keep spare standalone SDLT drives in case one of the changers fails.

The SCSI JBODs are a special case. We keep the equivalent of one JBOD
of each model empty, to ensure that if one fails we can transfer the
drives to the other JBODs. This greatly encourages the use of a single
model of JBOD.
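
The spare-capacity rule for JBODs can be expressed as a small check: for
every unit of a given model, the free bays in the other units must be able
to hold its drives. This is a sketch under assumptions, not a tool we use.
The function name is invented, and the 12-bay figure in the example is
hypothetical, since bay counts are not stated here.

```python
def can_absorb_any_failure(jbods):
    """Return True if the drives of any single failed JBOD fit elsewhere.

    jbods is a list of (total_bays, occupied_bays) pairs, one pair per
    JBOD of the same model.
    """
    for i, (_, occupied) in enumerate(jbods):
        # Free bays in every unit except the one assumed to have failed.
        free_elsewhere = sum(
            total - used
            for j, (total, used) in enumerate(jbods)
            if j != i
        )
        if free_elsewhere < occupied:
            return False
    return True


# Hypothetical 12-bay units: two full, with one kept empty as the spare.
print(can_absorb_any_failure([(12, 12), (12, 12), (12, 0)]))
```

Spreading the empty bays across several units satisfies the same check,
which is why the rule is stated as "the equivalent of" one empty JBOD.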

In the case of disk failure, the data can be restored to our existing
scratch space.