IDE RAID Technology

Abstract

This paper documents an investigation of a new disk
storage array strategy that offers an attractive
price/storage point as compared to current commercial SAN
or NAS products. The discussion includes a general
description of the technology, practical details of
implementation and an itemized price list.

Introduction

One of the salient features of a great deal of
contemporary scientific research is the reliance on
significant amounts of computing power. The data that
results from these computations must be stored for
analysis and reference in later research. It is not
surprising, then, that new terms like "disk
farm" and "terabyte" are being added to
the technical jargon of the day.

Current commercial answers to this vexing problem come
in a variety of flavors, but share the general feature
that significant amounts of storage come at significant
prices. Because the definition of "significant"
is at the fulcrum of the topic under consideration, I
propose that one terabyte of disk space (1,000 gigabytes
or one million megabytes) and $10,000 are both
significant values for individual research projects. If
the definition of significant changes (say, to one
petabyte of storage or a $1,000 investment), the options
available for investigation can be expected to change as
well.

Unfortunately the terabyte/10k price point is not a
hotly contested slot in the current commercial market.
Some of the national supercomputer centers [1] and
other research groups [2]
have published results of their attempts to provide some
solution near that price point, and those investigations
lent impetus to a similar exercise here at UNT. The
results presented here indicate the terabyte/10k price
point is not wishful thinking, but there are also
compromises that should be understood before building
your own.

RAID Arrays

Many computer users already have experience using disk
arrays, often referred to as RAIDs (an acronym for
Redundant Arrays of Inexpensive Disks). The RAID concept
was a reaction to the problems and expense involved with
building continually larger (in terms of storage
capacity) disk devices. If one could make several smaller
devices appear functionally as a single disk drive, the
problems of bit density and physical size could be neatly
sidestepped. That nut was cracked, and there are now many
varieties of RAID controllers on the market, providing
both increased reliability and performance.

RAIDs can be configured in a variety of ways which
highlight the required compromise between data integrity
and storage size. The trade-off is generally to increase
data integrity by writing the data in more than one place
(data redundancy) which in turn reduces the amount of
storage available by a factor of the redundancy. The most
obvious version of this trade-off is between RAID0 and
RAID1. In RAID0, multiple disks are interleaved to appear
as one disk which is the size of all disks combined
(called striping); in RAID1, half the disks are used for
data redundancy and are simply copies of the other half
(called disk mirroring). RAID0 maximizes the amount of
space available, but at the cost of reliability (the
probability of disk failure increases with the number of
disk drives, and any single drive failing breaks the
entire array); RAID1 increases reliability (both drives
of a mirrored pair would have to fail to lose data), but
at the cost of total available storage, which is cut in
half. Some
attempts have been made to provide a compromise between
the two extremes with features of both. RAID10 is the
combination of RAID0 and RAID1, in which a set of striped
disks is mirrored. This provides the redundancy of
mirroring with the large file system sizes available to
RAID0 configurations. Another compromise between size and
redundancy is RAID5. In a RAID5 disk array, the
equivalent of one drive is logically devoted to parity
(parity information allows the contents of a failed drive
to be reconstructed from the remaining drives). When a
single disk fails, the data can still be reconstructed
from the parity information. This provides redundancy at
a lower storage cost than mirroring, but with slower
performance due to the overhead of computing and
reconstructing the parity information.
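
As a concrete illustration using the eight 80g drives
employed later in this paper: RAID0 yields the full
8 x 80 = 640 gigabytes of raw space, RAID1 or RAID10
yields half of that (320 gigabytes), and RAID5 gives up
one drive's worth of capacity for parity, leaving
(8 - 1) x 80 = 560 gigabytes, all before file system
overhead is subtracted.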

Up to this point, RAID drive arrays have primarily
been specialty products aimed at data centers where big
systems (and big budgets) are the rule. One cost factor
involved with these RAID products has been the type of
disk drive supported. Due to design limitations, the IDE
disk drives commonly used in desktop computers were not
useful for RAID configurations, so more expensive
SCSI-interfaced hard disks were and are the norm. This is
unfortunate from a cost perspective because the economy
of scale involved in the huge desktop computer market
constantly drives down the cost of IDE drives, while
competition for a share of that market continually
improves their performance.

Several companies have recently introduced IDE RAID
controllers (sometimes called storage switches) which are
designed to use the newer ATA100 and ATA133 specification
IDE drives. This approach offers considerable savings due
to the lower cost of the hardware involved and the
necessity of establishing a new price point in order to
compete with established vendors already controlling
NAS/SAN market share. Nonetheless, entire IDE-based RAID
disk systems are already appearing in commercial form, so
you may be able to buy something similar to the system
described in this paper by the time you read this, thus
avoiding the hassles (and savings) of building your own.
Even if you decide to buy a complete system from a
commercial integrator, these experiences might be useful
in helping you understand the compromises you must make.

Components

In order to employ one of the new IDE RAID
controllers, a server class computer system will be
needed. The controller selected for this project is a
64-bit PCI card, so nothing exotic is required; a system
was specified which is generally representative of
current [3]
high-end desktop technology. The exceptional part of the
specification is that a case/power supply combination
must be selected which can meet the requirement of
running at least twelve disk drives. These drives
generate heat and can be expected to generate a
substantial power surge when the system is first turned
on. To reduce costs further, existing hardware could be
used or other components selected, but the ability to
power and cool a dozen disk drives is not a minor issue
and should not be discounted.

Items on the original system purchase order are shown
in the following table. These were reasonable prices at
the time of order, but will almost assuredly have changed
in price since this investigation:

Description                        Qty   Unit price   Subtotal
ASUS A7M266 mbd                      1       169.00     169.00
AMD Athlon 1.4g cpu                  1       189.99     189.00
Thermaltake cpu cooler               1        22.99      22.99
256 mbyte DDR RAM                    2        99.99     199.98
Maxtor 100g ATA-100 disk drives     12       261.00   3,132.00
ATI Radeon DDR 32meg video card      1       159.99     159.99
Antec SX1480B case w/ dual p/s       1       667.00     667.00

Some required items were not on this PO because they
were available locally (CD-ROM drive, network card, video
monitor, mouse and keyboard). The video subsystem is not
particularly important on a file server, so this is a
good area to look for savings. The components actually
delivered substituted 80g Maxtor drives, due to delivery
problems with the higher-density drive and a desire not
to hold the project up waiting for it (and at a
price reduction to $229.99 each). With the 20% decrease
in drive storage, the terabyte target slips to 800
gigabytes for this implementation, but the 100g drives
seem to be available now (and possibly larger ATA-133
drives). The final component necessary for this
investigation is the IDE RAID controller. The model
selected was a product from the 3ware corporation named
the Escalade Storage Switch model 7810. There are several
others, and this study is not an endorsement of the 3ware
product over any other.

This particular controller supports up to eight drives
(which is why two were ordered) in various RAID
configurations. Support is provided for IDE drives that
meet the ATA/100 specification (or lower). The company
also claims "hot swap" and "hot
spare" capabilities, but these features have not
been tested in this implementation. The controllers were
purchased for a unit price of $385.00, and were the
component (there's always one) that was slowest to
arrive, delaying the project by several weeks.

The system was constructed while awaiting delivery of
the RAID controllers, using one of the drives, attached
to the motherboard's IDE controller, for the system software. The
software installed was the RedHat Linux 7.2 distribution
which comes standard with 3ware drivers. The primary
installation issue involved the physical layout of the
case selected. In order to make the IDE disk cables reach
from the controller to the disk drive bay, it was
necessary to machine two "windows" in the
slide-out tray that holds the motherboard for cable
routing. This modification was a fairly trivial machining
job if you have a machine shop, but cable routing should
be carefully planned in advance if you don't. Most of
these cases will be designed with SCSI cabling in mind,
which means one or two fairly long cables with multiple
connectors; for this controller, each disk drive has a
separate 80-conductor ribbon cable (supplied with the
controller), which requires a little more forethought. In
addition, the controller comes with a set of cables which
"Y" the power connectors, but the case selected should
already provide enough power leads, an indication that
the manufacturer designed the system to
support several drives. When the controllers arrived, one
was placed in the system and connected to eight drives.
The instructions provided with the controller are
minimal, but there was no difficulty in putting the
controller and drives into service.
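
As a rough sketch of the bring-up (the prompt, device
name, partition layout and mount point below are
illustrative; the 3ware array happened to appear as
/dev/sda on this system), the steps under Linux look
roughly like:

[root@tbyte /]# fdisk /dev/sda       # create a single partition spanning the array (interactive)
[root@tbyte /]# mke2fs /dev/sda1     # build an ext2 file system on the new partition
[root@tbyte /]# mkdir /3W
[root@tbyte /]# mount /dev/sda1 /3W  # mount the new file system locally
[root@tbyte /]# df -k /3W            # confirm the reported size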

The total cost of the final configuration as
implemented, including shipping and handling charges, was
$4998.83. This does not include the keyboard, mouse, NIC,
video monitor or miscellaneous hardware (disk drives do
not come with mounting screws, etc.). The potential data
storage capacity is 880g (one of the twelve drives is
used for system software and therefore not placed on the
RAID controller) for a price very close to $5,000.

Implementation

The primary thrust of this implementation was
low-cost, network-accessible disk storage. To meet that
system criterion, it was only necessary to see if file
systems could be built on the array and then exported to
the network. The initial tests were done with a
configuration using one controller and eight disk drives.
The drives are represented to the user as a single SCSI
disk (i.e., the 3ware controller looks like a SCSI
controller to Linux). In the first case, the entire
capacity of all eight drives was used to build a single
file system. From the user perspective, it looks like:

Filesystem    1k-blocks      Used  Available  Use%  Mounted on
/dev/hda1        132207     86550      38831   70%  /
/dev/hda10     71837756   1379752   66808844    3%  /export
/dev/hda6       2016016     36268    1877336    2%  /home
none             256408         0     256408    0%  /dev/shm
/dev/hda9        194443        24     184380    1%  /tmp
/dev/hda5       2016016    890712    1022892   47%  /usr
/dev/hda8        194443     17522     166882   10%  /var
/dev/sda1     615383612        24  584123912    1%  /3W
/dev/cdrom         9158      9158          0  100%  /mnt/cdrom

Notice that the file system mounted at /3W, which shows
up as device /dev/sda1, contains 615 gigabytes (the
reported sizes are in one-kilobyte blocks). The eight 80g
drives have a maximum advertised storage of 640g, but the
file system requires some of that storage (about four
percent in this case) for metadata. Under the Available
column, this file system shows only 584g. That is the
result of reserving five percent of the space in the file
system for use only by root (a standard practice which
can be changed with file system options). From the user
perspective, then, we have a file system with just over
half a terabyte; however, this is about 91% of the
theoretical maximum storage available for this
configuration (one controller, eight drives, 640g). In
tests with different types of RAID, redundant storage
will make 90% utilization of advertised space look very
attractive indeed.
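
As a minimal sketch of adjusting that reservation
(assuming the ext2 file system on /dev/sda1 shown above;
the one percent figure is only an example), tune2fs can
inspect and change it:

[root@tbyte /]# tune2fs -l /dev/sda1 | grep -i "reserved block"   # show the current reservation
[root@tbyte /]# tune2fs -m 1 /dev/sda1                            # reduce the root-only reservation to 1%

On a file system of this size, dropping the reservation
from five percent to one percent returns roughly 24
gigabytes to ordinary users.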

The file system is now operational, but not very
useful because there is only local access. This file
system can be exported for network access using NFS
(Network File System). The following screen capture shows
the mount command used to make the new file system
available from a different machine by using the path /3W
(in other words, the same path but on a different
computer):

[root@jonvon /]# mount -t nfs tbyte:/3W /3W

[root@jonvon /]# df -k
Filesystem    1k-blocks      Used  Available  Use%  Mounted on
/dev/sda1        132207     57031      68350   46%  /
/dev/sdb3       6467288   3736116    2402644   61%  /bak
/dev/sda10      3921404   1346944    2375256   37%  /export
/dev/sda6       2016016    392140    1521464   21%  /home
/dev/sdb2       2016044    392048    1521584   21%  /home_bak
/dev/sda8        194443       134     184270    1%  /tmp
/dev/sda5       2016016   1424616     488988   75%  /usr
/dev/sda9        194443     23617     160787   13%  /var
tbyte:/3W     615383616        24  584123912    1%  /3W

Comparing the two views of the file system will show
that only the device name has changed, with the NFS mount
showing the hostname from which the file system is
shared. The configuration which defines which file
systems are exported (that is, capable of being shared)
also controls where they will be available (it allows
access control by host). In other words, the file system
may be shared with only specific hosts, or with entire
subnets, or with any system on the network capable of
performing an NFS mount. It is important to remember when
using NFS that file sharing in this fashion presumes a
common user base (i.e., the user named linus on the file
server will have the same user ID on all hosts that mount
the server's exported file systems).
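
On a Linux server such as this one, that export
configuration lives in /etc/exports. A minimal sketch is
shown below; the host and subnet entries are illustrative
only, and the access options (read-write versus
read-only, root squashing and so on) would be chosen to
suit local policy:

# /etc/exports on tbyte -- share /3W read-write with jonvon and read-only with one subnet
/3W    jonvon(rw)    192.168.10.0/255.255.255.0(ro)

[root@tbyte /]# exportfs -ra    # re-read /etc/exports and apply the changes

Once the export is in place, any permitted client can
perform the mount shown in the screen capture above.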

The exercise was to provide large amounts of
inexpensive disk space which was accessible over the
network. To this point, a file system of 584 usable
gigabytes has been provided for a cost of $5,000, or
about 116 megabytes per dollar (584,000 megabytes divided
by $5,000; note that this cost includes the computer and
all hardware necessary to use
the disks). Having established a "beachhead" on
the storage/price criterion, some benchmarks are in order
to characterize the performance of the result.

Benchmarks

The benchmark numbers provided in this section are
intended to be neither authoritative nor exhaustive. In
order to provide more reproducible results, the tests
would need to be run over a private network in a much
more controlled environment. On the other hand, it is
often quite difficult to reproduce the performance
documented in this type of benchmark testing outside a
controlled environment. These tests were run over the UNT
Academic Computing Services subnet during "off"
hours (which means early morning here). In addition to
the described system, some commercial alternatives were
tested and the numbers provided for perspective. The
environment was not equal for all tests simply
because some of the systems are not available on the
public network. This additional information is still
useful to get the "flavor" of the compromises
required.

To test disk performance, the publicly available
bonnie benchmark was used. This program tests sequential
read/write performance in both character and block mode,
and also provides a random seek test. The file size used
for these tests was 100 megabytes, with results averaged
over five runs and reported as kilobytes/second. These
tests were run on the standard Linux file system type
(ext2) [4]. A representative invocation is sketched
below, and the table that follows summarizes the test
runs on tbyte [5]:
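As a minimal sketch of a single run (assuming bonnie is
installed and the target file system is mounted at /3W;
exact option names vary between bonnie versions):

[root@tbyte /]# bonnie -d /3W -s 100    # sequential and random tests against a 100-megabyte file in /3W

The figures in the table are the averages of five such
runs for each configuration.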

Type          char in    blk in   char out   blk out
IDE            14,465   325,499     15,236   285,120
RAID0           9,540   330,983      9,442   250,264
RAID5           9,182   331,235      9,428   260,339
RAID5-NFS       6,050    10,710      5,384   229,958
RAID10          9,555   336,194      9,475   276,232
RAID10-NFS      6,001   224,985      5,309    10,677

The test labeled IDE was run to the IDE disk attached
to the standard IDE controller providing information
about the performance capabilities of a single drive; the
remainder were run on groups of disks attached to the
RAID controller. The RAID0 configuration was a single
stripe over eight disks, with RAID5 also using a full
eight-disk array. The RAID10 and RAID10-NFS tests were
both run with an array configuration using four drives (a
two-drive stripe which is mirrored). The RAID10-NFS test
was run on a remote host using the RAID10 array over an
NFS mount point. The two NFS tests are probably the most
useful in terms of predicting performance because they
were run over the UNT network. Both of the NFS tests
exhibit the performance decrease characteristic of
network file accesses. In addition, the RAID5-NFS figures
show the combined effects of network access and the
read-modify-write behavior typical of RAID5
configurations. The trade-off between RAID1 (or
RAID10) and RAID5 is more obvious when the data size
available to users is considered for eight-disk arrays:
511g for RAID5 and 292g for RAID10. In other words, for
redundant array types, RAID5 will provide more
storage [6]
at lower performance.
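
Those figures are consistent with the earlier observation
that roughly 91% of raw capacity survives file system
metadata and the root reservation: an eight-disk RAID5
leaves 7 x 80 = 560 gigabytes of raw space (about 511g
usable), while an eight-disk RAID10 leaves 4 x 80 = 320
gigabytes raw (about 292g usable).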

In order to get the flavor of the performance
compromise between this approach and some commercial
products, similar tests were run on two different
"disk farms" employed by ACS in production
mode. This is not an apples-to-apples comparison for
several reasons. First, the network over which these
tests were run is not the campus network (this should
affect only the NFS test). In the case of the MetaStor
NAS, the file system involved was a proprietary Veritas
file system using RAID5. The Nstore is a SAN, so the
"network" attachment is a fibre-channel SCSI network and
not the common campus network. Finally, the cost of the
MetaStor with 375g was about $130,000, while the Nstore
with 500g was about $50,000 plus an additional
$2k-per-node charge for the fiber interface card.

Type           char in    blk in   char out   blk out
MetaStor        11,502   116,503     13,146    48,269
MetaStor-NFS    10,898   465,338      6,999     6,649
Nstore          13,360   509,140     13,262   161,215

Both of these systems provided generally improved
performance on these tests, but at significantly
increased price. Part of the compromise here is due to
the projected use of the storage. With the MetaStor, the
physical configuration will support many more disk drives
than are currently employed (growth capacity). While the
cost of these drives will be high, the hardware described
in this project is simply not capable of the storage
capacities which are supported by the MetaStor. In
addition, some of the costs involved in the MetaStor are
for large buffers which will sustain these performance
numbers under much greater load. The Nstore SAN is a
high-performance technology most comparable to a local
disk drive, because the data transport is not over a
general-purpose network. The performance comes at the
cost of specialized interface cards and fiber cable
connections, which are an additional expense beyond the
normal NIC already part of the system.

Summary

The search for storage space will become an
increasingly common venture as more data is churned out
by computational research in every field. In some cases
where the amount of space is prioritized over performance
concerns, the IDE RAID technology can provide impressive
amounts of storage per dollar and, perhaps more
importantly, is within reach of modest departmental or
even project level budgets. The implementation is not
particularly difficult if local UNIX competence is
available, but could be an issue if that is not the case.
This technology is new enough that long-term stability
should be a concern, though there was no sign of any
instability in the process of running these tests. In
addition, NFS is not an appropriate solution in all
contexts. If, however, the requirements can be met by
this approach, a very impressive price/storage point is
possible with an IDE RAID system.

The decision about which type of technology to employ
should not be generalized into a "one size fits
all" methodology. The data collected in this project
argues that at the current time there are several viable
approaches, each with its own set of compromises. The
premise of deploying a system with "room to
grow" should be greeted with special skepticism
because the "upfront" costs of this growth
potential are high, while the rate of technological
change can easily devalue the usefulness of growing a
two-year-old technology.
