Build Your Own RAID Storage Server with Linux

The potent combination of Linux RAID, SATA and LVM can provide you with a powerful and inexpensive storage server. (First of two parts)

If you've been thinking of building yourself a dedicated storage server, this is a
good time to do it. Prices are so low now that even a small home network can have a
dedicated storage and backup server for not much money. SATA hard drives have large
capacities and high speeds for low prices, and you don't need the latest greatest
quad-core processor or trainloads of RAM. The ultimate in flexibility and reliability
combines Linux software RAID (Redundant Array of Inexpensive
Disks) and LVM (Logical Volume Manager).

These are good times for hardware geeks of all kinds: prices are low and features
abundant. Most motherboards include a feast of onboard controllers that used to require
separate expansion cards: audio, video, RAID, Firewire, and Ethernet. Laptops and
monitors come with integrated microphones and cameras. Hordes of USB 2.0 ports mean
easy connectivity for peripherals. Gigabit Ethernet? They're practically giving it away.

Data Storage and Retrieval

The best part is it's easier than ever to store, back up, and retrieve your data. My
personal favorite use for USB is connecting external re-writable storage devices;
everything from little thumb drives to big hard drives. These are absolutely great for
inexpensive backups and file transfers, and even my most relentlessly techno-gumby
friends and relatives can copy files to a USB stick. I suffered during those awkward
transition years when 3.5" diskettes were too small and there was nothing comparable to
replace them. Zip drives were too unreliable, and non-standard; can you read those
disks now? CDRWs were funky: sometimes you could read them, sometimes not, and
packet-writing never did work reliably on any platform. DVD-RWs offered bigger
capacities, but hard drives still outstripped them. Plus there were (and still are) too
many competing DVD standards, and just like their CDRW cousins they are not reliable
enough.

My favorite solution for large-capacity backups and storage is hard drives. Yes, I
know that tape storage rivals hard disks for storage capacity, but I don't like it. It's
cumbersome, expensive, and non-portable. Hard drives are fast, easy, inexpensive, and
best of all very portable. They are readable without any special software or
hardware; just stuff a drive into any PC, or in an external USB or Firewire enclosure
attached to a PC. It doesn't even have to be a Linux PC as long as you have a Linux
LiveCD or USB stick. You'll be able to read nearly any filesystem, and Linux offers a
number of good data-recovery utilities if you need them.

But as excellent as all of these are, there comes a time when they're not quite
adequate, and that's when a dedicated storage server is the right tool for the job.

Why Use RAID?

One word: uptime. RAID protects you from drive failures. When a drive fails (and
it's always "when", not "if"), the remaining disks carry on until you replace the
dead disk. But do not expect RAID to replace regular backups, because it doesn't. There
are many things that can wipe out a RAID array: power surges, multiple drive failures,
undiscovered drive failures, theft, and disk controller failures are just a few
examples. You can't read individual disks from a RAID array, except for RAID 1, so you
have to rebuild the array to access your data. If too many drives fail, you won't be
able to recover anything.

A RAID array, no matter how many disks are in it, looks like a single logical
storage drive to your system. There are several different basic levels of RAID, from
RAID 0 to RAID 6. They use mirroring, striping, or parity, and various combinations of
these. These are the three that are most commonly used:

RAID 0

A striped set with no error-checking.
Striping means data are split equally across all disks in the array. It requires a
minimum of two disks. It's fast and increases your total available storage capacity,
combining all the drives in the array into a single storage unit, but it's also as
fragile as relying on a single hard drive: if any one disk fails, the whole array
is lost. It's not really RAID because it's not redundant, and you definitely don't want
to use it in any mission-critical applications that require high uptimes. It's good for
I/O intensive jobs like video production, because you get a large storage volume and the
combined bandwidth of all the drives, up to the limitations of the RAID controller.
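
To make striping a little more concrete, here is a minimal Python sketch of the idea. It is only an illustration, not how the Linux RAID driver is implemented: the function names, the in-memory lists standing in for disks, and the tiny 4-byte chunk size are all assumptions made for the example.

```python
def stripe(data: bytes, num_disks: int, chunk_size: int = 4) -> list[list[bytes]]:
    """Split data into fixed-size chunks and deal them round-robin across disks."""
    if num_disks < 2:
        raise ValueError("RAID 0 needs at least two disks")
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), chunk_size):
        chunk_index = offset // chunk_size
        disks[chunk_index % num_disks].append(data[offset:offset + chunk_size])
    return disks


def unstripe(disks: list[list[bytes]]) -> bytes:
    """Reassemble the data by reading chunks back in the same round-robin order."""
    chunks = []
    rows = max(len(disk) for disk in disks)
    for row in range(rows):
        for disk in disks:
            if row < len(disk):
                chunks.append(disk[row])
    return b"".join(chunks)


disks = stripe(b"ABCDEFGHIJKL", num_disks=2)
print(disks)            # [[b'ABCD', b'IJKL'], [b'EFGH']]
print(unstripe(disks))  # b'ABCDEFGHIJKL'
```

Notice that each "disk" holds only every other chunk. Take one away and the middle of the data is simply gone, which is why a single failure costs you the whole array.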

RAID 1

RAID 1 is mirroring. You need at
least two disks, and each one is an exact copy of the other. If one disk fails you don't
lose a thing, and there aren't any fancy striping or parity schemes to go haywire. I've
used it successfully on client installations for that bit of extra redundancy when
they're careless about, or even interfering with, a proper backup setup. ($Deity save us all
from Knowitall Managers and their "Talented" Teen-age Nephews.)

RAID 5

This is my favorite general-purpose RAID. It
uses both parity and block-level striping across at least three drives. Parity means you
get data redundancy via some fancy on-the-fly calculations, and spreading it across all
the disks in the array means you can lose one and still rebuild your array. The cost is
storage: parity consumes the equivalent of one disk's capacity, so the overhead as a
fraction of the total is one divided by the number of disks. So if you have a 3-disk
array, 33 percent of your total storage volume is dedicated to parity. On a 4-disk
array it's 25 percent, and so on. Reads are fast, but
writes are slowed down by the parity calculations.
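
The "fancy calculations" behind parity are commonly a byte-by-byte XOR, and a short Python sketch shows why one lost disk can be rebuilt. Treat this as a simplified illustration under stated assumptions: it handles a single stripe held in memory, the function names are made up for the example, and it ignores the way a real RAID 5 array rotates parity blocks across all the disks.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("all blocks must be the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_overhead(num_disks: int) -> float:
    """Fraction of total raw capacity used by parity: one disk out of num_disks."""
    if num_disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return 1 / num_disks


# One stripe on a 3-disk array: two data blocks plus one parity block.
disk0 = b"home"
disk1 = b"data"
parity = xor_blocks([disk0, disk1])

# Pretend disk1 died: XOR the survivors to get its contents back.
rebuilt = xor_blocks([disk0, parity])
print(rebuilt)  # b'data'

print(parity_overhead(4))  # 0.25
```

The same XOR that produces the parity block also recovers a missing block, which is why the array survives exactly one failed disk and no more.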

There are other basic RAID levels, and combinations of the various basic levels, and
Google is full of information on those. We're going to stick with the basics here.

Software RAID vs. Hardware RAID Smackdown

Ever since the vi vs. Emacs wars died of boredom it's been difficult to find good
flamefests. Even software RAID vs. hardware RAID has become mundane. But it's worth
reviewing the merits of each, because this isn't a case of one being clearly superior
to the other, but of deciding which one meets your needs best.

I wouldn't even bother with a PATA RAID controller; they're more
trouble than help. SATA is where it's at these days. First the advantages of a
good-quality SATA hardware controller:

Offloads all the processing from the CPU

Add more disks than your motherboard allows

No booting drama

The two disadvantages of good RAID controllers are cost and inflexibility. 3Ware
controllers are first-rate, but not cheap. Hardware controllers are picky about what
hard disks you can use, and the entire disk must belong to the array, unlike Linux
software RAID which lets you select individual disk partitions. Recovery from a
controller failure means you need an exactly matching replacement controller. Some admins think
that using a hardware controller is riskier because it adds a point of failure.

Poor-quality hardware RAID controllers are legion. Those onboard RAID controllers
and low-end PCI controllers aren't really
hardware controllers at all; they do all their work in (usually crappy) software.

Linux software RAID has these advantages:

Cost: free! And these days CPU cycles cost a lot less than good hardware RAID controllers

Very flexible; mix-and-match PATA and SATA, individual partitions

More recovery options: any Linux PC can rebuild an array

If I were running a super-important mission-critical server that had to be up all the
time and no excuses, I'd use SCSI drives and controllers. For everything else, Linux
RAID + SATA + LVM. Why do we want LVM? So we can resize our storage volumes painlessly.
Come back next week to commence construction.
