Selecting Filesystems

Now that the devices are created, the next step is to put filesystems on them. However, there are many types of filesystems. How do you choose?

For typical desktop filesystems, you're probably familiar with ext2 and ext3. ext2 was the standard, reliable workhorse for Linux systems in years past. ext3 is an upgrade for ext2 that provides journaling, a mechanism to speed up filesystem checks after a crash. ext3's balance of performance, robustness, and recovery speed makes it a fine choice for general purpose use. Because ext2 and ext3 have been the defaults for such a long time, ext3 is also a good choice if you want great reliability. For storing backups, reliability is much more important than speed. The major downside to ext2/ext3 is that to grow (or shrink) the filesystem, you must first unmount it.

However, other filesystems provide advantages in certain situations, such as large file sizes, large quantities of files, or on-the-fly filesystem growth. Because LVM's primary use is for scenarios where you need extreme numbers of files, extremely large files, and/or the need to resize your filesystems, the following filesystems are well worth considering.

For large numbers of small files, ReiserFS is an excellent choice. For raw, uncached file I/O, it ranks at the top of most benchmarks, and can be as much as an order of magnitude faster than ext3. Historically, however, it has not proven as robust as ext3. It's been tested enough lately that this may no longer be a significant issue, but keep it in mind.

If you are designing a file server that will contain large files, such as video files recorded by MythTV, then delete speed could be a priority. With ext3 or ReiserFS, your deletes may take several seconds to complete as the filesystem works to mark all of the freed data blocks. If your system is recording or processing video at the same time, this delay could cause dropped frames or other glitches. JFS and XFS are better choices in this situation, although XFS has the edge due to greater reliability and better general performance.

With all these considerations in mind, format the partitions as follows:

Adding Reliability With RAID

So far, this LVM example has been reasonably straightforward. However, it has one major flaw: if any of your drives fail, all of your data is at risk! Half a terabyte is not an insignificant amount to back up, so this is an extremely serious weakness in the design.

To compensate for this risk, build redundancy into the design using RAID 1. RAID, which stands for Redundant Array of Independent Disks, is a low-level technology for combining disks together in various ways, called RAID levels. The RAID 1 design mirrors data across two (or more) disks. In addition to doubling the reliability, RAID 1 adds performance benefits for reads because both drives have the same data, and read operations can be split between them.

Unfortunately, these benefits do not come without a critical cost: the storage size is cut in half. The good news is that half a terabyte is still enough for the present space requirements, and LVM gives the flexibility to add more or larger disks later.

With four drives, RAID 5 is another option. It restores some of the disk space but adds even more complexity. Also, it performs well with reads but poorly with writes. Because hard drives are reasonably cheap, RAID 5's benefits aren't worth the trouble for this example.

Although it would have made more sense to start with a RAID, we waited until now to introduce them so we could demonstrate how to migrate from raw disks to RAID disks without needing to unmount any of the filesystems.

In the end, this design will combine the four drives into two RAID 1 pairs: /dev/sda + /dev/sdd and /dev/sdb + /dev/sdc. The reason for this particular arrangement is that sda and sdd are the primary and secondary drives on separate controllers; this way, if a controller were to die, you could still access the two drives on the alternate controller. When the primary/secondary pairs are used, the relative access speeds are balanced so neither RAID array is slower than the other. There may also be a performance benefit to having accesses evenly distributed across both controllers.

First, pull two of the SATA drives (sdb and sdd) out of the datavg VG: