Behind The Tech: RAID

Photographer Vincent LaForet edits at his studio workstation, where he stores his files on RAID drives.

Whether you’re relatively new or experienced in the world of digital photography, ever-increasing file sizes and ever-speedier camera frame rates continue to make larger and larger demands on your digital storage and backup solutions. While storage options and formats continue to proliferate, traditional spinning hard drives remain the central player in most photographers’ arsenals. However, for all hard drives’ cost-to-capacity benefits, they have three main limitations: a fixed capacity, a fixed (and often limited) throughput, and a propensity for hardware or software failure somewhere down the road. That’s right, photographers: All hard drives, given enough use, will eventually fail.

Here’s where RAID, or Redundant Array of Independent Disks, provides a valuable solution to some or all of these limitations, depending on configuration. RAID describes a schema, or data management strategy, implemented by a hardware or software RAID Controller and some combination of two or more physical hard drives configured in an “array.” In most implementations, the hard drive space is partitioned or divided into “stripes,” functional blocks of data anywhere from 512KB to a few megabytes in size, that can be independently managed as a subset of your images, videos or other individual data files. The configured array of multiple drives appears to your operating system as a single volume, with the RAID Controller managing the data and drives invisibly and seamlessly for the user.

There are quite a few standard RAID strategies on the market, but we’ll discuss the three basic schemas that apply to most photographers and most consumer-level devices. All RAID schemas involve some level of striping (dividing data across multiple drives), mirroring (making redundant copies of every file on two or more drives) or some combination of both to the greater benefit of the user.

RAID 0, or striping, is the process of dividing your data over multiple drives, so that larger files are broken up into smaller chunks that individual drives can read and write more quickly. You’ll often hear drive manufacturers touting the latest speeds of modern connections like USB3 or Thunderbolt2; however, the limiter often remains a traditional spinning hard drive’s throughput of roughly 120MB/s. All the cable/connection speed in the world makes little difference if the chokepoint is still the enclosure’s ability to write to a 5400RPM spinning hard drive at 120MB/s. Striping assists here by breaking a large chunk of data into two or more smaller pieces so that each drive is only tasked with reading or writing the smaller amount, nearly doubling the speed and responsiveness of your read/write process in a two-drive array. Should you stripe two 2TB hard drives into a RAID 0 array, you’d create a single, faster 4TB volume. However, because RAID 0 is only striping and doesn’t include any redundancy, or data mirroring, should any one of the drives fail, you lose the entire array.
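
To make the striping idea concrete, here is a minimal Python sketch. It is not how any real RAID Controller is implemented: the function names `stripe`, `unstripe` and `ideal_throughput_mb_s`, the in-memory lists standing in for drives, and the tiny 2-byte stripe size are all assumptions chosen for illustration.

```python
def stripe(data: bytes, num_drives: int, stripe_size: int) -> list[list[bytes]]:
    """Deal fixed-size chunks of data across the drives in round-robin order."""
    drives: list[list[bytes]] = [[] for _ in range(num_drives)]
    for offset in range(0, len(data), stripe_size):
        drive_index = (offset // stripe_size) % num_drives
        drives[drive_index].append(data[offset:offset + stripe_size])
    return drives


def unstripe(drives: list[list[bytes]]) -> bytes:
    """Rebuild the original data by reading the stripes back in the same order."""
    chunks = []
    for row in range(max(len(drive) for drive in drives)):
        for drive in drives:
            if row < len(drive):
                chunks.append(drive[row])
    return b"".join(chunks)


def ideal_throughput_mb_s(per_drive_mb_s: float, num_drives: int) -> float:
    """Best-case figure only: assumes every drive works fully in parallel."""
    return per_drive_mb_s * num_drives


photo = b"ABCDEFGHIJ"  # hypothetical stand-in for a large image file
drives = stripe(photo, num_drives=2, stripe_size=2)
print(drives)                         # [[b'AB', b'EF', b'IJ'], [b'CD', b'GH']]
print(unstripe(drives) == photo)      # True
print(ideal_throughput_mb_s(120, 2))  # 240
```

Each simulated drive ends up holding only every other chunk of the file. If either list is lost, the file can’t be reassembled, which is exactly why a single failure takes down a RAID 0 array.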

RAID 1, or mirroring, prioritizes the other half of the equation, combining two drives into a mirrored array. Every file or block of data is written to both hard drives at the same time. There are no write speed or throughput benefits to mirroring, since every piece of data must be written in its entirety to both drives in this configuration. There can be a small speed increase on the read side, as the RAID Controller can read segments of the file from both drives at once faster than it can read the file from a single drive. The benefit of mirroring, however, is 100% redundancy: all of your data lives in two or more places. When you mirror two 2TB hard drives into a RAID 1 array, you create a single 2TB mirrored volume. Should one hard drive fail or become corrupted, your data is protected and you continue to work. Once you replace the failed drive with a new hard drive, the RAID Controller rebuilds the “degraded” array automatically, re-creating your mirror.
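
The write-everywhere, read-from-any-survivor behavior of a mirror can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a real controller: the `MirroredArray` class, its method names and the dictionaries standing in for drives are all hypothetical.

```python
class MirroredArray:
    """Toy RAID 1: every block is written to every healthy drive."""

    def __init__(self, num_drives=2):
        # Each drive is a dict of block_id -> data; None marks a failed drive.
        self.drives = [{} for _ in range(num_drives)]

    def write(self, block_id, data):
        # The full block goes to every drive, so writes gain no speed.
        for drive in self.drives:
            if drive is not None:
                drive[block_id] = data

    def read(self, block_id):
        # Any surviving drive can satisfy the read.
        for drive in self.drives:
            if drive is not None:
                return drive[block_id]
        raise OSError("all drives in the mirror have failed")

    def fail(self, index):
        self.drives[index] = None

    def rebuild(self, index):
        # Copy everything from a healthy drive onto the replacement.
        source = next(d for d in self.drives if d is not None)
        self.drives[index] = dict(source)


array = MirroredArray()
array.write(0, b"wedding_001.raw")
array.fail(0)                  # one drive dies
print(array.read(0))           # b'wedding_001.raw' (served by the survivor)
array.rebuild(0)               # replacement drive is re-mirrored
print(array.drives[0] == array.drives[1])  # True
```

Note that a deleted or overwritten block would be changed on both drives at once, which is why a mirror protects against drive failure but not against user error.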

Finally, RAID 5, using three or more drives, combines the striping of RAID 0 with parity-based protection; thus the user benefits from both striped speed performance across multiple drives and the data protection of parity information (a smaller block of data, computed with a simple XOR operation, that can be used to re-create a lost block). Unlike mirroring, parity doesn’t duplicate every file, and in RAID 5 the parity blocks are distributed across all the drives in the array rather than kept on one dedicated drive. While you can build a RAID 5 array with only three drives, the RAID Controller has to juggle both striping and parity calculations, so at least five drives are often recommended to spread out the read/write tasks.
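
The parity calculation in RAID 5 is a bytewise XOR, and a short Python sketch shows why one missing block can always be rebuilt from the others. The helper name `xor_blocks` and the 4-byte sample blocks are assumptions for illustration; a real controller works on full stripes and rotates the parity block from drive to drive.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


# One stripe on a hypothetical three-drive array: two data blocks plus parity.
block_a = b"IMG1"
block_b = b"IMG2"
parity = xor_blocks(block_a, block_b)

# The drive holding block_b fails; XOR the survivors to get it back.
rebuilt_b = xor_blocks(block_a, parity)
print(rebuilt_b == block_b)  # True
```

The same trick works with any number of data blocks, as long as only one block per stripe is missing. Lose two and there is no longer enough information to rebuild either.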

Most RAID 5 arrays are built with multiple drives of the same size, speed and specification. Combine three 2TB hard drives in a RAID 5 array, and you build a single almost 4TB volume with a performance improvement over a single 4TB hard drive and an ability to absorb/rebuild a failure of a single hard drive within the array, thanks to the parity information spread across the drives. Combine five 2TB drives into a RAID 5 array, and you build a single almost 8TB volume (remember, one drive’s worth of capacity is always reserved for parity information) with even better read/write performance and again the ability to absorb/rebuild the failure of a single drive within the array. Lose a single drive in the array, and the volume remains functional in its degraded state until the failed drive is replaced and the RAID Controller successfully rebuilds its contents from the remaining data and parity information.
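
The capacity figures quoted for the three schemas follow from simple arithmetic, sketched here in Python. The function name `usable_capacity_tb` is hypothetical, and the results are nominal: formatting overhead is why the text says “almost” 4TB or 8TB.

```python
def usable_capacity_tb(level: int, num_drives: int, drive_tb: float) -> float:
    """Nominal usable capacity for identical drives in RAID 0, 1 or 5."""
    if level == 0:
        if num_drives < 2:
            raise ValueError("RAID 0 needs at least two drives")
        return num_drives * drive_tb        # every drive stores unique data
    if level == 1:
        if num_drives < 2:
            raise ValueError("RAID 1 needs at least two drives")
        return drive_tb                     # every drive stores the same data
    if level == 5:
        if num_drives < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return (num_drives - 1) * drive_tb  # one drive's worth goes to parity
    raise ValueError("only RAID 0, 1 and 5 are modeled in this sketch")


print(usable_capacity_tb(0, 2, 2))  # 4
print(usable_capacity_tb(1, 2, 2))  # 2
print(usable_capacity_tb(5, 3, 2))  # 4
print(usable_capacity_tb(5, 5, 2))  # 8
```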

Beyond the schemas discussed above, there are a variety of additional RAID schemas combining striping and mirroring in various forms at varying levels of complexity; say, mirroring two large sets of striped drives for faster read/write speeds with redundancy, or ensuring greater redundancy of parity information across larger drive arrays so that a RAID volume could absorb two or more simultaneous drive failures without data loss on the volume. All of these additional schemas require a greater number of drives, yet return a comparatively smaller-sized functional volume.
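
The trade-off between protection and usable space is easy to see with a little arithmetic. The Python sketch below compares RAID 5 with two of these additional schemas, commonly known as RAID 6 (double parity) and RAID 10 (striping across mirrored pairs). The function name `describe` is hypothetical, the figures are nominal, and the failure counts are the guaranteed minimum; a RAID 10 array can sometimes survive more if the failed drives sit in different mirrored pairs.

```python
def describe(level: str, num_drives: int, drive_tb: float) -> tuple[float, int]:
    """Return (usable TB, drive failures the array is guaranteed to survive)."""
    if level == "RAID 5":
        return (num_drives - 1) * drive_tb, 1   # one drive's worth of parity
    if level == "RAID 6":
        return (num_drives - 2) * drive_tb, 2   # two drives' worth of parity
    if level == "RAID 10":
        if num_drives % 2:
            raise ValueError("RAID 10 needs an even number of drives")
        return (num_drives // 2) * drive_tb, 1  # half the drives are mirrors
    raise ValueError("level not modeled in this sketch")


# Eight hypothetical 2TB drives, 16TB of raw space in every case.
for level in ("RAID 5", "RAID 6", "RAID 10"):
    usable, failures = describe(level, 8, 2)
    print(level, usable, failures)
# RAID 5 14 1
# RAID 6 12 2
# RAID 10 8 1
```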

It’s important to note here that while a number of the RAID schemas listed above do protect against the inevitable dreaded drive failure, RAID, in and of itself, is not backup. I’ll repeat that. Writing your images or other data to a mirrored or other parity-level RAID array is not the same as backing up your data. It can give you a level of protection against hard drive failure, but should you delete a file on your RAID array, the file would be immediately deleted from all drives in the array. Further, an error in the RAID Controller or an operating system glitch could corrupt all the drives simultaneously or damage the directory structure written to the entire volume, all leading to data loss.

For this reason, it’s still important for you to maintain a minimum of two, if not three, copies of your important data at all times, following the mantra of Online, Offline and Offsite, some or all of which incorporate RAID. Your online storage is your attached working volume, your offline remains mostly physically disconnected to protect against accidental deletions, power surges and other corruptions, while your offsite storage is the last line of defense should your home/office suffer a catastrophic blow in the form of fire, flood and beyond.

RAID was formerly a realm of specialized use and sophisticated management, but desktop prosumer storage options continue to trickle down to the masses in the form of easily managed enclosures offering a lot of bang for the buck. With a better understanding of how various RAID schemas manage your data, we hope you can more intelligently incorporate RAID into your primary and backup storage plan.

A Perfect RAID For Every Need

Now that you’ve decided to convert your various primary working and backup volumes to RAID, let’s look at a handful of the more popular consumer- and professional-level solutions on the market. There are a number of popular players in the personal/professional RAID storage market. We’ll take a look at options from G-Tech, LaCie and Promise. Each makes solutions up and down the scale, but here are highlights from each.

The Consumer. LaCie has a great history of solid-quality, budget-friendly solutions, and the 2big Thunderbolt 2 two-drive RAID enclosure is a great starting point for photographers looking for a simple, speedy RAID 0 or protective RAID 1 array. The enclosure features both USB3 connectivity and dual Thunderbolt 2 ports for daisy chaining, meaning that as you fill one RAID, you could purchase additional units as a scalable solution. With a built-in RAID Controller, the enclosure comes with LaCie’s desktop software for easy management and monitoring. The enclosure comes standard in 6TB, 8TB and 12TB capacities, and prices start at just over $500.

The Professional. Notching up from LaCie in both capacity and feature set, enter G-Tech’s GSpeed Studio (4 drive) and Studio XL (8 drive) solutions, both with dual Thunderbolt 2 interfaces. The units feature enterprise-class 7200RPM drives, boasting longer life and lower error rates than standard consumer drives. With additional drives come options for RAID 5 for better balancing of speed and redundancy, as well as the additional, more sophisticated RAID schemas referenced previously. Thanks to the additional drives and better backend architecture, advertised read/write speeds are two to three times faster than those of consumer-level units, great for today’s larger image sizes and 4K video editing. Onboard RAID Controllers are once again simply managed by G-Tech’s robust management software. The base GSpeed Studio starts at $2,000 for 12TB and scales through the product line all the way up to the largest Studio XL 80TB unit ringing in at $9,000.

The Enterprise. Long a player in the enterprise and larger server farm world, Promise does offer a Pegasus solution for the desktop environment, but a large portion of its business derives from the larger and more scalable network-accessible, rack-based solutions in its VTrak line. For photographers working in group environments or corporate creative teams with an eye toward growth, the enterprise-level RAID solutions from Promise offer compelling feature sets with powerful remote web management, high-speed dual Fibre Channel connectivity, redundant cooling units and power supplies for minimal downtime, and expandability up to 7 petabytes (that’s 7,000TB!). There are far too many options to properly discuss in this short format, but the VTrak E-Class 12-bay units begin around $4,000, plus the cost of hard drives, scaling up to the fully featured 24-bay A-Class with a price tag north of $50,000.