Bye-bye, tapes: How to get the right disk backup appliance


Deduplicating backup appliances have become very popular, but choosing the right one means looking beyond deduplication


InfoWorld | Jun 4, 2012

Although disk-to-disk backup methodologies have become incredibly popular over the past few years, the vast majority of enterprises -- large and small -- still use the same tape backups they implemented years ago. As time goes on, however, more and more old-school backup implementations will reach a breaking point where either capacity or performance can't get the job done.

When you realize that tape can't cut it any longer, you'll likely consider a disk-based backup appliance, available from many vendors, including EMC Data Domain, ExaGrid, and Quantum. But choose carefully: Most buyers focus on finding the most efficient deduplication engine, yet that's only one of the differences worth exploring.

The deduplication engine gets IT's attention because the whole point of implementing dedupe is to shrink the amount of storage you need to hold your backups -- both to save on physical storage costs and to gain longer on-disk retention times. But capacity efficiency is a relatively small issue in practice. Most of the significant operational differences come down to when in the backup cycle deduplication takes place and how scalability is achieved.
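To see why dedupe efficiency matters less than it first appears, consider the basic retention arithmetic. The sketch below is illustrative only; the capacities and the 10:1 dedupe ratio are hypothetical figures, not measurements from any particular appliance.

```python
def effective_retention_days(raw_capacity_tb, daily_backup_tb, dedupe_ratio):
    """Days of backups an appliance can hold, assuming a steady daily
    backup size and a constant average deduplication ratio."""
    logical_capacity_tb = raw_capacity_tb * dedupe_ratio
    return logical_capacity_tb / daily_backup_tb

# 50 TB of raw disk at a hypothetical 10:1 dedupe ratio holds roughly
# 500 TB of logical backups -- 100 days of 5 TB daily backups, versus
# only 10 days with no deduplication at all.
print(effective_retention_days(50, 5, 10))  # 100.0
print(effective_retention_days(50, 5, 1))   # 10.0
```

Note how the difference between a 9:1 and a 10:1 engine moves the needle far less than the jump from tape-style full copies to any deduplication at all, which is why the operational questions deserve more scrutiny than the ratio itself.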

The scale of the data being backed up is a key factor as well. And for disk-to-disk backup appliances, there are two primary approaches to the scalability issue: scale-up and scale-out.

Users of traditional SAN implementations will be familiar with the scale-up approach, which typically pairs static controller/compute resources with a variable amount of storage. In these deployments, you can introduce additional capacity relatively cheaply and easily, both to lengthen retention times and to store your growing data pools.

However, you have to carefully consider the sizing of your controller resources at the outset. As with scale-up SAN implementations, you must estimate up front both the overall capacity and the performance requirements for the end of the device's expected lifetime -- which is often difficult to do accurately in today's quickly changing IT landscape. A failure to estimate properly might result in large, unexpected capital investments to upgrade the controllers sooner than planned or, arguably worse, overbuying at the outset and retiring the device before its full performance potential is ever exercised.
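The up-front sizing problem is essentially a compound-growth projection, and small errors in the growth assumption compound over the device's life. This sketch uses hypothetical numbers purely to illustrate the sensitivity; it is not a sizing formula from any vendor.

```python
def end_of_life_capacity_tb(current_tb, annual_growth_rate, years):
    """Capacity the appliance must hold at end of life, assuming
    compound annual growth of the protected data set."""
    return current_tb * (1 + annual_growth_rate) ** years

# A 20 TB data pool growing 30% a year needs roughly 74 TB of headroom
# after five years -- but if growth turns out to be 50% a year, the same
# device must hold roughly 152 TB. The controller sized for the first
# scenario may hit its wall years early in the second.
print(end_of_life_capacity_tb(20, 0.30, 5))
print(end_of_life_capacity_tb(20, 0.50, 5))
```

Running the two scenarios side by side makes the risk concrete: a 20-point error in the assumed growth rate roughly doubles the end-of-life requirement, which is exactly the kind of miss that forces an unplanned controller upgrade.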

The scale-out approach avoids some of these pitfalls, but isn't without its own problems. In scale-out implementations, controller resources are generally paired with fixed storage resources, and scalability is achieved by scaling the number of devices in a group as performance and storage requirements change. This handily avoids the need to perform accurate long-term planning, since each year's backup storage investments can instead be guided by short-term requirements. It also largely avoids the risk of substantial overbuying or underbuying.

However, the fixed relationship between controller and storage resources can present a problem when you require more of one than the other. For example, you may want to provide extremely long retention for a relatively small amount of quickly changing data. Doing that with a scale-out platform might require purchasing a large amount of controller resources just to get the required storage density -- which would be much easier and cheaper to accomplish with a scale-up system. Some scale-out platforms also have scalability and management limitations that might make them inappropriate for very large enterprises dealing with truly enormous backup datasets.
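Because each scale-out node bundles a fixed slice of controller and storage, the node count is dictated by whichever dimension you need more of, and the other is overbought for free. The following is a minimal sketch of that sizing logic, with hypothetical per-node figures.

```python
import math

def nodes_needed(required_tb, required_tb_per_hr, node_tb, node_tb_per_hr):
    """Identical scale-out nodes needed to satisfy both the capacity
    requirement and the backup-ingest throughput requirement; the
    larger of the two counts wins."""
    by_capacity = math.ceil(required_tb / node_tb)
    by_throughput = math.ceil(required_tb_per_hr / node_tb_per_hr)
    return max(by_capacity, by_throughput)

# Long retention of a small, fast-changing data set: 400 TB of retention
# but only 2 TB/hr of ingest. With hypothetical 40 TB / 4 TB-per-hour
# nodes, capacity alone forces 10 nodes -- and nine nodes' worth of
# controller horsepower sits largely idle.
print(nodes_needed(400, 2, 40, 4))  # 10
```

In the example, throughput needs would be met by a single node, so the other nine exist only to supply disk -- the density problem that a scale-up system with cheap shelf expansion sidesteps.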