How To Choose A Data Archiving Platform

No matter what your motivation for archiving data, the storage system needs to provide data integrity, scalability and power management.

At the end of the year, the topic of data archiving heats up. My last column covered different methods for moving data into an archive. In this column, we will look at the storage systems competing to be the repository for that information. In general, no matter what your motivation for archiving data, the archive storage system needs to provide data integrity, scalability and power management, and, of course, it must do so at a competitive price.

There are several types of devices you can archive to. The first, and one that is often overlooked, is a large disk array. Although these arrays often can't do continuous data verification and may not scale as far as more archive-specific systems, they have one big advantage: price. They tend to be very cost-effective if your archive requirements won't exceed the limits of a single array. A few of these systems also have very mature power-saving capabilities, such as spinning down idle drives.

Another option outside of traditional archive storage systems is a cloud storage service. Cloud storage has the advantage of taking up none of your data center footprint and never running out of capacity. Some cloud providers, when paired with third-party archive solutions, can also provide complete data integrity checking. Cloud services also have the advantage of pay-as-you-go pricing, so the upfront investment is minimal. The downside is that they are pay-as-you-grow as well: you keep paying and paying. Storing terabytes upon terabytes of information in the cloud for decades could become very expensive over time.
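To see why "you keep paying and paying" adds up, here is a rough cumulative-cost sketch. It assumes the archive only grows (nothing is ever deleted), that capacity is added once a year, and that the per-TB monthly price stays flat. The function name and every figure in the usage line are hypothetical placeholders, not quotes from any provider; plug in your own numbers.

```python
def cumulative_cloud_cost(start_tb: float, growth_tb_per_year: float,
                          price_per_tb_month: float, years: int) -> float:
    """Total spend over `years` for a cloud archive that only grows.

    Simplifying assumptions: capacity is constant within each year and grows
    at year boundaries, and the price per TB-month never changes.
    """
    total = 0.0
    capacity = start_tb
    for _ in range(years):
        total += capacity * price_per_tb_month * 12  # twelve monthly bills
        capacity += growth_tb_per_year               # the archive keeps growing
    return total


# Hypothetical example: 100 TB today, +20 TB per year, $10 per TB-month, 20 years.
print(f"${cumulative_cloud_cost(100, 20, 10, 20):,.0f}")  # prints $696,000
```

Even with a modest growth rate, most of that spend comes in the later years, when the archive is largest, which is exactly the long tail a pay-as-you-grow model creates.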

Another option is to build your own cloud storage system in-house; in other words, a private cloud. As I recently described in my article "What is Object Storage," most of these systems use an object storage layout. This gives them tremendous scalability and consistent performance even as the amount of archived data increases. An object layout also provides the foundation for continuous data verification.
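Why does an object layout make continuous verification practical? Each object is stored under a unique ID together with metadata, so the system can record a checksum when the object is written and recompute it later during a background "scrub." The toy, single-node sketch below shows that idea; the `ObjectStore` class and its methods are hypothetical and do not represent any particular product, which would spread objects across nodes and repair bad copies from replicas.

```python
import hashlib
import uuid


class ObjectStore:
    """Toy object store: a flat namespace mapping IDs to (data, checksum).

    Hypothetical illustration only; real systems distribute objects across
    many nodes and repair damaged copies from replicas or erasure codes.
    """

    def __init__(self) -> None:
        self._objects: dict[str, tuple[bytes, str]] = {}

    def put(self, data: bytes) -> str:
        object_id = uuid.uuid4().hex
        # Record the SHA-256 digest at write time so it can be re-checked later.
        self._objects[object_id] = (data, hashlib.sha256(data).hexdigest())
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id][0]

    def scrub(self) -> list[str]:
        """Re-hash every object and return the IDs whose data no longer matches."""
        return [oid for oid, (data, digest) in self._objects.items()
                if hashlib.sha256(data).hexdigest() != digest]


store = ObjectStore()
oid = store.put(b"scanned contract, 2003")
print(store.scrub())  # prints [] while every object is intact
```

Because the checksum travels with the object rather than living in a separate file system structure, a scrub can run continuously in the background as the archive grows.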

These systems also tend to scale one node at a time, providing a similar pay-as-you-grow model. Unlike the cloud, though, you own the system, which has its pros and cons. There is also the challenge that all your data has to be stored on disk, which means these systems need to stay powered and running in order to operate. Few scale-out object storage systems have developed the ability to spin down nodes.

Finally, there is tape. Tape wins hands down on both price and power efficiency. The technologies above all provide near-instant retrieval; tape does not. But ask yourself: if a request comes in for data that is 10 years old, do you really need to recover it in seconds? Or can it wait a few minutes while the tape is loaded into a drive and the data is located and recovered? If it can wait, then tape might be for you.
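The question above boils down to a simple rule: pick the cheapest tier whose retrieval time still fits how long the requester can wait. Here is a minimal sketch of that rule. The `Tier` type, the `cheapest_tier` function, and the retrieval times and relative costs in `TIERS` are hypothetical placeholders chosen only to reflect the rough shape described above (disk near-instant, tape a few minutes and cheaper), not measured figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    retrieval_seconds: float   # assumed typical time until data is available
    cost_per_tb_month: float   # assumed relative cost units


def cheapest_tier(tiers: list[Tier], max_wait_seconds: float) -> Tier:
    """Return the lowest-cost tier whose retrieval time fits the allowed wait."""
    candidates = [t for t in tiers if t.retrieval_seconds <= max_wait_seconds]
    if not candidates:
        raise ValueError("no tier can meet the requested retrieval time")
    return min(candidates, key=lambda t: t.cost_per_tb_month)


# Placeholder figures for illustration only; substitute your own quotes.
TIERS = [
    Tier("disk array", retrieval_seconds=1, cost_per_tb_month=5.0),
    Tier("tape library", retrieval_seconds=300, cost_per_tb_month=1.0),
]

print(cheapest_tier(TIERS, max_wait_seconds=10).name)   # prints disk array
print(cheapest_tier(TIERS, max_wait_seconds=600).name)  # prints tape library
```

If a 10-year-old record can wait ten minutes, tape wins; only a requirement for seconds-level recovery forces the data onto disk.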

Another concern about tape is data integrity. As we discussed in our webinar "The Four Reasons The Data Center is Returning To Tape," tape cartridges have actually proven to be more reliable than disk drives, but they don't have the built-in data integrity checks that some of the above methods do. However, some archiving solutions that support tape can perform scheduled scans of tape cartridges so that integrity can be verified.
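Much of a scheduled tape scan is bookkeeping: decide which cartridges to read back next, given that each scan ties up a drive. How a given archiving product schedules its scans is product-specific. The sketch below is a hypothetical policy with a made-up function name and made-up cartridge labels. It treats a cartridge as due if it has never been verified or was last verified more than `interval_days` ago, puts never-verified cartridges first and then the longest-unverified ones, and caps each scan window.

```python
from datetime import date, timedelta
from typing import Optional


def cartridges_due(last_verified: dict[str, Optional[date]], today: date,
                   interval_days: int, max_per_scan: int) -> list[str]:
    """Return cartridge labels to verify in the next scan window (hypothetical policy)."""
    cutoff = today - timedelta(days=interval_days)
    due = [label for label, when in last_verified.items()
           if when is None or when < cutoff]
    # Never-verified cartridges (None) sort first, then the oldest verification dates.
    due.sort(key=lambda label: (last_verified[label] is not None,
                                last_verified[label] or date.min))
    return due[:max_per_scan]


inventory = {
    "A00001": date(2012, 1, 5),
    "A00002": None,               # never verified since it was written
    "A00003": date(2011, 6, 1),
    "A00004": date(2012, 11, 30),
}
print(cartridges_due(inventory, date(2012, 12, 15), interval_days=180, max_per_scan=2))
# prints ['A00002', 'A00003']
```

The actual verification step would then read each selected cartridge back and compare the data against checksums recorded when it was written.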

So, which one should you pick? Most vendors mistakenly treat the archive target as a zero-sum game in which everything must be on their hardware. We find that most data centers are better served by a mixed approach that leverages two or more of the above solutions: disk for the medium-term archive and tape for the long-term deep archive. In fact, in an upcoming column I'll discuss how to leverage tape with either a private or public cloud.

I would definitely agree that tape is not dead, for all the reasons mentioned above. In addition, a purpose-built backup appliance (PBBA) with a tape-format interface (a.k.a. a virtual tape library) that can use physical tape on the back end means data centers don't have to massively rework their existing backup software policies and procedures.

An organization I'm involved with is looking at a four-tier storage hierarchy: primary storage on disk, a first archive level on tape, a second archive level in the cloud, and a third archive level on long-term cloud storage with a different provider. Moving data from one level to another requires analyzing what the data is, what it's being used for, and how long it has been dormant. The hierarchy also gives the organization the ability to restore from different points in time. The idea is to balance recovery objectives, storage requirements and cost-effectiveness.
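A placement rule for a four-tier hierarchy like this could be sketched as follows. The inputs loosely mirror the three questions in the comment: `in_active_use` stands for what the data is being used for, `retain_long_term` for what the data is (for example, records under a long retention requirement), and `days_dormant` for how long it has sat untouched. The enum, the function, and all of the day thresholds are hypothetical placeholders, not the organization's actual policy.

```python
from enum import Enum


class StorageTier(Enum):
    PRIMARY_DISK = 1
    TAPE_ARCHIVE = 2
    CLOUD_ARCHIVE = 3
    LONG_TERM_CLOUD = 4   # long-term storage with a second, different provider


def choose_tier(days_dormant: int, in_active_use: bool,
                retain_long_term: bool) -> StorageTier:
    """Hypothetical placement rule; every threshold here is a placeholder."""
    if in_active_use or days_dormant < 90:
        return StorageTier.PRIMARY_DISK
    if days_dormant < 365:
        return StorageTier.TAPE_ARCHIVE
    if days_dormant < 3 * 365 or not retain_long_term:
        return StorageTier.CLOUD_ARCHIVE
    return StorageTier.LONG_TERM_CLOUD


print(choose_tier(days_dormant=200, in_active_use=False, retain_long_term=True))
# prints StorageTier.TAPE_ARCHIVE
```

Keeping the rule as one small function makes it easy to revisit the thresholds as recovery objectives and storage costs change.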