The Cold Cloud: Long-Term Backup Storage in the Public Cloud

Monday Apr 24th 2017 by Christine Taylor

Public cloud vendors offer several low-cost options for storing data that is accessed very infrequently.

Efficient storage management includes migrating aging data through progressively less-expensive storage tiers. When data ends its migration at the cold storage stage, you can keep it for long periods of time at very low cost.

Cloud-based data storage generally falls into these four storage classes or tiers:

Hot storage is primary storage for frequently accessed production data.

Warm storage stores slightly aging but still active data. It costs less because the underlying storage systems don’t have the high performance and availability requirements, but it keeps data quickly accessible.

Cool storage houses nearline data, which is less frequently accessed data that needs to stay accessible without a restore process.

Cold storage is a backup and archival tier that stores data very cheaply for long periods of time. Restores are expected to be rare. Security, durability and low cost characterize this tier.
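The four tiers above amount to a policy that maps how recently data was accessed to a storage class. A minimal Python sketch of such a policy follows. The age thresholds (30, 90 and 365 days) and the function name are illustrative assumptions, not figures from any vendor; real cutoffs depend on the workload.

```python
from datetime import date

# Hypothetical age cutoffs in days; these numbers are assumptions for
# illustration only and should be tuned to real access patterns.
TIER_THRESHOLDS = [
    (30, "hot"),    # accessed within the last 30 days
    (90, "warm"),   # accessed within the last 90 days
    (365, "cool"),  # accessed within the last year
]

def choose_tier(last_accessed: date, today: date) -> str:
    """Return the storage tier for data last accessed on `last_accessed`."""
    age_days = (today - last_accessed).days
    for max_age, tier in TIER_THRESHOLDS:
        if age_days < max_age:
            return tier
    # Anything older than the last threshold ends its migration in cold storage.
    return "cold"

print(choose_tier(date(2015, 1, 1), date(2017, 4, 24)))  # prints "cold"
```

In practice the cloud providers' own lifecycle rules or a backup product's policy engine would do this work; the sketch only shows the shape of the decision.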

Cold Storage Usage Examples

The biggest single reason for using cold storage is saving money by reducing use of hot, warm and cool storage tiers. Cold storage provides efficient and infinitely scalable capacity at a lower cost than any other storage tier.

For example, the healthcare industry produces massive amounts of medical images with retention requirements measured in decades. The financial industry also has steep retention requirements, in some cases up to 30 years. Many financial institutions have stored this data in tape vaults for many years, but restoring massive data sets from tape is expensive. Cold storage in the cloud retains data for long periods, and restoring the data does not require the original tape drives.

Litigation and regulatory investigations are also cold storage use cases. For example, a retail chain might store massive amounts of backup data in the cloud. One day the company receives a lawsuit from a customer who slipped and fell in a store seven months ago. The business will need to search through its backups for relevant data, collect it, analyze it and provide it to the reviewers within a few weeks. This is far simpler to do on cold storage in the cloud than from massive tape collections.

A third scenario is preserving raw data for analytics and secondary applications. Massive data sets are very expensive to keep on hot or warm storage systems. Cold storage tiers keep the raw data available for occasional access at a very low cost.

Cold Storage and the Public Cloud

For many companies, cold storage in the cloud offers distinct advantages over on-premises nearline storage or tape vaulting. The public clouds are ramping up their cold storage in response. Amazon Glacier and the new Google Cloud Storage Coldline are dedicated to long-term cold storage. Azure uses its Cool Blob Storage to serve both cool and cold tiers.

The three services have a lot in common. Storage pricing is very similar. Amazon and Google both charge $0.007 per gigabyte stored per month. Azure prices vary by geographical region, ranging from $0.01 to $0.024 per gigabyte across cool and hot blobs. (Cool blobs are priced at the lower end of the scale.) Data access and recovery cost more than simple storage, which discourages customers from using cold storage as a cheap active data tier.
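To put those per-gigabyte prices in perspective, here is a small Python sketch of the monthly storage bill for 100 TB. It reads the quoted cold price as $0.007 per gigabyte per month and uses the upper Azure figure of $0.024 as a stand-in for a hot tier. The data volume is an assumption, and the calculation deliberately ignores retrieval, request and network egress charges, which vary by provider.

```python
def monthly_storage_cost(gigabytes: float, price_per_gb_month: float) -> float:
    """Storage-only cost in USD for one month; excludes access and egress fees."""
    return gigabytes * price_per_gb_month

COLD_PRICE = 0.007  # USD per GB per month, the cold-tier price quoted above
HOT_PRICE = 0.024   # USD per GB per month, upper end of the Azure range quoted above
DATA_GB = 100_000   # assumed data set: 100 TB, counted as 100,000 GB

cold = monthly_storage_cost(DATA_GB, COLD_PRICE)
hot = monthly_storage_cost(DATA_GB, HOT_PRICE)
print(f"cold: ${cold:,.2f}/month, hot: ${hot:,.2f}/month")
# prints "cold: $700.00/month, hot: $2,400.00/month"
```

The gap widens with retention time, which is why data with multi-year retention requirements is the natural candidate for the cold tier.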

Durability is critical for all three services. Both Glacier and Coldline rate their durability at 11 nines (99.999999999 percent). Both services achieve this durability level by redundantly storing data across multiple domains, storage systems, and disks. Azure goes beyond 11 nines by guaranteeing 0 percent data loss for both hot and cool storage blobs.
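What 11 nines means in practice is easier to see with a little arithmetic. The Python sketch below assumes the figure is an annual, per-object survival probability and that object losses are independent of one another; both are simplifying assumptions, and the object count is purely illustrative.

```python
def expected_annual_losses(object_count: int, nines: int) -> float:
    """Expected number of objects lost per year at a durability of `nines` nines.

    Assumes durability is an annual per-object probability and that
    losses are independent (a simplification).
    """
    annual_loss_probability = 10.0 ** -nines
    return object_count * annual_loss_probability

losses = expected_annual_losses(10_000_000, 11)  # 10 million stored objects
years_per_lost_object = 1 / losses
print(f"about one object lost every {years_per_lost_object:,.0f} years")
# prints "about one object lost every 10,000 years"
```

At that rate, operational mistakes such as accidental deletion are a far likelier cause of data loss than the storage service itself.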

Recovery service levels differ somewhat between the three. For example, Amazon Glacier offers different service levels for restore times that range from minutes to hours, while Google Coldline and Azure Cool Blob Storage offer fast recovery in milliseconds. Not everyone needs to recover cold data in such a short amount of time, but if you do, such as when quickly accessing a backup data set, then the much shorter access time could prove very handy.

Data transfer times matter for uploading data as well as retrieving it. Whether you back up first to the cloud or keep backup copies on-site and then back them up, you need cloud transfers to stay within backup windows. The most efficient way to do this is to choose a backup product that backs up incremental changes and rehydrates them into a full restore. Also, look for backup providers who can accelerate cloud transfers between the on-premises data center and cold storage tiers.
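The idea behind incremental backup can be sketched in a few lines of Python: compare each file's content hash against a manifest saved by the previous run, and upload only what differs. This is a simplified, file-level illustration. Commercial backup products typically work at the block level and also handle deletions and the rehydration step, none of which is shown here. The function names and the manifest format are hypothetical.

```python
import hashlib
from pathlib import Path

def file_digest(path: Path) -> str:
    """SHA-256 of a file's contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def changed_files(root: Path, manifest: dict[str, str]) -> dict[str, str]:
    """Return {relative_path: digest} for files that are new or modified
    compared with `manifest`, the mapping saved by the previous backup run."""
    changed = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digest = file_digest(path)
            if manifest.get(rel) != digest:
                changed[rel] = digest
    return changed
```

A backup run would upload only the files returned by `changed_files`, then merge the result into the manifest for the next run. Because only the changes cross the wire, the transfer is far more likely to fit inside the backup window.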

Amazon Glacier is a member of the Amazon Web Services (AWS) family and offers unlimited cold data storage. Amazon isn’t kidding about Glacier being a cold storage tier. Retrieval costs are higher, and data access and recovery can take five hours or more. The two-step retrieval process first retrieves data from the staging area, then offers a 24-hour window to either download it or access it via Amazon EC2. Glacier offers a bulk retrieval option that enables businesses to retrieve terabytes or petabytes of data, typically within 5 to 12 hours. This option enables customers to cost-effectively use Glacier to store big data for occasional analysis.

Google designed Google Cloud Storage Coldline as a direct competitor to Glacier and charges the same amount for monthly storage. Both Amazon and Google discourage frequent data retrieval from their cold storage services, but when customers need to retrieve data fast, Coldline delivers retrieval times in the milliseconds. Google markets these fast recovery times to the disaster recovery market, where customers may need to download high volumes of data very quickly.

Microsoft Azure Cool Blob Storage is more like Google Nearline and Amazon S3 Standard-IA (Infrequent Access) than Coldline and Glacier, but it can serve as cold object-based storage. Both hot and cool Azure blobs store unstructured data as objects. As with the other two cold storage services, cool data is much less expensive to store than hot data. Data retrieval is in the milliseconds, and accessing data costs more than storing it.

Cold Cloud Backup Vendors

All backup products back up to the cloud as a target, but not all of them optimize backup to cold storage tiers. Typical features for this level of integration include policy-based backup and archiving to the cold storage tier, indexing the tier for faster search and recovery, and offering flexible site choices when recovering data.

Cloud backup vendor CloudBerry Lab supports multiple clouds, including Amazon, Google, and Azure, for cross-platform backup. CloudBerry backs up to cold and cool storage classes as well as hot and warm tiers, and it was one of the earliest backup vendors to support Google Coldline. Image-based backup and strong encryption round out the portfolio.

Cohesity hyperconverges massive secondary storage on-premises, remotely and in the cloud. Cohesity also supports all storage classes and directly backs up to Amazon Glacier, Google Coldline and Azure Cool Blobs.

Commvault backs up to multiple clouds including all three hyperscale public clouds. Its Simpana software can back up directly to Azure Hot and Cool Blobs, and integrates with Amazon S3 and Glacier. Users can adjust some cloud settings using the CommCell Console.

The Benefits of Cold Cloud Storage

Stored data is growing at a terrific pace, and businesses need to retain much of it for compliance, analytics, and research purposes. Keeping all this data on costly storage tiers is extremely expensive, both in capital and operating costs.

Until now, tape has been the solution to cold storage requirements. But massive data volumes and the need to quickly access data for recovery or analytics have outstripped tape’s effectiveness.

This is why the cloud is deservedly popular for storing cold data, and why public cloud vendors have stepped up with cold storage services. IT must still perform due diligence to investigate which cold storage tiers are optimal for its needs and which backup vendors optimize cloud-based cold storage tiers. Although this research will take some time and energy, the cost and durability benefits of cloud-based cold storage are more than worth it.