The upside of an economic downturn: Storage efficiency

Posted on February 12, 2009

Warren Buffett's remark that "only when the tide goes out do you discover who's been swimming naked" is apropos of the current economic downturn. Today, more than ever, many organizations are feeling exposed and anxious.

An economic downturn, while difficult for many, is a prime opportunity to re-examine your IT environment and find ways to economize. Companies are trimming staff, cutting budgets, and postponing projects. That doesn't mean organizations should pursue only cost-cutting measures, however. Smart investments made today, focused on efficiency and cost savings, could help organizations emerge from the downturn better positioned than before.

Within IT, data center environments are likely to contain many areas of waste and inefficiency. One such area to examine is storage capacity, for both primary and secondary storage systems. Data growth rates are in the double digits for most companies, but do repeated purchases of additional capacity to hold stale data make sense? Instead, what technology investments can be made today that will provide short-term ROI and, perhaps more importantly, long-term benefits?

The Enterprise Strategy Group's four data lifecycle stages provide a framework for discussing a few key areas worth optimizing. In this model, data at each stage is characterized as dynamic or persistent, active or inactive, and online or offline.

Stage 1. Dynamic/active/online data is still in flux and is actively being referenced.
Stage 2. Persistent/active/online data is unchanging, but still maintains a high rate of access.
Stage 3. Persistent/inactive/online data is not changing and is infrequently accessed.
Stage 4. Persistent/inactive/offline data is unchanging and rarely accessed, qualifying for long-term preservation in an offline archive.
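To make the four-stage model concrete, the following Python sketch maps the three characteristics onto a stage. The names `Stage`, `DataTraits`, and `lifecycle_stage` are hypothetical illustrations, not part of any ESG tool. The sketch treats each characteristic as a simple yes/no flag and rejects combinations the model does not define (for example, data that is dynamic but offline).

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    """The four lifecycle stages described above."""
    DYNAMIC_ACTIVE_ONLINE = 1
    PERSISTENT_ACTIVE_ONLINE = 2
    PERSISTENT_INACTIVE_ONLINE = 3
    PERSISTENT_INACTIVE_OFFLINE = 4


@dataclass(frozen=True)
class DataTraits:
    changing: bool  # True = dynamic, False = persistent
    active: bool    # True = frequently accessed, False = inactive
    online: bool    # True = online, False = offline


# Only four of the eight possible flag combinations are lifecycle stages.
_STAGES = {
    (True, True, True): Stage.DYNAMIC_ACTIVE_ONLINE,
    (False, True, True): Stage.PERSISTENT_ACTIVE_ONLINE,
    (False, False, True): Stage.PERSISTENT_INACTIVE_ONLINE,
    (False, False, False): Stage.PERSISTENT_INACTIVE_OFFLINE,
}


def lifecycle_stage(traits: DataTraits) -> Stage:
    """Return the lifecycle stage for a set of traits, or raise ValueError."""
    key = (traits.changing, traits.active, traits.online)
    try:
        return _STAGES[key]
    except KeyError:
        raise ValueError(f"{key} is not one of the four lifecycle stages") from None


if __name__ == "__main__":
    print(lifecycle_stage(DataTraits(changing=False, active=False, online=True)))
```

Once data is tagged this way, the infrastructure and protection policy for each stage can be looked up from the stage rather than decided file by file.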

The infrastructure, technologies, practices, and policies applied to data should be aligned with each stage in the lifecycle. Doing so can create significant efficiencies that in turn lead to cost savings and, more importantly, a more optimized storage environment.

Stage 1 data should reside on the fastest, most fault-tolerant infrastructure and be covered by the most stringent data protection policies. Because the data is continually changing, incremental snapshots taken between daily file-based backups to disk help deliver better recovery points. Mirroring data locally for operational recovery and replicating it offsite for disaster recovery shortens recovery times, helping meet recovery time objectives (RTO) and minimize downtime. Data de-duplication on secondary storage can extend the life of existing storage capacity and reduce additional capacity purchases over time.
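The benefit of snapshots between daily backups can be seen with simple arithmetic. The sketch below assumes recovery points are evenly spaced, so the worst-case data-loss window is the backup interval divided by the number of gaps between recovery points. The function name and the example schedule (one daily backup plus five snapshots) are hypothetical.

```python
def worst_case_rpo_hours(backup_interval_hours: float, snapshots_between: int) -> float:
    """Longest possible data-loss window, assuming evenly spaced recovery points."""
    if backup_interval_hours <= 0:
        raise ValueError("backup interval must be positive")
    if snapshots_between < 0:
        raise ValueError("snapshot count cannot be negative")
    # One backup plus N snapshots splits the interval into N + 1 equal gaps.
    return backup_interval_hours / (snapshots_between + 1)


if __name__ == "__main__":
    print(worst_case_rpo_hours(24, 0))  # 24.0 (daily backup only)
    print(worst_case_rpo_hours(24, 5))  # 4.0 (five snapshots between backups)
```

Real schedules are rarely perfectly even, so the longest gap between any two recovery points is the figure that matters in practice.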

Stage 2 data has much of the same value as Stage 1 data: it is still frequently accessed and therefore requires infrastructure that ensures resilience and users' continued access. Data protection policies can be somewhat relaxed, however. Because the data is not changing, copies and offsite replicas can be made less frequently. It is still prudent to use disk-based backup to maintain RTO, so data de-duplication can yield significant savings at this tier, for on-premises and offsite stores as well as the network bandwidth in between. Capacity optimization technologies (data de-duplication and compression) for primary data may also be worthwhile, especially since optimizing primary data yields inherent downstream efficiencies in secondary storage.

Stage 3 of the data lifecycle offers significant opportunities for economic and operational gain, simply because the overwhelming majority of corporate data resides in this stage. Because its usage pattern has changed radically, the infrastructure and processes used to meet SLAs for this data should change as well, by migrating the data to lower-cost, lower-performance storage platforms (that is, moving persistent, inactive data to an active archive tier). Unchanging data left on primary systems not only creates a capacity glut and a performance drag, it also chokes backup processes. Creating an active archive can improve primary system performance and significantly reduce the volume of data transferred across the LAN, WAN, and SAN and through data protection systems. Single-instance storage capabilities in this tier can reduce storage capacity 3x to 5x, and sub-file-level de-duplication can optimize capacity even further (for the active archive tier as well as in secondary storage).
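The difference between single-instance storage and sub-file-level de-duplication can be sketched in Python. This illustration assumes SHA-256 content hashes and fixed-size 4 KB chunks; commercial products differ in their hashing and often use variable-size chunking, and the function names are hypothetical. Single-instance storage keeps one copy of each identical file, while chunk-level de-duplication also collapses identical regions shared by files that are not exact copies.

```python
import hashlib


def single_instance_bytes(files: list[bytes]) -> int:
    """Bytes stored when identical whole files are kept only once."""
    unique = {hashlib.sha256(f).digest(): len(f) for f in files}
    return sum(unique.values())


def chunk_dedup_bytes(files: list[bytes], chunk_size: int = 4096) -> int:
    """Bytes stored when identical fixed-size chunks are kept only once."""
    unique: dict[bytes, int] = {}
    for f in files:
        for i in range(0, len(f), chunk_size):
            chunk = f[i:i + chunk_size]
            unique[hashlib.sha256(chunk).digest()] = len(chunk)
    return sum(unique.values())


if __name__ == "__main__":
    original = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
    edited = b"A" * 4096 + b"B" * 4096 + b"D" * 4096  # only the last chunk differs
    files = [original, original, edited]               # an exact copy plus a near copy
    print("raw bytes:", sum(len(f) for f in files))              # raw bytes: 36864
    print("single-instance bytes:", single_instance_bytes(files))  # single-instance bytes: 24576
    print("chunk-dedup bytes:", chunk_dedup_bytes(files))          # chunk-dedup bytes: 16384
```

In this toy example, the edited file still shares two of its three chunks with the original, which is why sub-file de-duplication saves more than single-instance storage alone. The ratios come from the sample data and say nothing about the 3x to 5x figure above.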

Stage 4 data doesn't merit the same level of infrastructure or accessibility as the previous stages. Based on age or access metadata, this data should be moved out of the active archive to tape, optical, or removable hard disk media. Because data at this stage is unchanging and inactive, it need not remain in the daily/weekly/monthly backup cycle. Removing it from that cycle reduces tape-handling overhead as well as the costs of physical media and offsite storage.
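The age- or access-based move out of the active archive can be expressed as a simple policy check. In this sketch the one-year idle threshold, the `ArchivedItem` record, and the `offline_candidates` function are hypothetical. It also assumes the file system records reliable access times, which is not always the case (for example, when access-time updates are disabled).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ArchivedItem:
    path: str
    last_modified: datetime
    last_accessed: datetime


def offline_candidates(
    items: list[ArchivedItem],
    now: datetime,
    max_idle: timedelta = timedelta(days=365),  # hypothetical threshold
) -> list[str]:
    """Paths of items neither modified nor accessed within max_idle of now."""
    cutoff = now - max_idle
    return [
        item.path
        for item in items
        if item.last_accessed < cutoff and item.last_modified < cutoff
    ]


if __name__ == "__main__":
    now = datetime(2009, 2, 12)
    items = [
        ArchivedItem("/archive/old.doc", datetime(2006, 1, 1), datetime(2007, 6, 1)),
        ArchivedItem("/archive/recent.doc", datetime(2006, 1, 1), datetime(2008, 12, 1)),
    ]
    print(offline_candidates(items, now))
```

Items returned by such a check would be written to offline media and dropped from the regular backup rotation, which is where the tape-handling and media savings come from.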

These concepts are merely a high-level review of the ways in which an organization can adapt storage policies and infrastructure to secure economic gains. Secondary disk storage (in place of tape), archive platforms and/or software, data de-duplication, and compression all qualify as incremental investments geared toward optimizing the environment for long-term business, financial, and operational gains.