Storage Tiering with External Infrastructure

Posted on July 13, 2016 By Christine Taylor

In a recent article, I discussed internal array tiering and server-side tiering. In this article I’ll concentrate on an external tiered infrastructure. Although some vendors push a two-tier architecture, they are only referring to a two-tier production storage system, not tiering across the data lifecycle. A lifecycle-wide tiered infrastructure looks quite different.

Storage Tiering Automation Solutions

Moving the data to less expensive systems is the obvious solution, and a good one. What’s not so obvious is how IT can protect its admins’ time given never-ending data creation and fast-growing storage requirements. Automation helps where you can get it, but no single solution automatically tiers aging data all the way from creation to cold storage.

The lesson is to automate as much as possible. Some storage tiering products can tier between homogeneous arrays, to external storage systems, and natively to the cloud.

Note that automated storage tiering is not the same thing as policy-driven replication or automated backup. Both of these processes copy data; automated storage tiering moves qualifying data to less expensive storage tiers as opposed to copying it.
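The move-versus-copy distinction can be sketched in a few lines of Python. This is purely illustrative, with hypothetical paths and function names; real tiering products operate at the array, block, or filesystem layer rather than with calls like these:

```python
import shutil
from pathlib import Path

def backup(src: Path, backup_dir: Path) -> Path:
    """Replication/backup: the original stays put; a second copy is created."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src, backup_dir / src.name))

def tier_down(src: Path, tier2_dir: Path) -> Path:
    """Automated tiering: the data is relocated; nothing remains on Tier 1."""
    tier2_dir.mkdir(parents=True, exist_ok=True)
    dest = tier2_dir / src.name
    shutil.move(str(src), dest)
    return dest
```

After `backup`, the file exists in two places; after `tier_down`, it exists in exactly one, which is why tiering reclaims Tier 1 capacity while backup does not.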

Automated storage tiering takes a financial investment because you usually must buy a new storage system to get it. However, if you need a storage refresh anyway, it can be worth the price. Performance on production systems may improve as your system consistently moves data off the expensive storage. And you save money on OPEX by storing data on lower-cost media and extending the life of the expensive production storage.

Another benefit is test/dev. When you have at least a two-tier architecture on-premise, you can use Tier 2 HDD systems for test/dev without impacting production performance. The same principle applies to big data analysis. A clearly constructed tiered infrastructure will also be helpful to data governance and security.

Defining Data for Storage Tiering

Let’s look at the data characteristics that drive tiering decisions:

Access activity. Array-based dynamic storage tiering functions read metadata and access patterns, then move data accordingly between Tier 0, Tier 1, and possibly Tier 2, depending on the array’s architecture. External tiering works the same way, and tiers may be physical or virtual depending on the specific automated tiering product.
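As a rough sketch of access-driven tiering, the following Python demotes files by last-access time. The tier paths and 30-day threshold are assumptions, and real tiering functions read the array’s internal access metadata rather than filesystem timestamps:

```python
import shutil
import time
from pathlib import Path
from typing import Optional

COLD_AFTER_DAYS = 30  # assumed policy threshold, not a vendor default

def sweep(tier1: Path, tier2: Path, now: Optional[float] = None) -> list:
    """Move files not accessed within COLD_AFTER_DAYS from tier1 to tier2."""
    now = time.time() if now is None else now
    cutoff = now - COLD_AFTER_DAYS * 86400
    tier2.mkdir(parents=True, exist_ok=True)
    moved = []
    for f in tier1.iterdir():
        if f.is_file() and f.stat().st_atime < cutoff:
            dest = tier2 / f.name
            shutil.move(str(f), dest)   # relocate, don't copy
            moved.append(dest)
    return moved
```

A real product would run a sweep like this on a schedule and could just as easily promote data back up when it turns hot again.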

Workload size. The size of the tiered data will affect storage tiering decisions. Smaller movement units allow finer-grained data placement, while larger units demand target media with enough capacity to hold a single large dataset. So while a NAS tiering operation may tier on a per-file basis once a day or more, an OLTP database may tier 4KB blocks several times a day.

Data priority. Prioritizing data is fundamental to intelligent storage tiering. Tiering solutions’ most basic function is to identify data by age, but age is certainly not the only characteristic to use. Business priority and governance also shape data movement policies. For example, seconds-old data in a customer transaction system clearly has high business value. Once the order is fulfilled, that immediate value diminishes, and within a few days or weeks the data can move to a nearline HDD array, and from there to tape or the cloud. In contrast, big data for business analysis will not move to tape or cold storage for a long time. Automatic tiering will move it from production storage to an array that provides sufficient capacity and performance for ongoing analytics. A third major determining factor is risk: how fast can IT locate and recover relevant data during a litigation or audit, and is that same data subject to multiple matters? In this case, IT may choose to keep potentially relevant data on-premises, even though the data may be months or years old.
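A tiering policy that weighs age, business value, and legal risk together can be sketched as a simple decision function. The tier names, thresholds, and fields below are invented for illustration and are not any vendor’s actual schema:

```python
from dataclasses import dataclass

@dataclass
class DataSet:
    age_days: int
    business_value: str   # "high" or "normal" -- assumed labels
    legal_hold: bool      # potentially relevant to litigation or audit?

def target_tier(d: DataSet) -> str:
    """Pick a tier: legal risk trumps age, and current business value trumps age."""
    if d.legal_hold:
        return "tier2-onprem"        # keep discoverable on-premises, whatever its age
    if d.business_value == "high" or d.age_days < 7:
        return "tier1-production"    # hot or high-value data stays on fast media
    if d.age_days < 90:
        return "tier2-nearline"      # cooling data moves to nearline HDD
    return "tier3-cloud-or-tape"     # cold data goes to the cheapest tier
```

Note how the litigation-hold check comes first: a dataset that is years old but under legal hold never reaches the cloud/tape branch, which mirrors the risk factor described above.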

In the face of complexity and added expense, why would IT bother with storage tiering?

They bother because stored data follows the old 80/20 rule: give or take a few percentage points, 20 percent of data is accessed 80 percent of the time. The other 80 percent ages out within a few weeks. IT can’t simply get rid of that data, not for a long time (if ever). But this data eats up capacity and energy on those expensive production systems. Ultimately storage tiering frees up resources, avoiding extra CAPEX and lowering OPEX over time.
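To see why that 80/20 split matters financially, here is a back-of-envelope calculation. The per-TB prices are invented for illustration only:

```python
total_tb = 100
hot_tb = int(total_tb * 0.20)          # the ~20% of data that stays hot
cold_tb = total_tb - hot_tb            # the ~80% that ages out
tier1_per_tb = 800                     # assumed $/TB for a flash production array
tier2_per_tb = 100                     # assumed $/TB for nearline HDD or cloud

all_on_tier1 = total_tb * tier1_per_tb
tiered = hot_tb * tier1_per_tb + cold_tb * tier2_per_tb
print(all_on_tier1, tiered)            # → 80000 24000
```

Even with made-up numbers, the shape of the result holds: once the cold 80 percent stops consuming premium media, the bulk of the capacity bill moves to the cheapest tier.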