Network-attached storage

As many of you may know, I work for EMC’s Cloud Infrastructure Group as part of the Atmos solution team. In this role, I’ve been fortunate to get a closer look at where cloud storage is headed, as well as some of the drivers that will get it there. In this post, I’d like to talk a bit about policy and how it will shape the future of storage. I’m going to keep this as abstracted from product as possible, but where appropriate, I’ll show you how products are implementing this technology TODAY.

What is Policy?

By definition, policy is “[an] action or procedure conforming to or considered with reference to prudence or expediency” (dictionary.com for that definition). Viewed in the context of storage systems and management, policy is the set of actions (scripted or otherwise) applied to data to govern its retrieval, performance, and manipulation by systems. In other words, policy is an engine that manages data from start to finish. To see why this matters, we first need to look at what the typical management stack looks like today.

Data is created by users accessing programs that are tied to physical and virtual resources. That data is then processed and stored by the programs and their underlying storage I/O layers (LVMs, hypervisor I/O stacks, etc.) onto some sort of storage device (SAN, NAS, DAS, etc.), where it sits until its next access. In essence, once data is created, it is considered to be “at rest” until it is next accessed (if ever). Within this data generation and storage continuum, the process is fundamentally simple: generated data is written directly to storage. If the data then sits in the same place indefinitely, however, it typically becomes inefficient to retrieve and access. Managing this data was traditionally a manual process in which data, LUNs, and their topologies had to be moved around using array- or host-based tools to provide a better “fit” for data at rest or better performance on access. This is where policy steps in.

Policy uses hooks into the data, known as metadata, in order to enact controls. Please see this post for a more detailed explanation of metadata.
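As a rough illustration of what those hooks look like (the field names and tags here are hypothetical, not any particular product’s schema), metadata travels with the object and gives a policy engine something to match on:

```python
# Hypothetical sketch: a stored object carrying system metadata
# (size, timestamps) and free-form user metadata -- the "hooks"
# a policy engine evaluates to decide what to do with the data.
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class StoredObject:
    object_id: str
    size_bytes: int
    created: datetime
    last_accessed: datetime
    user_metadata: dict = field(default_factory=dict)  # free-form tags


obj = StoredObject(
    object_id="obj-0001",
    size_bytes=4 * 1024 * 1024,
    created=datetime(2010, 1, 5, tzinfo=timezone.utc),
    last_accessed=datetime(2010, 6, 1, tzinfo=timezone.utc),
    user_metadata={"department": "engineering", "retention": "7y"},
)

# A policy engine matches on tags like these rather than on LUN layout.
print(obj.user_metadata["retention"])  # prints "7y"
```

The point is simply that the controls key off attributes of the data itself, not off where the administrator happened to park it.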

Why use Policies?

If the previous example shows anything, it’s that the management of data is fundamentally… well, boring and manual. Policy provides a method of controlling both data ingest AND data management while allowing the business to continue to generate, retrieve, and manipulate data. For example, a simple policy enacted against data could look as follows:
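In sketch form, such a policy might look like the following (the tier names and the thresholds are invented for illustration; a real engine would let you define both):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tiering policy: data touched within the last day stays on
# the fast tier, data untouched for 90+ days is demoted to the cheap tier,
# everything else sits in the middle. Thresholds are made up.
def place_object(last_accessed: datetime, now: datetime) -> str:
    age = now - last_accessed
    if age <= timedelta(days=1):
        return "performance"   # fast tier (e.g., FC/SAS spindles)
    if age >= timedelta(days=90):
        return "archive"       # cheap/dense tier (e.g., SATA)
    return "capacity"          # middle ground


now = datetime(2010, 6, 1, tzinfo=timezone.utc)
print(place_object(now - timedelta(hours=3), now))   # performance
print(place_object(now - timedelta(days=200), now))  # archive
```

The engine re-evaluates rules like these continuously, so placement follows access patterns instead of waiting on an administrator.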

Obviously, that’s a high-level abstraction of what the actual process for data control would look like, but it drives the point home. What used to be a manual LUN migration to place data for performance or capacity is now set by a logical control structure that can be automagically enacted on the storage system itself. A working example of this type of policy can be seen in the tiering provided by Compellent and by EMC’s FAST systems for storage management. Pretty cool, huh?

An alternative method of control that isn’t necessarily tied to the storage array is VMware’s recently introduced Storage DRS (Storage Distributed Resource Scheduler), which is enacted against the storage I/O stack of VMware’s vSphere hypervisor.

The Future of Policy

Obviously, my examples are very simplistic, but hopefully they make policy technology somewhat more accessible. As far as policy futures are concerned, this is where storage technologies (and even host process management) will be going. In the future, simple policy creation and enforcement will be a necessary part of storage pool creation and integration, as well as the ongoing maintenance and support of storage arrays.

I’ve been ruminating on a conversation I was part of at the recent Cloud Camp – Boston “un-conference.” In this particular case, a VAR (not a manufacturer) was talking about leveraging cloud storage for a customer of theirs who had the following “essential criteria”: multiple petabytes of storage, significant unstructured data, low cost of entry, data primacy/ownership (i.e., privately controlled assets/data), and very little need for typical NAS/SAN implementations. The questions this VAR brought up were related to designing for this type of storage. Let’s explore this a little more (remember, I’m just thinking out loud here) by looking at retrofitting cloud-type storage (à la Atmos) versus a “net new” installation of a completely cloud-based storage infrastructure.

Retrofitment

The concept of retrofitting is to shoehorn a “new” product into a space where the “old” product was either unsatisfactory or incapable of servicing the ongoing data needs of a company’s infrastructure. In this case, the goal is to use as much of the existing infrastructure as possible to minimize cost while at the same time providing the much-needed boost in management and capability brought to the table by the new technology. In these types of cases, the ability of the storage product (in my case, Atmos) to integrate seamlessly is vital to bringing the “cloud” to the table. Atmos, for what it’s worth, offers the ability to integrate into traditional NAS/SAN environments through CIFS, NFS, and IFS connectivity options (IFS is through a RHEL 5.x client) while also allowing the customer to develop connectivity and SOA options through REST/SOAP API interfaces. This way, Atmos allows you to granularly “grow” into an API-based storage model without completely getting rid of (dare I say it? 😉 ) legacy NAS/SAN environments.
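To give a feel for what the REST side of that looks like, object APIs in this style typically authenticate each request with an HMAC signature computed over a canonicalized set of headers. The sketch below shows only the general idea; the canonical-string layout and header names here are simplified approximations for illustration, not the documented Atmos wire format:

```python
import base64
import hashlib
import hmac

# Simplified sketch of HMAC request signing in the style of an object-
# storage REST API. The canonical string built here is an approximation
# for illustration, NOT the exact documented Atmos format.
def sign_request(secret_b64: str, method: str, resource: str,
                 emc_headers: dict) -> str:
    canonical = "\n".join(
        [method, resource]
        + [f"{k.lower()}:{v}" for k, v in sorted(emc_headers.items())]
    )
    key = base64.b64decode(secret_b64)
    digest = hmac.new(key, canonical.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


# Hypothetical credentials and headers, purely for demonstration.
sig = sign_request(
    base64.b64encode(b"shared-secret").decode("ascii"),
    "POST",
    "/rest/objects",
    {"x-emc-uid": "tenant/user", "x-emc-meta": "department=engineering"},
)
print(sig)  # base64 signature the client would send with the request
```

The practical point is that any HTTP-capable application can talk to the store directly, which is what makes the gradual “grow into the API” path possible alongside the legacy NAS/SAN mounts.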

Net-New

The Net-New concept really thrives when the customer is at a crossroads: the need for new technology and infrastructure outstrips the need to preserve the current infrastructure (and obviously this isn’t limited to just the infrastructure discussion). The idea here is that by adding a “cloud capable” infrastructure, the company can look to minimize the recurring OpEx it experiences as part of its normal buy cycles (that was a painful sentence to write). Objectively, a net-new architecture allows a clean-slate, “ground-up” approach to storage architecture, where careful design and planning can be based around hybrid cloud capabilities (e.g., federation between Atmos and Atmos Online) as well as the scalable growth offered by those platforms. Again, provision is made for integrating into the existing infrastructure where needed via the aforementioned NAS capabilities (CIFS/NFS/IFS), but the emphasis is placed on self-service through the API interface.

Your Choice

The cool part about this evolution is that the choice of how and when to implement is ultimately up to you. The capability to integrate and grow now cannot be overlooked, but, obviously, there are challenges with any type of new integration. Similarly, tossing out the old and bringing in the new has its own set of challenges, such as the internal SLAs that IT has with its “internal customers,” and so on.

Today is the official GA of EMC’s latest-and-greatest product, Atmos. For all intents and purposes, you have probably heard of Atmos under a different guise: Maui. In any case, I’ll be taking a look at some of the features and functionality of Atmos as well as potential integration points for your business. Finally, if you’re really interested, I’ll discuss the hardware underpinnings of Atmos, which point toward its ability to scale to multiple petabytes of information storage. It will probably be easiest to “unpack” the press release, but let’s get some of the foundational information out of the way first.

It’s been a while since I last did a review of what people are searching for (July 30th was the last time… wow), so let’s see what’s new.
Search Term #1: EMC NX4
Not really surprised here. Honestly, take the Celerra NS-20, cut the price significantly, allow blended SAS/SATA drive trays,