Over the past three to four months the storage industry has seen a spike in the number of reports, white papers and news articles surrounding the evolution of primary storage technology: capacity optimization, 2010’s hottest storage technology.

This technology is getting a lot of ‘air play’ these days because it is so critical to controlling the growth and cost of storage. In 2010 the EMC-sponsored IDC report “The Digital Universe Decade - Are You Ready?” was released and stated that:

When you combine storage capacity (and the footprint it takes up) with the power it takes to run and cool it, as well as the human resources it takes to manage it, you soon realize we cannot keep ‘just adding more cheap disk’ in an effort to meet storage demands. High-tech companies with high-tech labs are also telling IT that ‘they are out of tricks’ when it comes to continuing to deliver disk drives that double in capacity every 18 months. It is for these reasons that primary storage optimization technologies have stepped into the ‘limelight’: they serve as a means to help control the growth of primary storage, including the footprint, power, cooling and manpower required to manage it.

However, as we all know in IT, no two environments are the same, and what may be good for one may not be good for another. When looking at primary storage optimization there are a number of available technologies and ways to deploy them, and the key question is what is right for ‘my’ environment.

The first things to consider are:

1) What is the primary objective of the storage system(s) in my environment (it may be different for different systems)?

2) What are the primary characteristics I look at when I purchase a storage system?

3) What are my current business objectives surrounding my storage?

It is important to remember why you acquired your storage in the first place, and leverage all of the decision making processes that surrounded the acquisition of that storage. For example, if performance was a key characteristic for acquiring a particular storage system, then it should stand to reason this is something that can’t be sacrificed when looking to add capacity optimization into the mix.

There are also a number of ways to optimize storage capacity. The two core technologies are compression and data deduplication, and each can be deployed inline (in real time) or post-process.

Compression technologies can reduce storage footprint anywhere from 50% to 90% depending upon the data type. Compression technologies have been around for decades and are trusted technologies. Compression can be deployed as a post-process (think of WinZip – zipping a file once it has been stored), or it can be deployed as a real-time application that does compression on the fly – this is the Storwize model.
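To make those compression figures concrete, here is a minimal sketch in Python. The sample data and the zlib engine are illustrative stand-ins only, not what any particular storage product uses; the point is simply that repetitive data types compress toward the high end of the 50-90% range:

```python
import zlib

# Hypothetical sample: a highly repetitive payload, the kind of data
# type that lands at the favorable end of the compression range.
data = b"customer_record;region=EMEA;status=active;" * 1000

# zlib stands in here for whatever compression engine a product uses.
compressed = zlib.compress(data, level=6)
reduction = 1 - len(compressed) / len(data)

print(f"original:   {len(data)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"reduction:  {reduction:.0%}")
```

The same call could run post-process (sweeping files after they land, the WinZip model) or in the data path as writes arrive (the real-time model); the algorithm is identical, only the deployment point differs.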

Data deduplication technologies can reduce storage footprint anywhere from 10% to 50% on primary storage, but their effectiveness depends on data usage rather than data type. In environments without much repetitive data the deduplication ratio will be low; in environments with a lot of repetitive data, the optimization ratio will be high. Today’s data deduplication solutions for primary storage all happen post-process. (This is primarily due to the performance limitations of trying to deduplicate data in real time on primary storage.) There have been announcements in the past few weeks about data deduplication technologies becoming embedded into storage systems (which is where this technology should be), and this will significantly help with performance.
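The core idea behind deduplication can be sketched in a few lines: split data into blocks, hash each block, and store each unique block only once. This is a toy illustration with hypothetical data; real products typically use variable-size chunking and far more robust metadata:

```python
import hashlib

BLOCK_SIZE = 4096  # fixed-size blocks; real systems often chunk variably

def dedupe(data: bytes) -> dict:
    """Store each unique block once, keyed by its SHA-256 content hash."""
    store = {}
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        store[hashlib.sha256(block).digest()] = block
    return store

# Repetitive usage pattern: 50 identical blocks plus one distinct block.
data = bytes(BLOCK_SIZE) * 50 + b"\x01" * BLOCK_SIZE
store = dedupe(data)
print(f"logical blocks: {len(data) // BLOCK_SIZE}, stored: {len(store)}")
```

With repetitive data, 51 logical blocks collapse to 2 stored blocks; with unique data, every block would be stored and the ratio would be near zero, which is exactly the usage-dependence described above.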

Now that there is a basic foundation for what these technologies can do, the real question is how they fit into the overall requirements for specific storage needs. It is important to look at the storage within your environment and what impact each of these technologies and deployment models would have on it. For example, if you don’t have any system resources left over in a day to perform a post-process operation, then a real-time deployment (as long as it does not degrade performance) is the logical solution. If you have a great deal of repetitive data (VMware .vmdk files, for instance, setting aside the data stored inside the file), then a deduplication solution is the best fit. If transparency within your environment is important (not having to rearchitect applications, networks or storage), then a solution that lets you optimize your capacity without having to change any of these is the right solution. Conversely, if you have plenty of time to compress data once it is written, and no need to worry about the human resources required to compress or decompress it (as with WinZip), then that is a perfectly viable solution as well. I could go on, but I think you get the picture.

The other key variable is cost. As my grandfather once told me, “You get what you pay for in life” and “Nothing is free.” Each of the technologies outlined above comes at a different price, and the value of a solution is directly proportional to its cost. Some solutions may claim to be ‘free’; however, when you consider that it takes horsepower to run them, they aren’t. If it takes building a bigger system to handle the optimization workload, reconfiguring your system to enable optimization to work properly, or changing a recently acquired backup technology to make primary storage optimization effective throughout the entire process, then the solution really isn’t free. These are all things to consider when evaluating an optimization technology.

Remember data deduplication for backup five years ago: it sounded too good to be true, and now if you don’t use it for backups you’re missing out. Don’t let the same thing happen with your primary storage. Get ahead of the curve, and if you have any questions – please ask away.

Deduplication is great, but it introduces a data management challenge when it comes to reducing storage utilization. Suppose I start running out of space on a volume and have to delete or migrate some data. Without deduplication, I know that if I delete 200 GB of files, that is the amount of space that will be freed up.

With deduplication, there is no way to tell how much actual space those 200 GB of files occupy, or what other data depends on their underlying blocks.
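The commenter’s point can be sketched: in a deduplicated store, deleting a file frees only the blocks no other file references, so logical size and reclaimable space diverge. File contents and block size below are hypothetical:

```python
import hashlib

BLOCK = 4096

def block_hashes(data: bytes) -> set:
    """The set of content hashes for a file's fixed-size blocks."""
    return {hashlib.sha256(data[i:i + BLOCK]).digest()
            for i in range(0, len(data), BLOCK)}

# Two hypothetical files that share most of their blocks.
shared = b"".join(bytes([i]) * BLOCK for i in range(40))          # 40 shared blocks
file_a = shared + b"".join(bytes([100 + i]) * BLOCK for i in range(10))
file_b = shared + b"".join(bytes([200 + i]) * BLOCK for i in range(10))

a, b = block_hashes(file_a), block_hashes(file_b)

logical = len(file_a)           # what a directory listing reports for file_a
reclaimed = len(a - b) * BLOCK  # only blocks not also referenced by file_b free up

print(f"deleting file_a looks like {logical} bytes but frees {reclaimed}")
```

Here a 50-block file frees only its 10 unique blocks, one fifth of its logical size, which is why capacity planning on deduplicated primary storage needs visibility into block reference counts rather than file sizes.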

First, thanks for the comment and for pointing out some of the challenges that come with any new technology. We have heard the same question from our customers. This is where the thin provisioning tools in storage come in handy: as long as the space is available, you can always give the application more capacity.

A further technology is Native Format Optimization (NFO). In contrast to deduplication, it doesn’t optimize information across files or blocks, and in contrast to compression, it doesn’t change the output file format (which would require performance-intensive rehydration); instead, it optimizes unstructured content within the single file. It can reduce the size of unstructured files by 50-90% without impacting, for example, dedupe on the backup.

Disclaimer

The opinions expressed here are the personal opinions of the author; I am a technologist, and the opinions are mine alone. This is an independent blog. Content published here is not read or approved in advance by anyone other than myself.
Copyright 2014