There are two reasons why a solid state drive (SSD) needs to be used as efficiently as possible. First, while it's far and away a higher performing storage medium than the mechanical hard disk drive (HDD), it still must be purchased at a significant price premium. Because of the higher cost per gigabyte, better capacity utilization yields a higher return on investment.

The second reason is one of reliability. Flash based SSDs have a finite number of writes they can handle before error rates outpace the flash controller's error correction code (ECC) capability, which can eventually lead to total failure of the device. The more efficiently the device is used, the longer its useful life.

SSD Capacity Optimization

Due to the premium price of solid state storage, approaches that can improve its cost effectiveness have flooded the market. Rather than using solid state as a conventional storage tier, many vendors prefer to use it as a staging area or cache; optimization for them means using as little of the expensive tier as possible. In these use cases, data is moved into flash storage as it becomes more frequently accessed and out again as it cools off. The problem with this movement between storage media is that a number of "misses" occur before the requested data is actually moved to the SSD tier. A miss is a read request for data that is not yet present on the SSD and so must be served from the slower HDD tier. This means a certain number of users or customers will see degraded performance until the data is promoted.
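To make those mechanics concrete, the following Python sketch shows a simplified promote-on-access cache. The three-access threshold and the names are illustrative assumptions, not any vendor's actual policy; the point is that every request before promotion is a miss served at HDD speed, and the promotion itself costs an SSD write.

    PROMOTE_THRESHOLD = 3  # accesses required before a block moves to SSD (assumed)

    ssd_tier = {}       # block id -> data held on the fast tier
    access_counts = {}  # block id -> accesses observed while the block is on HDD

    def read_block(block_id, hdd_read):
        """Serve a read, promoting frequently accessed blocks to SSD."""
        if block_id in ssd_tier:
            return ssd_tier[block_id]            # hit: served at SSD speed
        data = hdd_read(block_id)                # miss: served at HDD speed
        access_counts[block_id] = access_counts.get(block_id, 0) + 1
        if access_counts[block_id] >= PROMOTE_THRESHOLD:
            ssd_tier[block_id] = data            # promotion costs an SSD write
        return data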

Caching or tiering also assumes that only a small percentage of the data center's workload is active at any given point in time. If the amount of active data exceeds the available capacity of the cache or SSD tier, then some users who should be on the faster storage will be adversely impacted by the shortage of solid state capacity. The storage manager is then forced to buy either additional SSD capacity or potentially a whole new system.

Finally, caching and tiering can be harmful to the life expectancy of the flash SSD. As stated earlier, flash based solid state storage has a finite number of times it can be written to, and each write brings the device closer to that maximum. As data in the cache or SSD tier ages or becomes less popular than other data on HDD, it has to be moved out of the cached area. This process is called "data turnover," and in a cache or tiering operational mode the turnover rate can be quite high. The result can be premature failure of the SSD, at least in terms of years of service.
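A back-of-the-envelope calculation shows why turnover matters. All of the figures in this sketch are illustrative assumptions, not specifications of any particular drive, but the shape of the math holds: the faster the tier is rewritten, the sooner its write budget is exhausted.

    # Illustrative endurance estimate; every figure below is an assumption.
    capacity_gb = 400    # usable SSD capacity
    pe_cycles = 3000     # rated program/erase cycles per cell (assumed)
    waf = 2.0            # write amplification factor (assumed)

    total_writable_gb = capacity_gb * pe_cycles / waf    # 600,000 GB budget

    # High-turnover caching rewrites the tier, say, twice per day.
    daily_writes_gb = capacity_gb * 2                    # 800 GB per day
    years_of_service = total_writable_gb / daily_writes_gb / 365
    print(f"Estimated life at this turnover rate: {years_of_service:.1f} years")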

Cost is SSD's greatest weakness, so these complex data movement strategies improve cost effectiveness by reducing the amount of expensive SSD that has to be purchased. While most data centers want the performance that SSD offers, it's still too expensive for broad deployment. But SSD optimization techniques that attempt to overcome this price hurdle, like caching and automated tiering, don't optimize the technology by getting more use out of it; they do so by moving more data through it, potentially wearing it out sooner.

The Deduplication Alternative

Deduplication of solid state storage allows solid state to be used in its more natural form: as a true storage area that's cost effective enough that data doesn't need to be rapidly copied on and off of it. Deduplication also allows more data to be stored in the same physical space by eliminating redundancy. With deduplication, the entire primary storage data set, or a large portion of it, can be stored on SSD. Instead of the high turnover rate of caching or automated tiering, deduplication enables primary data to rest affordably on the SSD storage area.
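A minimal sketch of how this works, assuming simple fixed-size blocks identified by content hash (the block size and structures are illustrative): each unique block is stored once, so the logical data set can be several times larger than the physical flash it occupies.

    import hashlib

    BLOCK_SIZE = 4096
    unique_blocks = {}   # fingerprint -> block data (physical space consumed)
    logical_blocks = 0   # every block written, duplicates included

    def store(block):
        """Keep one physical copy per unique block content."""
        global logical_blocks
        logical_blocks += 1
        fp = hashlib.sha256(block).hexdigest()
        unique_blocks.setdefault(fp, block)   # a duplicate consumes no new space

    def dedup_ratio():
        """Logical data stored per unit of physical capacity used."""
        return logical_blocks / max(len(unique_blocks), 1)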

Having all of the data on solid state storage solves two problems that caching and automated tiering solutions create. First, the chance of a "miss" is either eliminated or substantially reduced because all or most of the data is already on the solid state tier. This means that all users benefit from solid state performance, not just those whose data happens to have been promoted.

Second, the longevity of the solid state tier is improved. Since most of the data now rests on solid state, there isn't the turnover of data described above. That means fewer erasures and writes, which prolongs the life of the SSD; reads don't impact flash life, only writes do. Deduplication, especially if it's implemented inline, can help even further. Inline deduplication means that data is checked for redundancy before it is written to storage. If the data already exists on the storage area, there is no need to write it again, so the write never occurs. In solid state storage, where write counts are particularly important, inline deduplication has a substantial impact on longevity.
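Here is a minimal sketch of that inline write path, again with illustrative names; a production implementation would also handle hash collisions, reference counting, and metadata persistence. The key point is that a duplicate block returns before any flash write is issued.

    import hashlib

    hash_index = {}     # fingerprint -> physical location on flash
    writes_avoided = 0  # flash writes eliminated before they occurred

    def inline_write(block, flash_write):
        """Write a block to flash only if its content isn't already stored."""
        global writes_avoided
        fp = hashlib.sha256(block).hexdigest()
        if fp in hash_index:
            writes_avoided += 1         # duplicate: no write, no flash wear
            return hash_index[fp]
        location = flash_write(block)   # new content: exactly one physical write
        hash_index[fp] = location
        return location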

Once Again Better Together

For many data centers a solid state only storage system is out of the realm of possibility. There is simply too much data to store on even a deduplicated pure solid state storage system, so caching or automated tiering needs to be considered. Deduplication has a role to play in these environments as well.

The high turnover rate of a cache or automated tiering system can be greatly reduced by increasing the capacity of the solid state storage it resides on. The problem, again, is cost. Deduplication can make the cache or automated tier more efficient by eliminating redundancy on the solid state tier, just as it would on standard storage. While the effective reduction rate may not be as high as it would be across an entire storage system, today's cache tiers are large enough that deduplication can still meaningfully optimize the capacity of that tier. Actual results will vary, but several vendors are claiming roughly 3X efficiency with deduplication on a cached or automated tier.
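Worked with hypothetical numbers (examples only, not vendor measurements), the effect looks like this: at 3X, the same physical cache holds roughly three times the working set, so far more of the active data stays on flash.

    # Hypothetical figures to illustrate the ~3X claim.
    physical_cache_gb = 500
    dedup_ratio = 3.0       # claimed efficiency on a deduplicated cache tier
    active_data_gb = 1200   # the working set the cache is trying to hold

    effective_cache_gb = physical_cache_gb * dedup_ratio          # 1,500 GB
    without_dedup = min(physical_cache_gb / active_data_gb, 1.0)  # ~42%
    with_dedup = min(effective_cache_gb / active_data_gb, 1.0)    # 100%
    print(f"Active data held on flash: {without_dedup:.0%} -> {with_dedup:.0%}")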

Summary

Solid state is the path to the next level of storage performance. Unfortunately, that path has an expensive toll that inhibits entry for many. Solutions like caching and automated tiering attempt to circumvent this barrier by ensuring that only a small, active portion of the data set occupies the tier.

Deduplication opens SSD up to a larger overall set of data. It can either allow SSD to be the primary tier of storage, or it can improve the ability of caching and tiering to support a larger data population. In either use case, deduplication is a winner and a step forward for SSD implementation strategies.