Primary Storage De-duplication: Only for SSD Arrays?

I recently wrote an article for TechTarget that looked at the implementations of data de-duplication in primary storage arrays. One of the things that stood out for me was the lack of de-duplication support in traditional (and some might say legacy) storage arrays.

The cynical amongst us would say that the big 5 storage vendors have no vested interest in introducing a technology such as de-duplication. If dedupe rates can reach those of secondary storage products (i.e. 90% or more), then storage vendors are going to be selling way less storage than they do today – not the most desirable scenario. However, I wonder if technology plays more of a role here than pure financial considerations.

The Netapp Effect

Netapp introduced A-SIS (advanced single instance storage) into their filer product range in 2007. Although continually berated by other vendors as performance afflicted, A-SIS does work and does produce savings. Again, the cynical may say that Netapp needs to have some space-saving technology in place, bearing in mind the inefficiency of WAFL. However, it was still a bold move by the company, and five years on none of the other top 5 have followed their lead (EMC have talked the talk but have yet to deliver).

Perhaps through serendipity, Netapp have implemented an architecture that works well with de-duplication. The 4KB block structure and write-new style of WAFL make technologies such as thin provisioning and de-duplication relatively easy to implement (although they also cause headaches in delivering other functionality such as decent tiering).
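To make the point about small fixed blocks concrete, here is a minimal sketch of block-level de-duplication in Python. It is a toy model and not how WAFL or A-SIS is actually implemented: the DedupeStore class, its method names and the use of SHA-256 fingerprints are all assumptions made for illustration. Only the 4KB block size comes from the text above.

```python
import hashlib

BLOCK_SIZE = 4096  # 4KB, the block size mentioned in the text


class DedupeStore:
    """Toy content-addressed block store (hypothetical, not NetApp's design)."""

    def __init__(self):
        self.blocks = {}    # fingerprint -> block bytes, stored once
        self.refcount = {}  # fingerprint -> number of logical references
        self.volume = []    # logical block index -> fingerprint

    def write(self, data: bytes) -> None:
        """Append data to the logical volume one fixed-size block at a time."""
        for offset in range(0, len(data), BLOCK_SIZE):
            # Pad a short final block with zeros so every block is 4KB.
            block = data[offset:offset + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
            fp = hashlib.sha256(block).digest()
            if fp not in self.blocks:
                # New content: store it. Existing blocks are never overwritten.
                self.blocks[fp] = block
                self.refcount[fp] = 0
            self.refcount[fp] += 1
            self.volume.append(fp)

    def read(self, index: int) -> bytes:
        """Return the contents of one logical block."""
        return self.blocks[self.volume[index]]

    def savings(self) -> float:
        """Fraction of logical blocks that needed no new physical block."""
        if not self.volume:
            return 0.0
        return 1 - len(self.blocks) / len(self.volume)


store = DedupeStore()
store.write(b"A" * BLOCK_SIZE * 9 + b"B" * BLOCK_SIZE)
print(f"logical={len(store.volume)} physical={len(store.blocks)} "
      f"savings={store.savings():.0%}")
```

Because a duplicate write only adds a pointer and bumps a reference count, a layout that already tracks every block through a map of pointers has most of the plumbing de-duplication needs.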

On the other hand, other array architectures would find it very difficult to implement de-duplication. EMC VMAX and Hitachi VSP (aka HP XP24000) still retain their legacy LUN structure, onto which they layer wide striping pools and thin provisioning. The block size in these architectures will be a limiting factor: the larger the unit of allocation, the less likely it is that two units hold exactly the same data.
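The effect of block size is easy to demonstrate with a rough sketch. The data set below is synthetic and deliberately contrived so that duplicates exist only at 4KB granularity. The chunk sizes are illustrative and are not the actual allocation units of any array named above, and the helper names are hypothetical.

```python
import hashlib


def make_block(block_id: int) -> bytes:
    """Build a 4KB block whose contents are determined by block_id."""
    return block_id.to_bytes(4, "big") * 1024


def dedupe_savings(data: bytes, chunk_size: int) -> float:
    """Fraction of chunks that duplicate an earlier chunk at this chunk size."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    unique = {hashlib.sha256(chunk).digest() for chunk in chunks}
    return 1 - len(unique) / len(chunks)


# Synthetic data: 32 unique 4KB blocks, each followed by the same common block.
COMMON = 999
data = b"".join(make_block(i) + make_block(COMMON) for i in range(32))

for size in (4096, 8192, 65536):
    print(f"{size // 1024:>3}KB chunks: {dedupe_savings(data, size):.0%} saved")
```

With 4KB chunks almost half of the data de-duplicates. At 8KB and above every chunk contains at least one unique block, so nothing matches and the savings drop to zero. Real data is less extreme, but the direction of the effect is the same.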

Design in Mind

That brings us to the SSD-based array vendors. These companies have a vested interest in implementing de-duplication, as it is one of the features they need to make the TCO of all-SSD arrays work. Dedupe is a necessity rather than an option, which forces it to be part of the array design from the start.

Solid state is also a perfect technology for deduplicated storage. Whether using inline or post-processing, de-duplication causes subsequent read requests to be more random in nature, as the pattern of deduplicated data is unpredictable. With fixed latency, SSDs are great at servicing this type of read request, which may be trickier for other array types.
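A small sketch shows why a sequential logical read turns random once data is deduplicated. It assumes a very simple model in which each unique block is stored once, in the physical slot where it was first written. The function names and the sample block lists are hypothetical.

```python
def physical_layout(logical_ids):
    """Map each logical block to a physical slot, storing each unique id once."""
    slot_of = {}
    layout = []
    for block_id in logical_ids:
        if block_id not in slot_of:
            slot_of[block_id] = len(slot_of)  # next free physical slot
        layout.append(slot_of[block_id])
    return layout


def seeks(layout):
    """Count reads whose physical slot does not directly follow the previous one."""
    return sum(1 for prev, cur in zip(layout, layout[1:]) if cur != prev + 1)


unique_file = list(range(8))            # no duplicate blocks
shared_file = [0, 1, 2, 0, 3, 1, 4, 2]  # reuses blocks 0, 1 and 2

print(seeks(physical_layout(unique_file)))  # 0
print(seeks(physical_layout(shared_file)))  # 5
```

Reading the file with no duplicates walks the physical slots in order. Reading the file with shared blocks jumps back and forth on five of its seven transitions, which costs a spinning disk a head movement each time but costs an SSD essentially nothing extra.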

Mainstream or Not?

Will de-duplication become a standard mainstream feature? Probably not in current array platforms, but definitely in the new ones where legacy architecture isn’t an issue. There will come a time when those legacy platforms are put out to pasture, and by then de-duplication will be a standard feature. When that will happen will have to be the subject of another post.