Thursday, 10 December 2009

Exchange 2010 - Infinite Instance Storage

It has been a while since my last post. I've been busy on a lot of fronts: revising for my VCP4 exam (which I passed :) ), working heavily on projects at work, and I also had a holiday. After my hectic month I've now had the chance to catch up with the latest Exchange 2010 product changes, and I felt compelled to post on what I discovered: it appears that Microsoft has removed Single Instance Storage (SIS) functionality from Exchange 2010.

Introduced in Exchange 4.0, SIS (essentially deduplication) ensures that an attachment emailed to multiple people is stored as a single master file rather than as a copy in every user mailbox, so from an overall Exchange database perspective a file sent many times consumes only the storage of a single copy.

You were probably as dumbstruck as I was when I read about the end of life of SIS in the latest Microsoft blurb, and you're probably thinking exactly what I did: surely there must be a new form of SIS, a new fandangled name for an improved SIS, or a completely new architecture that saves on storage consumption altogether. Well, it appears it was none of those. I dug deeper and found the following blog post confirming that it is now completely EOL. It seems that, like me, readers of the official Microsoft blog have great concerns about the architectural changes, the side effects in a typical large-scale environment of implementing 2010 compared to 2007, and the day-to-day operational problems it may lead to.
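Conceptually, SIS is just content-addressed storage: the database keeps one copy of each unique attachment, and mailboxes hold references to it. Here's a minimal sketch of the idea (a toy model for illustration, not Exchange's actual implementation):

```python
import hashlib

class SingleInstanceStore:
    """Toy model of SIS: identical attachments are stored once;
    mailboxes hold only references (content hashes)."""

    def __init__(self):
        self.blobs = {}       # content hash -> attachment bytes (stored once)
        self.mailboxes = {}   # user -> list of content hashes

    def deliver(self, user, attachment: bytes):
        digest = hashlib.sha256(attachment).hexdigest()
        self.blobs.setdefault(digest, attachment)   # store only if unseen
        self.mailboxes.setdefault(user, []).append(digest)

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blobs.values())

store = SingleInstanceStore()
memo = b"x" * (5 * 1024 * 1024)   # a 5MB attachment
for n in range(1000):
    store.deliver(f"user{n}", memo)

# 1000 deliveries, but only one 5MB copy on disk
print(store.stored_bytes())   # 5242880
```

Without the dedup step, the same scenario would consume roughly 1,000 times that amount, which is exactly the behaviour Exchange 2010 reverts to.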
To be frank, Microsoft sound a bit blasé when justifying why they have removed SIS. They seem to infer that technology like SIS is legacy, that customers do not actually benefit from the storage reductions, and that SIS is in fact being removed to provide a performance benefit. How Microsoft can measure that SIS is no longer useful is beyond me; Exchange customer use cases are all different, but in the field, regardless of what Microsoft think, SIS (however small the effect) reduces storage costs for most organisations. It has also made Exchange more efficient in other areas, such as shortening backup windows and the associated restore times for Exchange databases. The justification from MS seems to be that, compared to 4-5 years ago, disk is cheaper and bigger. They are right: it may well be cheaper when compared against DAS-connected environments, and I have no issue with that. My issue is that most large organisations like mine do not use DAS for large-scale Exchange, and large enterprises avoid it for some of the following reasons:

DAS does not provide Volume snapshot capability for backup and restoration activity

DAS storage volumes cannot be replicated, for any purpose, to a secondary local or offsite array

Backup windows with DAS compared to a SAN are not even worth illustrating with examples; backing up large volumes of Exchange data across the wire with DAS is unquestionably going to be slower

You cannot clone a DAS storage volume non-disruptively in the background, and quickly, as you can on a SAN; this is useful for things you should perform regularly, such as production backup integrity tests

DAS creates a dependency between host and storage; you can move or change a Fibre-connected server much more easily than a DAS-connected one

Try providing cache priority or QoS to a DAS volume!

Try managing DAS remotely and from central consoles!

On a TCO front, a SAN most probably provides much better cost and operational savings than having pockets of large storage pools on DAS

I'm not a SAN bigot (well, maybe just a bit), but I'm sure the reasons above that organisations use SAN show the limitations that arise from using DAS in the enterprise, and why for applications like Exchange you need to implement such infrastructure.

The example cost hike

So to see what kind of cost increase I might experience without SIS after upgrading to Exchange 2010, take an example calculation: 1,000 users are sent the CIO's Christmas message, which happens to carry a 5MB attachment. Sent to 1,000 people, that attachment theoretically consumes 5GB of storage on the Exchange database, consumption that SIS in Exchange 2007 would have avoided. Multiply that example across a typical messaging environment, with carbon copies of large presentations, more company announcements with attachments (maybe a Lotus Notes quotation from Procurement?), and so on, and Fibre Channel certainly starts to become a very expensive option without something like SIS.
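The arithmetic is easy to extend into a back-of-envelope annual figure. A rough sketch; the broadcast frequency and the per-GB SAN cost are pure assumptions for illustration, not real pricing:

```python
# Back-of-envelope extra storage from losing SIS (all figures are assumptions)
attachment_mb = 5           # the CIO's Christmas attachment
recipients = 1000
broadcasts_per_year = 24    # assumed company-wide mails carrying attachments
san_cost_per_gb = 5.0       # assumed fully-loaded Fibre Channel cost, $/GB

extra_gb = attachment_mb * recipients * broadcasts_per_year / 1024
extra_cost = extra_gb * san_cost_per_gb
print(f"{extra_gb:.1f} GB extra per year, roughly ${extra_cost:.0f}")
# -> 117.2 GB extra per year, roughly $586
```

Even with deliberately conservative numbers, duplicated broadcast attachments alone eat a noticeable slice of top-tier disk every year.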

Additionally, let's not forget that most organisations who deployed 2007 most likely implemented it on new SAN arrays which are not due for renewal and are perfectly capable of hosting 2010. A SAN is unfortunately not something you can throw away and replace with DAS, and remember that DAS carries hidden costs associated with operational management.

So to summarise the negative side of this post: I am not happy with such functionality being removed. By removing SIS from technology I will have no choice but to upgrade to in future, Microsoft have just increased my storage costs by 10% and increased the volume of disk I now require in my array going forward (tough luck using proprietary tech, eh). Lastly, to put this into perspective, I can't be bothered to look up Microsoft's Exchange pricing, but I am more than sure the software is not 10% cheaper with 2010 :)

The positive comments for Microsoft from this post

I'm not that hard on vendors all the time; I've got some positive comments here. The positive side of this post mainly focuses on the fact that losing features such as SIS means you should treat your storage strategy, and your all-round planning, more seriously, with a complete archive methodology.

A commercially available archive solution such as Symantec Enterprise Vault or Quest Archive Manager lets you host mail items on lower-tier SATA or archive disk media, which in turn reduces the size of primary Exchange storage and the associated requirements at the higher storage tier. Importantly, however, archiving shouldn't mean cutting off your nose to spite your face and replacing SAN with DAS; SAN still has tangible benefits across most large enterprise environments for many other reasons.

Summary

Maybe I'm being unfair to Microsoft with my vendor rants. We have had SIS functionality reducing storage costs for a while without realising it, and have taken it for granted. I think that moving forward we will need to shift focus towards alternate strategies, using archiving products more and more, and be sensible about the lifecycle of storage management within email environments. Longer term it will be interesting to see results from people migrating to 2010: do they notice a dent in storage costs if they are using SAN and not the horrible, dreaded DAS?

Wow, this is the first I've heard of SIS being removed and not replaced by something similar. I'm going to have to do some investigation into it myself now.

This is a massive oversight on Microsoft's part. IMHO, although the argument that disk is now cheaper is true, the same argument could be made that CPU clock speeds (and core counts) are now much higher for effectively the same money as in the past. I would much rather throw extra CPU resource at the problem (for the very little overhead I'm sure SIS actually adds) than add extra disk, any day of the week.

I wonder how all this plays into Exchange 2010's new found archival functionality - does it dedupe at this stage perhaps?

Also, wouldn't it be greener to burn a few extra watts of power on CPU cycles to keep SIS or similar, as opposed to keeping additional disk spindles running to accommodate all this extra duplicated email data?

Thanks for an interesting post and I'd be interested to hear the thoughts of others on this.

Just a thought here. The advantages of SIS have been declining for some time now, as most organizations have gone from large databases with small mailboxes, to large mailboxes broken out into smaller databases for manageability reasons. SIS was great in Exchange 5.5, but the returns have diminished greatly. So in reality, the effects of dropping SIS aren't as harsh as you'd think.

As Anonymous mentioned, the benefits of SIS have diminished greatly over the last couple of Exchange releases. The dedup only occurs within a single database, and best practice says databases shouldn't exceed 200GB (100GB if CCR is used). With mailbox size limits growing in many companies, that usually means 100-200 mailboxes per database (perhaps as many as 500 users in rare cases).

Also Exchange 2010 now compresses all attachments before storing them in the information store; this change is likely to have a greater impact than SIS did for 2007 deployments.

@Simon Glad you agree :). I may be jumping in feet first with the criticism (what's new), but generally I get concerned when Microsoft mention DAS as the answer to reducing my storage costs. They tried this with 2007 and it just doesn't cut it in enterprise environments.

The new archiving is again a bit lame from what I've read; it doesn't deduplicate messages like Enterprise Vault or Quest Archive Manager would. It's basically a secondary mailbox with policy. However, the target customer for this is the SMB sector, I believe.

@Andrew and @Anonymous, you make a valid point, and again it's all about changing the way you think about Exchange. However, even with a single database of, say, 200 users, everyone receiving email within that DB could still amount to gigabytes of data, not megabytes...

On compression and its supposed benefits, I am a bit sceptical: it must be file-type dependent, and only able to achieve meaningful compression with certain file types?

Here's the thing. In most of the Exchange implementations I've seen since Ex5.5, the gating factor that determines the amount of disk that Exchange uses hasn't been the quantity of email but the level of IO. What I suspect has happened here is that MS has run the numbers and determined that the space savings resulting from SIS don't outweigh the IO cost of additional look-ups in secondary indexes and tables.

The thing is, the example of the company-wide mail with a 5MB attachment isn't the reality of the situation with SIS. In practice, the vast majority of attachments go to a comparatively small number of people who may (or, more likely, may not) be on the same server. Where they aren't on the same server (or storage group - I can't remember), SIS is broken from the get-go. For years, MS have been telling people not to expect too great an effect from SIS due to the impact of things like mailbox moves and inter-generation migrations (both of which break SIS).

Ultimately, anything that reduces the IO load for Exchange is probably going to end up reducing the number of disks you use for it. MS likely have the evidence and (I would imagine) have done the maths. I'd give them the benefit of the doubt on this one.

To clarify, I assume that in your examples, DAS is being used to mean dedicated, direct attached internal disks vs. an external shared direct attached storage via SAS (or iSCSI or FC for that matter) type of connection?

If that is the case, then things make much more sense.

However, not all DAS is internal and dedicated, as is a common perception. There are also external shared (e.g. SAS, FC or iSCSI) direct attached storage RAID solutions (i.e. no switches or SAN per se) with snapshots and other features that are being deployed for Exchange (see MSFT ESRP results for some examples), VMware and other applications or workloads.

I agree certain directly attached Fibre arrays like MSAs offer snapshots etc., but they usually only scale to single-figure server counts, which from a management perspective is still a large constraint on the resources needed to manage them.

Thanks for the clarification and concur about the confusion around DAS being thought of as internal or dedicated. You bring up an interesting point about the class of shared external arrays such as the HP MSA which can be attached to servers via FC, iSCSI or SAS. What I find interesting is that yes, compared to an iSCSI or FC/FCoE or even NAS access, the shared external direct attached via SAS arrays are limited in terms of host connectivity.

However, that too is changing, with table stakes currently at about two dual-attached servers (assuming a dual-controller array config), or 4 to 8 single-attached servers (déjà vu of early FC and multi-initiator pSCSI?). An example of how things are changing is that some vendors, including HP, are supporting SAS switches to boost the number of dual/redundant attached servers per array. That could beg the question of whether switched SAS then qualifies as a SAN, but let's leave that for the SAN police to debate.

Disclosure: I have no affiliation with HP. I'm simply a fan of, and use where the scenario applies, SAS as well as iSCSI, FC, FCoE and NAS as means of accessing storage in support of resilient, scalable and flexible data infrastructures ;)...

About Me

I am a Technical Architect from the UK, working for an airline. I specialise in all key areas of infrastructure, ranging from virtualisation, storage and servers through to areas such as DR and continuity.
I am currently a VCP in ESX 3/4, and have also attended the VI3 DSA course.