Data Domain Update

I’m not known for retractions and I’m not posting one. I did, however, check out the new DD boxes, and the really big ones are far more capable than the old ones.

So, the techies (hats off for enduring a half hour with me) explained to me a few things:

The smallest block is 4K

The highest possible performance for the biggest box is 200MB/s

The biggest box can do a bit over 30TB raw

They scrub the disk continuously so it’s effectively defragged (see below for caveat) – they did admit performance totally sucks over time if you don’t do it (finally vindicated!)

This is good news, since it’s obviously far bigger than the old ones.

Some issues though (based on what the techies told me):

The scrubbing depends on NBU deleting the old images – that’s how the box knows what to get rid of. If your retentions are long, you will have performance problems. They suggested just dumping it all to tape and starting afresh once in a while, which just confirms my suspicions about how the stuff truly works.

Each “controller” is really a separate box. The 16-controller limit does not mean it’s one larger appliance; it’s just the limit of the management software.

Ergo, each controller can be a separate VTL or a separate NFS mount. You cannot aggregate all your controllers into one large VTL. This sucks: if you need to do backups at 1GB/s or so, you’ll need at least 5-6 boxes, and you will have to define a separate library and drives per box. If you do NFS, you need to define 1-2 shares per box. This is a management nightmare. Make it all a single library! Copan has the same issue – though I don’t know how they could do it, given their architecture.
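To put the sizing complaint in numbers, here’s a quick back-of-the-envelope sketch using the 200MB/s-per-box figure above. The function name and the target rates are mine, purely illustrative:

```python
import math

def boxes_needed(target_mb_s, per_box_mb_s=200):
    """How many appliances it takes to hit an aggregate backup rate,
    assuming throughput scales linearly across separate boxes
    (each one its own VTL or NFS mount)."""
    return math.ceil(target_mb_s / per_box_mb_s)

# ~1 GB/s target (1024 MB/s) at 200 MB/s per box:
print(boxes_needed(1024))  # 6 boxes, i.e. 6 separate libraries to define

# Matching a 2.2 GB/s (2200 MB/s) VTL:
print(boxes_needed(2200))  # 11 boxes
```

Every one of those boxes is a separate library (or 1-2 NFS shares) to configure and manage, which is the crux of the complaint.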

So, it looks to me like it may be a fit for some people, though I have no idea about the price points. If you want performance, you’ll need a ton of the boxes, and you’ll need to spend time configuring them. If 10 maxed-out boxes cost the same as (or, worse, more than) a big EMC DL4400 (which can do 2.2GB/s), then it’s not an easy sell – especially since EMC will be adding dedupe to their VTL, and you won’t have to define a bunch of separate libraries. Will EMC’s dedupe be similar? No idea, but if it doesn’t impact performance then it’s pretty compelling.

I never said using NBU to prune the box is bad – it’s the normal way it would work for B2D or B2T anyway.

The inescapable fact is that you can’t have very long retentions or performance suffers, since scrubbing doesn’t happen and there’s just more stuff to keep track of.

Moving the data from Data Domain (or indeed ANY disk or VTL) to tape is simple – just use NBU’s built-in mechanisms. Plain Vault is best – disk backups with staging can be weird with dedupe, since staging relies on reported available capacity.

As an aside, disk staging with NBU versions before 6 can cause performance issues, due to concurrency problems in staging.

One of the key choices in de-duplication is whether the box de-dupes in real time (inline) or after the data has been written to it (post-process). We too have been playing with the DD boxes, and they de-dupe on the fly, unlike some other vendors. That does have an impact on how much disk space will be needed.
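To make the inline-vs-post-process distinction concrete, here’s a toy sketch of inline dedupe: each fixed-size block (4K here, matching the smallest block mentioned above) is fingerprinted as it arrives, and only previously unseen blocks are stored – so you never need the full landing space a post-process design writes to first. The class and names are mine, not Data Domain’s actual implementation:

```python
import hashlib

BLOCK = 4096  # 4K block size, per the spec above

class InlineDedupeStore:
    """Toy inline dedupe: fingerprint each block on ingest, keep only new ones."""

    def __init__(self):
        self.blocks = {}    # fingerprint -> block data (unique blocks only)
        self.raw_bytes = 0  # total bytes the client has written

    def write(self, data: bytes):
        for i in range(0, len(data), BLOCK):
            block = data[i:i + BLOCK]
            self.raw_bytes += len(block)
            fp = hashlib.sha256(block).hexdigest()
            # Dedupe happens right here, in real time, before anything lands on disk:
            self.blocks.setdefault(fp, block)

    def stored_bytes(self):
        return sum(len(b) for b in self.blocks.values())

store = InlineDedupeStore()
store.write(b"A" * BLOCK * 10)  # ten identical blocks
store.write(b"A" * BLOCK * 10)  # the same data again, e.g. a second full backup
print(store.raw_bytes, store.stored_bytes())  # 81920 bytes written, 4096 stored
```

A post-process design would have to land all 80K on disk first and reclaim the duplicates later – which is exactly why the dedupe mode affects how much raw disk you need to buy.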

I’ll put in a biased chit here: check out the updated PureDisk (PD) solution from Symantec, with the dedupe engine in the media server. You increase top-end throughput by adding media servers (1GB/sec can be done with 2 media servers), and the data flows into one common pool rather than islands of dedupe storage. With PD you have one Disk Storage Unit – one pool. Done. Finally, because we dedupe at a higher point in the data path, you can send data over IP to the PD nodes. Tell me again: why would you want a VTL?