UK retailing giant and mainframe user Sainsbury's is migrating away from tape to Shoden-supplied Data Domain disk backup systems.
It has rejected a move to Oracle's current StreamLine automated tape library and is moving to disk instead.
Shoden is the UK division of a South African systems integrator which resells Hitachi …

logical elimination

Considerably more reliable disks in RAID arrays, yes.

If a tape fails, you don't usually know until you come to do a restore, whereas RAID arrays can be monitored, and if a disk fails or develops a fault it can be replaced without any loss of data integrity.

Costing kit in the real world

This piece tells us that the decision was primarily driven by cost (and also by the ability to lower risk, thanks to improved media reliability).

The problem with a simplistic, accountant-driven approach to data centres is that it spreads costs in a rather impractical way. So if your datacentre costs £1M to build and houses 500 servers, the cost per server is £2,000. However, this conceals far more than it tells us about the true costs of expansion - or consolidation. Given that regime, you could be forgiven for using that number in a cost case and saying to your boss, "Therefore if we can consolidate away 20 servers, we'll save £40k in datacentre costs." When in fact the actual saving (measured in money spent) is zero.

It also masks the real cost of adding new servers - presuming that if you need to add another one, the walls of the datacentre can somehow be pushed out a little to make room for it. In real life, most new servers add exactly nothing to the actual datacentre costs - until all the floorspace is used up. After that, the very next server will cost you another £1 million, as you would (theoretically) need to build a new datacentre to house it, since the old one would be full.
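The distinction between the averaged number and the actual cost of one more server can be sketched in a few lines (hypothetical figures taken from the £1M/500-server example above):

```python
# Hypothetical illustration: the accountant's average cost per server
# versus the actual marginal cost of adding one more server to a
# datacentre whose build cost is already sunk.

DC_BUILD_COST = 1_000_000  # £1M to build the datacentre (sunk cost)
CAPACITY = 500             # servers the floorspace can hold

def average_cost_per_server(servers):
    """The accountant's number: total build cost spread evenly."""
    return DC_BUILD_COST / servers

def marginal_cost_of_next_server(servers):
    """Actual extra datacentre cost of adding one more server."""
    # Within capacity the floorspace is already paid for: £0 extra.
    # The server that exceeds capacity triggers a whole new build.
    return 0 if servers < CAPACITY else DC_BUILD_COST

print(average_cost_per_server(500))       # 2000.0  (£2,000 each)
print(marginal_cost_of_next_server(250))  # 0       (room to spare)
print(marginal_cost_of_next_server(500))  # 1000000 (new DC needed)
```

The step from £0 to £1M at the capacity boundary is exactly the spike that the averaged £2,000-per-server figure hides.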

Costing everything in the real world

Actually, you describe the non-accountant approach, as accountants (and more business-minded IT professionals) are aware of the concept of marginal cost.

Graphing server costs against DC capacity would show the rather obvious spike in the cost of the server that exceeds the datacentre's space (the marginal cost will equal the server plus another datacentre, or, perhaps more real-world, the cost of hosting it elsewhere). High marginal costs like this are front and centre when decisions on IT strategy are made.

Also, it's odd that you focus on long-term capital costs that tend to be depreciated over a long period (whatever the useful life of a DC is), rather than the costs that are more directly affected by the server population, like power and server hardware maintenance. Even a DC's equipment (a/c, fire suppression, power generation/backup, etc.) will usually last 5-7 years, so through a server refresh.

Drops HD

Err...

How much data is backed up?

What is the retention period?

Is there offsite replication/how is it done?

Did they factor power and cooling into the costs?

Is there intelligent spin down of disks?

If you're backing up a few terabytes a night, disk is all well and good (assuming you accept the increased electricity and cooling costs), but when you need to back up hundreds of terabytes a night, or even petabytes, disk is rarely the option.

Case in point: at home my backup datastore is 2TB, so disk is fine. At work we back up about 3PB a night; disk is out of the question, even as a disk stage. There is a tipping point where the costs of disk can't be handled, even by the largest corporations, and that's before you take into account energy usage reduction policies.

Agreed (to a point)

Agreed that there are volumes where tape wins out for a full copy. However, one thing that disk allows that is more difficult with tape is incremental backup. In effect, that's what de-dupe does: it looks like a full backup but is, in fact, an incremental in terms of space occupied. It would be an exceptional organisation that turned over 3PB of updates a day (an average of almost 35GB per second of update).
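That 35GB/s figure is easy to verify with back-of-envelope arithmetic:

```python
# 3 PB of changed data per day, expressed as a sustained rate.
pb_per_day = 3 * 10**15               # bytes (decimal petabytes)
seconds_per_day = 24 * 60 * 60        # 86,400 seconds
rate_gb_s = pb_per_day / seconds_per_day / 10**9
print(f"{rate_gb_s:.1f} GB/s")        # 34.7 GB/s, sustained all day
```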

However, there's a penalty to de-dupe which is often not mentioned: the very process of de-duping gets rid of an important level of redundancy. If you take 10 full backups, you no longer have 10 independent copies. All the copies will share the same de-duped blocks (using "block" as a loose term for the unit of de-duping) unless multiple real copies are held. That means putting a huge amount of trust in the resilience of the backup storage solution and its complex mapping database. All those devices have a lot of resilience built in for those functions, but you'd better hope it has been extraordinarily well tested, or a corruption could wreck the entire backup store.
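A toy sketch makes the shared-block point concrete. This is a deliberately simplified content-addressed store (fixed-size blocks, in-memory dict - no real product works exactly like this), but it shows why ten de-duped "full" backups are not ten independent copies:

```python
# Toy content-addressed dedupe store: each unique block is kept once,
# and every backup is just a manifest of block hashes pointing at it.
import hashlib

store = {}  # block hash -> block bytes (the single shared pool)

def backup(data, block_size=4):
    """Record a 'full' backup as a list of block hashes."""
    manifest = []
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        h = hashlib.sha256(block).hexdigest()
        store.setdefault(h, block)   # only the first copy is stored
        manifest.append(h)
    return manifest

data = b"the quick brown fox jumps over the lazy dog!"
manifests = [backup(data) for _ in range(10)]  # ten "full" backups

# Ten manifests, but each block exists exactly once in the pool:
print(len(store))  # 11 unique blocks, shared by all ten manifests
```

Corrupt one block in `store` and all ten "copies" are damaged at once, which is precisely the redundancy that independent full backups on separate tapes would have preserved.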

Also, in order to go off-site, you need either to destage to tape or to use storage replication of the backup store over some fairly meaty network connections.

The other thing...

The other thing that no one mentions with dedupe is that you are totally reliant on the metadata database where the checksums of the deduped data are stored. If your database corrupts, you're knackered; if you don't notice that corruption, you're really, really knackered.

I speak as someone who has been affected by a dedupe product (no names mentioned) corrupting its database. As it happens, the replica didn't corrupt, but it could have, and that would have lost everything.

@AC 23:18

We have a pretty much 24/7 backup window, with only a few hours per day reserved for maintenance. Disk snapshots of data filesystems are mounted onto dedicated mount servers, so production systems aren't loaded and you can back up pretty much whenever you want. OSes can be backed up through the day, with smaller data filesystems backed up over IP in the evening. There are multiple datacentres Europe-wide (we are global, but the 3PB is just Europe), and yes, we have hundreds of tape drives and a really huge number of tapes.

So, AC 23:18, you may want not to dive in with a "so-and-so is full of shit" when you clearly don't understand just how big some large, data-orientated companies can be.

Hardware-based dedupe technologies are on their way out

Hardware-based dedupe technologies like EMC Data Domain, IBM ProtecTIER and Quantum StorNext are on their way out.

Backup software that builds deduplication into the client and server is already hitting the market.

For example, with IBM Tivoli 6.2 you can now tell the clients to deduplicate the backup data they send to the main server. It might not achieve the same reduction rates as the hardware solutions, but you can use cheaper storage arrays on the back end.

If you compare a Data Domain array to a NetApp filer (with SnapMirror and 2TB SATA disks), even taking compression into account you will still see a big difference in cost.

Based on what?

Symantec, CommVault, IBM Tivoli, EMC Avamar and a few others have host-based dedupe, yes, which relies on the hosts' processors to run the algorithm. In typically underworked physical environments, the extra CPU need isn't a big deal, but having a ton of VMs suddenly running dedupe algorithms isn't a good idea. Appliance-based dedupe will have its place for some time to come, even if it's just an appliance loaded with Sym or CV.

Just curious: why would you compare a DD appliance to a NetApp filer? The former is purely a dedupe target appliance; the latter is a storage array for use as NAS/SAN. There are far cheaper ways to buy dedupe appliances than buying a NetApp.

It'll end in tears

3 PB is not fiction

Why would you think that 3PB a night is fiction? As a consultant, I have personally worked with multiple companies that back up that kind of data. They usually have hundreds of large backup servers and thousands of tape drives to make it all happen.