Are enterprises asking the right questions about data and its storage?
This may seem an unusual question, but it should not be treated as an abstract one. If they are not, or if appropriate questions produce unexpected answers, then there may be distinct possibilities for substantial future savings, as well as releasing IT …

COMMENTS

It is not just about numbers here

Taking your example, what use is a 3 month old copy of the database?

If you lose your database due to hardware failure or environmental problem (eg fire, flood, theft), you want to restore to the most recent copy of the data, and as quickly as possible. Ideally you would have a real-time offsite mirror of the system that takes over immediately.

If you lose data due to software issues such as data corruption, a failed update or security issues, then you want to roll back to the most recent copy before that problem arose, and hopefully it isn't going to take you three months to notice there is a problem.

The system described in this article doesn't have that many recent or real-time copies of the data, so it isn't actually very good; instead you have lots of old copies that are pretty much useless other than as poor substitutes for newer versions.

Re: It is not just about numbers here

Old backups can be vital. A coding or user error that corrupts or deletes some of the data may not be noticed for quite some time - it might only be noticed when a year-end routine is run. Being able to retrieve (with effort) the missing data can outweigh the costs of the backup regime.

When I was a system administrator, I tended to keep additional backups outside the normal cycle. One time a private 4-year-old tape backup had the last remaining copy of a vital piece of source code.

Too many backups is expensive - too few is courting disaster.

If a computer system is being removed - always get a full backup before it goes - if you do not then you WILL regret it.

Re: It is not just about numbers here

"Old backups can be vital."

AMEN

Here's an example: In 1999 a NEAX61M switch was rebooted to allow y2k updates to take effect.

It failed at reboot, and the failure cascaded to 3 other switches, putting 200,000 phone lines out of action as well as a major international switch, resulting in subscribers throughout the country being unable to make long distance and international calls.

BACKUPS WERE CORRUPT. It turned out that something had scribbled over memory more than a year previously, and what was backed up was completely corrupt information.

Techs had to go back more than 2 years before they found a working backup - that was OK, the switch booted up - BUT 2 years means a LOT of changes (people move houses, etc etc etc), so the next step was to replay all database changes - which thankfully had been backed up separately.

It took more than a DAY before anyone could make any phone calls at all in Palmerston North - but there were still thousands of people who found they couldn't make calls.

It took 6 _WEEKS_ to replay those database updates. In that period various people had no dial tone, the wrong number, etc etc etc. A small ISP had more than half its lines out of action for most of that period.

Lesson 1: Older backups can be vital

Lesson 2: A backup is no fucking good if it hasn't been tested.

240 copies of small pieces of data such as mail folders is immaterial. 240 copies of a 1TB filesystem is another matter.

I admin backups for about 1PB of data. It's all about maintaining a balance between COSTS (and frankly tapes are the cheapest part of the whole shebang, so I don't really care if I use a bunch of extra LTO5s at 14 quid a pop), resilience, complying with data retention laws, keeping archival copies and being able to run the backups in a reasonable time window. (Some areas are backed up 7-10 times per day, others are only touched once a week.)

A good backup system has a backing database so an admin can zero in on a given file at a given point in time in 2-3 minutes. That database is also a BRILLIANT intrusion/modification detection system - if any aspect of a file changes, its SHA512 signature changes and that means it gets backed up again.

It can tell you how many copies of any given file are backed up.

It also doesn't miss things if a file tree is moved - the number of backup systems which will detect and handle this correctly can be counted on one hand - 2 are free and the other 2 cost in excess of £30k.

The vast majority of "backup" systems out there are crap - and the ones being most heavily promoted commercially certainly fall into that camp.
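The digest-based change detection described above (if any aspect of a file changes, its signature changes, so it gets backed up again) can be sketched in a few lines. This is a toy illustration, not any particular product's implementation; the function names and the path-to-digest index layout are invented for the example:

```python
import hashlib
import os


def sha512_of(path, chunk=1 << 20):
    """Hash a file in chunks so large files never need to fit in memory."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()


def build_index(root):
    """Map relative path -> SHA512 digest for every file under root."""
    index = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            index[os.path.relpath(full, root)] = sha512_of(full)
    return index


def changed_files(old_index, new_index):
    """Files whose digest differs, or which are new, need backing up again.

    A diff against the stored index also doubles as the crude
    intrusion/modification detector mentioned above."""
    return sorted(
        path for path, digest in new_index.items()
        if old_index.get(path) != digest
    )
```

A real backing database would also record per-version metadata (timestamps, tape locations, copy counts), which is what lets an admin zero in on a given file at a given point in time.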

Re: It is not just about numbers here

Woahhh, you had to go back 2 years ... a backup in good order would have been far preferable.

1) Your company backed up corrupt data ... that can happen, but going 2 years without checking its consistency ... that is bad. But at least you have learnt your "Lesson 2", which makes Lesson 1 a moot point.

2) Before doing something very sensitive, make sure you have the proper procedure and equipment at hand. Usually in HA things come in twos ... at the very least ... so test things before going commando on the production line.

I would add

3) Old backups may be completely useless when they have not been migrated. I've lived through such an example, where the servers went from x86 to Sparc and the application from C++ to Java ... good luck finding a 20-year-old server in working condition ... it may prove extremely costly when data-retention lawyers kick in.

On that point, lucky for you that the backups agreed with the current firmware/hardware on the switch.

Re: It is not just about numbers here

A 1-year-old backup is sufficient ... beyond that it has its benefits, but so does being paranoid ... Google, MS, HP etc. all declare their profit/loss/revenues for the year and move on ... so business criticality is more like 1.1 years ...

So the use case is: "this old data may be valuable if we screwed up and all our more recent backups were corrupt". Which in turn implies backups are not being test-restored; and/or the application data has hidden corruption which a cursory test restore does not uncover, and for which the only feasible solution is to pull old data from an old backup rather than regenerate the data from other sources.

You can argue that simply archiving tons of data is cheaper than spending staff time on testing restores. However, this is a risky strategy, as there's no guarantee that *any* of the backups are usable, or they may be so old as to have no business benefit.

I guess the ideal strategy would be: (1) do test restores at fixed intervals, with thorough testing of completeness and usability; (2) keep backups since the most recent test restore.
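A minimal sketch of step (1), the "thorough testing of completeness and usability" part: given a manifest of expected files and digests recorded at backup time, check the restored tree against it. The manifest format and function name here are assumptions for illustration, not a real product's interface:

```python
import hashlib
import os


def verify_restore(restore_dir, manifest):
    """Check a test-restored tree against a manifest recorded at backup time.

    manifest: {relative_path: expected_sha256_hexdigest}
    Returns (missing, corrupt) lists of relative paths."""
    missing, corrupt = [], []
    for rel, expected in sorted(manifest.items()):
        path = os.path.join(restore_dir, rel)
        if not os.path.exists(path):
            missing.append(rel)          # completeness check
            continue
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(1 << 20), b""):
                h.update(block)
        if h.hexdigest() != expected:
            corrupt.append(rel)          # usability (integrity) check
    return missing, corrupt
```

Note this only proves the bits came back intact; "usability" in the full sense (the application actually starts against the restored data) still needs an application-level test.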

Answer use better backup software

If you want to use joke backup software then yes, you are going to store loads and loads of copies of the data at silly multiples. Back in the real world you could simply select better backup software, specifically TSM, and ONLY store a primary and secondary copy of every version of a file, radically reducing the data retention multiple while still having "virtual" backups going back as far as you want. Then if the file is a database or similar, where the file changes every day, you just dedupe it.

Whoever wrote this article is either not a storage administrator, or an uninformed twit who should not be let near any storage again.

Use of Backups

There's the obvious disaster scenario, total equipment loss at the primary site, in which case you'd want a complete copy of everything from the day before (assuming daily granularity).

Then there's a partial data loss, where one or more disks fail, but as you can't predict which data might be lost, you'd still like everything from yesterday to be available.

Then you get onto audit trails and archival storage. I'm sure most IT people want to be elsewhere (except the BOFH) when someone asks if it's possible to recover a file from last week/month because they've only just realised that's when they broke it. Then there's the need to haul up old versions of documents for other reasons - software people are hopefully already using version control and can reconstruct any version of a file, but other parts of the company are more likely to have just overwritten a previous version.

If it's needed for tax or accounting purposes then it should have been properly archived and so probably isn't taking up 240x the storage.

Start with the end in mind

This is what happens when you treat all data as equal. We run highly transactional DBs where the entire source data on a multi-terabyte system changes every 3 months. Holding DB backups beyond that point has no purpose. On top of that we store the source (input) data, so we can restore most of the DB by reloading source data. Any other data (username and password tables and the like) is backed up and stored under a different retention scheme.

For me the key is to define your backup regime when creating or designing the app, or at install time, with recovery in mind, and to be ruthless in removing data that has no further purpose.

Re: Start with the end in mind

In the UK, tax authorities can demand to see financial records several years old. If your database holds financial records then you might need to keep old copies for audit purposes even if they are of no other use to the business.

In one organisation that I worked for, one full backup each month was kept forever to provide the permanent audit capability. (This was specified as a requirement by our major customer.)

Re: Start with the end in mind

Yes, but for that you are probably better off keeping a copy of the transaction report or similar in plain text or PDF format. It won't be of any use for restoring the data back into the system, but that is not the purpose of the data; it will be easier to view manually, even if you switch to a new computer system in 5 years' time that stores things in a completely different format.

Bake it into the OS?

Maybe it's time to move away from a backup system making copies, and to bake backup into the OS? E.g. the OS writes all new blocks to clean disk, so nothing is lost as the systems run. So now any point in time is recoverable? Then tools can be used to trim blocks out based on filters - e.g. everything older than a certain date - or to replicate sets of blocks to create images for specific dates at other locations. Sets of blocks can be indexed in order to aid finding data. Automatic tiering can move old blocks from fast storage to slower/cloud-based storage etc. Factor in a deduplication mechanism and each unique block of data could easily be stored only a very few times, depending on your personal level of paranoia?
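The never-overwrite idea can be illustrated with a toy log-structured block store - purely a sketch of the concept, not any real filesystem, with all names invented: every write appends under a new sequence number, any past point in time stays readable, and old history can be trimmed without breaking current state.

```python
class AppendOnlyBlockStore:
    """Toy log-structured store: writes never overwrite, so any past
    version of the volume can be reconstructed from the log."""

    def __init__(self):
        self.log = []   # (seq, block_no, data), appended forever
        self.seq = 0

    def write(self, block_no, data):
        self.seq += 1
        self.log.append((self.seq, block_no, data))
        return self.seq  # the sequence number doubles as a point in time

    def read_at(self, block_no, seq):
        """Read a block as it was at (or before) sequence number seq."""
        for s, b, d in reversed(self.log):
            if b == block_no and s <= seq:
                return d
        return None      # block didn't exist yet at that point

    def trim_before(self, seq):
        """Drop history older than seq, keeping the newest pre-seq copy
        of each block so the current state still reads correctly."""
        newest = {}
        for entry in self.log:
            if entry[0] < seq:
                newest[entry[1]] = entry
        kept = [e for e in self.log if e[0] >= seq]
        self.log = sorted(set(newest.values()) | set(kept))
```

Real copy-on-write filesystems (ZFS, btrfs and the like) work on roughly this principle, with snapshots pinning particular points in the log.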

Yes, Windows XP-E had the Enhanced Write Filter

This basically gave you manually-triggered points where the filesystem would only note changes at the block level instead of overwriting, so you could roll the entire partition back to any previous restore point.

Unfortunately this seems to have vanished from Windows 7 Embedded, which is most annoying.

Windows 7 and 8 do have the ability to maintain "shadow copies" of files, so you can roll any file back this way (if enabled!)

More user-friendly I suppose, but not so useful for embedded industrial.

De-duping the data within

I am unclear about how de-dupe works. Does it de-dupe purely on whole files, keeping a complete copy of every file that changes, or on the data within a file? For instance, you might have my_finances.xls backed up multiple times, with each dated version containing slightly different data. Does de-dupe then keep only a record of those changes rather than the entire file each time? That would make far more sense.

Anonymous because I don't want to be labelled clueless (even though I am).

Re: De-duping the data within

The only dumb question is the one you didn't ask when you weren't sure of the answer!

The answer is yes, dependent upon the technology in question. Generally, source-side de-duplication will send only the changed blocks in data objects (files as well as databases) to the backup solution, which must have an index to understand how to reconstruct to a point in time should the local index be lost.

Storage-side or backup-server-side de-duplication tends to work at a global level; the storage also needs a mechanism to reconstruct a data object (or indeed several). I'd say "database", but as storage mechanisms generally don't use industry-standard databases I'll avoid the DB word to avoid offending DBAs.
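To make the block-level answer concrete, here is a toy sketch of storage-side dedupe - invented names, and fixed-size blocks rather than the variable-size chunking real products use: each unique block is stored once, and a file version is just an ordered list of block digests, so two versions that differ in one block share all the others.

```python
import hashlib

BLOCK_SIZE = 4096  # real products typically use variable-size chunking


class DedupStore:
    """Toy block-level dedupe store."""

    def __init__(self):
        self.blocks = {}  # digest -> raw block data, stored once

    def put(self, data):
        """Store one version of a file; return its 'recipe' (digest list)."""
        recipe = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # new blocks only
            recipe.append(digest)
        return recipe

    def get(self, recipe):
        """Reconstruct a file version from its recipe."""
        return b"".join(self.blocks[d] for d in recipe)
```

So the answer to the my_finances.xls question: each dated backup costs only the blocks that actually changed, plus a small recipe, while every version remains fully reconstructable.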