Inclusions versus Exclusions: Choosing the Best Method for Backup and Data Collection

Whatever software you choose for backing up files, you need to be organized. Do you really need myriad copies of the Trash folder or *.bak files, which consume backup time, bandwidth, and storage? Probably not. Here are some guidelines for designing a sensible business backup strategy, so that you keep all the right data securely… but not the junk.

Once you get a backup solution up and running (and we like to think you chose Druva inSync, but this advice applies to everything from diskette backup to server backup), a first step for any would-be sysadmin or IT manager is to define the problem you’re trying to solve. Are you protecting key business files or, at the other extreme, meeting legal retention requirements? There is no “one size fits all” methodology; the “right answer” is a balance of use, need, and impact.

Most of our clients work out their own balanced method – and then they ask us, “Are we the only ones doing it this way?” Perhaps or perhaps not, but the only thing that matters is that it works for you.

Are you in or out?

Two methods are primarily used to create a backup set: inclusion and exclusion.

With the inclusion method, you choose which files to back up by file type. The premise is that you identify certain file types as business-critical, and you back up only those file types.

In contrast, the exclusion method includes everything in the backup set by default and then excludes the file types you don’t want. This allows you to capture everything in a user’s profile except for, say, files that you can recreate or that are considered non-critical.
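To make the difference concrete, here is a minimal sketch in Python of the two approaches as file-name filters. The pattern lists and function names are hypothetical, chosen only for illustration; they are not how inSync or any other product defines its rules.

```python
from fnmatch import fnmatch
from pathlib import PurePath

# Hypothetical rule sets, for illustration only.
INCLUDE_PATTERNS = ["*.docx", "*.xlsx", "*.pptx", "*.pdf"]
EXCLUDE_PATTERNS = ["*.tmp", "*.bak", "*.ost"]


def _matches(path, patterns):
    """Return True if the file name matches any pattern, ignoring case."""
    name = PurePath(path).name.lower()
    return any(fnmatch(name, pattern.lower()) for pattern in patterns)


def select_by_inclusion(paths, patterns=INCLUDE_PATTERNS):
    """Inclusion: back up nothing unless a rule names it."""
    return [p for p in paths if _matches(p, patterns)]


def select_by_exclusion(paths, patterns=EXCLUDE_PATTERNS):
    """Exclusion: back up everything unless a rule rejects it."""
    return [p for p in paths if not _matches(p, patterns)]


files = ["report.docx", "budget.xlsx", "notes.txt", "draft.bak", "mail.ost"]
print(select_by_inclusion(files))  # ['report.docx', 'budget.xlsx']
print(select_by_exclusion(files))  # ['report.docx', 'budget.xlsx', 'notes.txt']
```

Notice what happens to notes.txt, a file type nobody thought about in advance: inclusion silently skips it, while exclusion silently keeps it. That default for the unknown is the real choice you are making.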

Deciding which of these technology paths to take depends on your data backup goal. Is it for disaster recovery only? Or are you looking at data backup for other purposes, such as eDiscovery (where, from a legal perspective, you have to collect and store more data)?

Wait – Don’t we want to back up everything?

In some industries, the regulatory environment requires backups of everything to be kept forever. The mandates for data retention policies for some government organizations, for example, are “Life of the Republic.”

But this is not the case for most organizations. Truly, you may not have to capture every last file, nor should you aim to do so by default. Network and storage impacts need to be taken into account. Sure, you could back up every file on every hard disk, but even the fastest enterprise network has finite speed, storage has a cost for every gigabyte kept, and there is only so much time in which to move data across the pipes.

Huh? Aren’t networks fast enough to handle this? Not necessarily. Every piece of data, either backed up or restored, has to travel across a LAN or WAN that was provisioned for a host of other services. Consuming bandwidth unnecessarily to back up data that simply isn’t needed won’t look good when you need to ask for more resources to accommodate the service. Plus, it makes sense for the storage team to maximize the efficiency of the space taken by the backup data. On a small scale these costs may seem insignificant, but across a large enterprise the resource cost is very real and very large.
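To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The fleet size, per-machine data volume, link speed, and share of the link are all made-up figures for illustration, and the function ignores compression, deduplication, and incremental backups, all of which cut real-world traffic.

```python
def transfer_hours(data_gb, link_mbps, share_of_link=1.0):
    """Hours needed to move data_gb gigabytes over a link_mbps link.

    share_of_link is the fraction of the link that backups may use.
    Uses decimal units: 1 GB = 8,000 megabits.
    """
    if link_mbps <= 0 or not 0 < share_of_link <= 1:
        raise ValueError("link speed must be positive and share within (0, 1]")
    megabits = data_gb * 8000
    seconds = megabits / (link_mbps * share_of_link)
    return seconds / 3600


# Hypothetical fleet: 500 laptops on a 1 Gbps link, backups limited to 25% of it.
laptops = 500
everything_gb = 20  # assumed per-laptop volume when backing up everything
trimmed_gb = 12     # assumed per-laptop volume after exclusions

for label, per_laptop in [("everything", everything_gb), ("trimmed", trimmed_gb)]:
    hours = transfer_hours(laptops * per_laptop, link_mbps=1000, share_of_link=0.25)
    print(f"{label}: {hours:.1f} hours")
```

With these invented inputs, the first full backup takes roughly 89 hours of link time when you grab everything and roughly 53 hours after exclusions. Your own figures will differ, but the shape of the argument holds: every gigabyte you don't need is time on a shared pipe.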

Leave these files on the cutting-room floor

The first category of files not to back up is anything that can be recreated by another method, such as Microsoft Outlook OST files (which are just copies of e-mail messages from the Exchange server, cached on the local computer). If you do need to restore a system, the OST file is recreated once the user reconnects to the Exchange server. If the data is already stored in another system that pulls it down on a rebuild, do you really need it? (Those suckers can be huge.)

Another example is system files: OS files or even application files that are exact copies across all machines and are on your installation media. If you have a core OS image that you drop on a new machine, why would you need a bare-metal backup?

Other candidates for exclusion:

.tmp, .bak, and other temporary files

cookies

clip art
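Before committing to an exclusion list, it helps to know how much space each rule would actually save. The following is a rough sketch in Python that walks a folder and totals the bytes each pattern would skip. The pattern list and the function name are hypothetical, and a real audit would need to handle permissions and very large trees more carefully.

```python
import os
from collections import Counter
from fnmatch import fnmatch

# Hypothetical patterns drawn from the candidates listed above.
EXCLUDE_PATTERNS = ["*.tmp", "*.bak", "*.ost"]


def excluded_bytes_by_pattern(root, patterns=EXCLUDE_PATTERNS):
    """Walk root and total the bytes that each exclusion pattern would skip."""
    totals = Counter()
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            for pattern in patterns:
                if fnmatch(name.lower(), pattern.lower()):
                    try:
                        totals[pattern] += os.path.getsize(os.path.join(folder, name))
                    except OSError:
                        pass  # file vanished or is unreadable; skip it
                    break  # count each file once, under its first matching pattern
    return totals
```

Calling excluded_bytes_by_pattern on a user's profile folder returns a per-pattern byte count, which gives you something concrete to show the storage team (or to discover that a rule saves almost nothing and isn't worth the argument).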

Even if you’re administering the corporate rules that govern how end users’ systems are automatically backed up (one advantage to inSync, we like to think…), you should probably give end users the ability to deselect files from their backup set. For example, some Mac users may have critical files stored in a virtual Windows machine. But some Windows users who run virtual machines may do so only for test purposes, and those don’t need to be backed up. As the sysadmin, you shouldn’t make that decision for them.

You’re honor-bound to consider how your exclusions affect your users. If you feel that image and music files aren’t business critical, how does this impact users who may use their work computer as the primary place to store photos they took on their iPhones? Some clients have told us stories in which users wailed, “If you can restore only one file, please make sure it’s my personal PST file” or “I had all my kids’ birthday pictures on my machine and I can never get them back on.” Even if your organization has a draconian BYOD policy… be kind to these people. Nobody wants to see a user cry.

Don’t create these backup rules and then act as though they will stay this way forever. With data growth happening at exponential rates, the data inclusion/exclusion decisions you make today may not work ideally for you a few months from now. The rule sets should be reviewed as new file types are added; you may find you need to exclude an item that you used to include.
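One simple way to prompt that review is to look for file types that none of your rules mention. Here is a small sketch in Python, assuming you can get a list of file paths from your endpoints; the function name, the sample inventory, and the set of known extensions are all hypothetical.

```python
from collections import Counter
from pathlib import PurePath


def unreviewed_extensions(paths, known_extensions):
    """Count file extensions that no existing rule mentions."""
    known = {ext.lower() for ext in known_extensions}
    counts = Counter(PurePath(p).suffix.lower() for p in paths)
    # Skip files with no extension and extensions already covered by a rule.
    return {ext: n for ext, n in counts.items() if ext and ext not in known}


# Hypothetical inventory and rule set.
seen = ["plan.docx", "model.fig", "clip.heic", "old.bak", "shot.HEIC"]
print(unreviewed_extensions(seen, {".docx", ".bak"}))  # {'.fig': 1, '.heic': 2}
```

Anything that shows up in that report is a file type you haven't made a decision about yet, which is a good agenda item for the next rule review.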

Nothing says you can – or should – make all these choices in a vacuum. Ideally, the issues we raise here help you ask your project team and users the right questions to determine the best scope of backup sets and possible consequences of these decisions for your organization and its workflow.
