NetBackup Storage Lifecycle Policy, my way

I don’t trust disk arrays. Even if you have great RAID protection, replication and snapshots, there is still something called firmware. And it can have a bug. Usually it does (read the release notes of the next version), though fortunately not always.
And a bug that corrupts parity can be really bad.

You need to protect your disk-based backups. I will describe three different options using NetBackup:

Multiple copies of backup job

Storage lifecycle policies

Own perl script using bpduplicate

This is exactly what happened in one of my backup environments. A whole RAID group was gone.
O.K. Fine. We have a backup. Let’s get the data back.
Ehm. That data WAS the backup!
A few tens of TB of backups. A really bad situation.
The vendor recreated the RAID group, applied a firmware patch, and we prayed that nobody would request a restore from the lost data.
Fortunately it held only backups with a retention of about 1-2 weeks.
A few weeks later…
…BOOM. Again.
The same RAID group, now with the “parity error proof” patch.

The probability of failure is something every data owner has to take into account, and the “insurance” is called “backup”. But how do you protect the disk-based backup itself? Simply duplicate the data somewhere else.

To tape. Or to another disk array, which is easier.
I know, I know. If the array has a bug, the other one (from the same vendor) has it too. The probability p of failure will never be zero, but p^2 (the probability of two arrays failing at the same time, if they fail independently) is much better.
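
To put numbers on the p versus p^2 argument, here is a small Python sketch. The value of p is made up purely for illustration, and the multiplication only holds if the two arrays fail independently. A shared firmware bug breaks that assumption, so the real figure lies somewhere between p^2 and p.

```python
def loss_probability(p: float, copies: int) -> float:
    """Probability that every copy is lost, assuming each copy fails
    independently with probability p (an idealised assumption)."""
    return p ** copies


# Hypothetical value: a 1% chance of losing a given array in some period.
p = 0.01

print(f"single copy lost: {loss_probability(p, 1):.4%}")
print(f"both copies lost: {loss_probability(p, 2):.4%}")
```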

Usually you have more than one backup server at a site. (Do you? Really? What’s your SLA for backup/restore?) Then it’s easy to duplicate backups between two or more backup servers.

Let’s discuss some options offered by Symantec NetBackup.
Assume two or three media servers: mediaA, mediaB, mediaC.
They use dedup pools: separate PureDisk servers, external dedup appliances or (my preference) local MSDP pools.

Here are the options…

1. Multiple copies of backup job

If you have just two media servers at a location, the situation can look as follows:

Then you have to explicitly configure multiple copies in every schedule of every policy.
As only STUs located on the same media server are allowed for multiple copies, you have to use the configuration shown in the picture above. If the option “If this copy fails” is set to “continue”, a backup can finish successfully but you cannot be sure whether you have one copy or two.
If you select “fail all copies”, you can be sure that every valid backup has two copies, but you lose resiliency against a media server or storage pool outage.

To balance jobs across media servers you have to do it yourself, by selecting a media server for every combination of policy and schedule.

What about three media servers?

Pretty interesting, isn’t it? But resource balancing is again up to you.

2. Usage of Storage lifecycle policies

O.K. What about using SLPs? They are designed for duplication. Let’s use them. Fine.
Assume the following simple configuration.

Let’s configure SLP as follows:

1. Backup to STUG
2. Duplicate to STUG

What will happen? Some images will have both copies on the same storage pool. Huh? Yes. You never know which STU will be selected for duplication. It can be the same one that was selected for the initial backup. This is not the protection I wanted.
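
How often would both copies land on the same pool? The toy simulation below gives a rough feel. It assumes each step of the SLP picks an STU from the group uniformly at random and independently of the other step. That is a simplification made only for this illustration: the real choice depends on how the storage unit group is configured and on load. The STU names are hypothetical.

```python
import random


def same_pool_fraction(stus, images, seed=0):
    """Fraction of images whose backup and duplicate end up on the same STU,
    assuming both picks are uniform and independent (illustration only)."""
    rng = random.Random(seed)
    same = 0
    for _ in range(images):
        backup_stu = rng.choice(stus)      # step 1: Backup to STUG
        duplicate_stu = rng.choice(stus)   # step 2: Duplicate to STUG
        if backup_stu == duplicate_stu:
            same += 1
    return same / images


if __name__ == "__main__":
    two = ["stu_mediaA", "stu_mediaB"]
    three = ["stu_mediaA", "stu_mediaB", "stu_mediaC"]
    for group in (two, three):
        print(len(group), "STUs:", same_pool_fraction(group, 100_000))
```

Under these assumptions roughly one image in N has no real second copy, where N is the number of STUs in the group.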

3. Own perl script using bpduplicate

The result is my own Perl script, started by a policy schedule with a frequency of 1 hour. You can adjust the backup window so that duplication does not run during the nightly backups.
You need a database policy if you want the script named in the backup selection to be executed rather than backed up.
I’m using an Informix policy, as we don’t have any Informix.
The script works as follows:

split each list of images to be duplicated into smaller ones to limit the size of each duplication job (at most X images or at most Y GB; X and Y are configurable)

write all these prepared lists to “Bidfiles”

select the destination STU according to the images’ current ownership (media server)

run bpduplicate commands with “Bidfile” as parameter - all in parallel

wait until completion of all children
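
The planning part of these steps can be sketched as follows. This is a minimal illustration in Python, not the actual Perl script, which is not shown here. The input format (a list of backup ID and size pairs per owning media server), the STU names in `DEST_STU` and all function names are assumptions made for the example. The only NetBackup specifics used are the `-Bidfile` and `-dstunit` options of bpduplicate. The sketch builds the command lines but does not run them.

```python
import tempfile
from pathlib import Path

BPDUPLICATE = "/usr/openv/netbackup/bin/admincmd/bpduplicate"

# Hypothetical mapping: media server owning the image -> STU on another server.
DEST_STU = {"mediaA": "stu_mediaB", "mediaB": "stu_mediaC", "mediaC": "stu_mediaA"}


def split_images(images, max_images, max_gb):
    """Split [(backup_id, size_gb), ...] into chunks of at most max_images
    images and at most max_gb GB. An oversized image gets a chunk of its own."""
    chunks, current, current_gb = [], [], 0.0
    for backup_id, size_gb in images:
        too_many = len(current) >= max_images
        too_big = current_gb + size_gb > max_gb
        if current and (too_many or too_big):
            chunks.append(current)
            current, current_gb = [], 0.0
        current.append(backup_id)
        current_gb += size_gb
    if current:
        chunks.append(current)
    return chunks


def write_bidfile(backup_ids, directory):
    """Write one backup ID per line and return the path of the new file."""
    fd, path = tempfile.mkstemp(prefix="bidfile_", dir=directory, text=True)
    with open(fd, "w") as fh:
        fh.write("\n".join(backup_ids) + "\n")
    return Path(path)


def build_command(bidfile, owner):
    """bpduplicate command line for one Bidfile, targeting the STU that is
    paired with the media server currently owning the images."""
    return [BPDUPLICATE, "-Bidfile", str(bidfile), "-dstunit", DEST_STU[owner]]


def plan_jobs(images_by_owner, max_images, max_gb, directory):
    """Return one bpduplicate command per chunk, for every owning server."""
    commands = []
    for owner, images in images_by_owner.items():
        for chunk in split_images(images, max_images, max_gb):
            commands.append(build_command(write_bidfile(chunk, directory), owner))
    return commands
```

Choosing the destination from the owner is what the SLP could not guarantee: the second copy never targets the pool that already holds the first one.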

The last step ensures that no further processing starts until all duplications have finished.
The NBU scheduler will not start a new job for the same policy and schedule while the previous one is running.
Therefore no image handled by an already running bpduplicate will be processed again (it may not be duplicated yet).
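
The "start everything, then wait for all children" pattern looks roughly like this in Python. It is a sketch of the idea only: `run_all` is a hypothetical helper, and in real use the command lists would be bpduplicate invocations. The important property is that the parent returns only after the slowest child has exited, so the NetBackup job that launched the script stays active for the whole time.

```python
import subprocess


def run_all(commands):
    """Start every command at once, then block until all of them have exited.
    Returns the exit codes in the same order as the commands."""
    children = [subprocess.Popen(cmd) for cmd in commands]
    # wait() on each child in turn; the loop finishes only when all are done.
    return [child.wait() for child in children]
```

A caller could check the result with something like `all(code == 0 for code in run_all(commands))` and log any chunk that failed, so that it is picked up again on the next hourly run.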