This blog is by a long-time Oracle storage professional who has history with both NetApp and EMC.

May 14, 2010

Bruce Clarke, my good friend and former boss, and I have been collaborating recently on RMAN backup to Data Domain deduplication storage arrays. In the process I have learned from Bruce (whom I gratefully acknowledge for the technical content of this post, including the scripts shown below) about a very valuable approach to RMAN backup.

Most DBAs do full RMAN backups of all databases which are small enough to fit into a backup window. Full backups have several advantages:

They are very simple to run.

They optimize the restore operation, as no redundant blocks need to be applied.

They reduce dependencies in terms of the number of backup pieces that need to be cataloged and maintained in order to do a given restore, thereby reducing risk.

However, fulls have two strong disadvantages:

They waste space since the entire database's content is stored in each full backup.

They are time-consuming to run, because all active blocks in the database must be transferred to the backup target.

The first disadvantage is neatly avoided by the use of a deduplication array, such as Data Domain. My recent testing indicates that this is certainly true. More on this in a later post.

The second disadvantage is unavoidable at present. (Future developments in source-based deduplication on the Data Domain side may mitigate this later on, but I will ignore that for now.) For most DBAs, the trade-off is acceptable as long as the full backup fits into the available backup window. This is somewhat of a moving target, since recent developments in storage networking (such as 10 GbE) have increased available bandwidth, and with it the size of database that can fit into a given backup window.

Unfortunately, as we all know, the size of the typical database is also exploding. Thus the issue remains: for a very significant number of large databases, the full backup cannot fit into the backup window. For these databases, the DBA has no real choice: he or she must use RMAN incremental backup.

Incremental backup (absent incremental update, which I will discuss later) provides the DBA with the option to only back up blocks which have been updated since the last backup. This provides a dramatic savings in terms of backup time, but with a significant manageability cost:

The number of backup pieces that must be cataloged and maintained increases significantly, creating more dependencies and thus increasing the risk of a failed restore.

Restore time increases, because the same block may be updated many times and thus stored in several separate incremental backups, each of which must be applied during the restore.
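As a minimal illustration of the mechanics (these commands are sketched from standard RMAN syntax; they are not taken from the original post):

```
# Baseline: a level 0 incremental backs up all used blocks,
# equivalent in content to a full backup.
BACKUP INCREMENTAL LEVEL 0 DATABASE;

# Subsequent runs: a level 1 incremental backs up only blocks
# changed since the most recent level 0 or level 1 backup.
BACKUP INCREMENTAL LEVEL 1 DATABASE;
```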

Because of these issues, Oracle introduced incremental update for RMAN with 10g. There is an excellent article on the way this feature works in the Oracle documentation on OTN. Effectively, incremental update applies the incremental backup to the previous full backup. This neatly solves all of the issues of RMAN backup, except one: Instead of having a set of full backups, you end up with a single rolling full backup. In other words, once you apply the incremental backup to the previous full backup, you no longer have that previous full backup. Instead, you have a new full backup as of the point in time when you took the incremental backup.
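The pattern boils down to two RMAN commands run together on each backup cycle (the tag name here is illustrative):

```
RUN {
  # Roll the existing image copy forward by applying any available
  # level 1 incrementals to it.
  RECOVER COPY OF DATABASE WITH TAG 'incr_update';

  # Take a new level 1 incremental for later application to the copy.
  # On the very first run, when no image copy exists yet, this command
  # creates a level 0 image copy of the database instead.
  BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY WITH TAG 'incr_update' DATABASE;
}
```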

This issue can be solved, obviously, by making a copy of the full backup prior to the incremental update operation. However, making this copy is time-consuming, generates I/O, and takes up space.

Enter Bruce's solution: By using the Data Domain fast copy feature, you can make a copy of the previous full backup very quickly, and with no additional space. Further, the blocks which will be applied to the full backup copy have already been stored on the Data Domain in the incremental backup. Thus, they take up very little space as well. The end result is tantamount to RMAN backup nirvana:

You can take your backups in a fraction of the time it would require to do a full backup.

At the end of the process, you have a full backup for each time you run your RMAN backup.

All backups are fully deduplicated, thus saving you lots of space.

Restore operations are optimized as there are no dependencies other than a single full backup, and thus no redundant blocks to apply.
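Concretely, the nightly cycle might look like the following. This is a sketch only: the paths and tag are mine, not Bruce's, and the fastcopy is issued on the Data Domain CLI using paths relative to the Data Domain file system rather than the NFS mount point.

```
# On the Data Domain: preserve yesterday's merged full at near-zero
# space cost before RMAN rolls the working copy forward.
filesys fastcopy source /backup/oracle/current destination /backup/oracle/keep-20100514

# On the database host, in RMAN: roll the working copy forward and
# take the next incremental.
RECOVER COPY OF DATABASE WITH TAG 'dd_merge';
BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY WITH TAG 'dd_merge' DATABASE;
```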

The scripts to accomplish this are below. The usual caveats apply: this is provided as is, with no warranty or support obligations; you are on your own in terms of all that. However, the solution does work, and you should definitely give it a try. Also note that this is Data Domain copyrighted material; I have received permission from Bruce Clarke, the author of the script, to publish it here.

Example script for weekly backups

This was created and tested on Sun Solaris, so slight changes may be required for other platforms.
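(Bruce's actual script is not reproduced in this excerpt. As a rough sketch of the overall shape only — every host name, path, and tag below is my assumption, not his code — a weekly wrapper might look like this:)

```sh
#!/bin/sh
# Hypothetical outline only -- not the original script.
DD_HOST=dd01                              # Data Domain system (assumed name)
SRC=/backup/oracle/current                # working image copy (DD-relative path)
DST=/backup/oracle/keep-`date +%Y%m%d`    # preserved weekly copy

# 1. Fastcopy the current merged full so it survives the next merge.
ssh sysadmin@$DD_HOST "filesys fastcopy source $SRC destination $DST"

# 2. Roll the working copy forward and take the next level 1 incremental.
rman target / <<EOF
RECOVER COPY OF DATABASE WITH TAG 'dd_merge';
BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY WITH TAG 'dd_merge' DATABASE;
EOF
```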

Interpreting the results

Below are some space reporting results from the Data Domain system used in the development of the script shown above. The first output was taken before the script was invoked with the "incremental" option and the second after it completed. Please note that the database was open but undergoing little change, so a significant amount of the incremental backup storage was consumed in backing up the handful of archive log files that had been created. Nonetheless, the report shows that 4.5 GB of additional data was introduced to the Data Domain but only an additional 16 MB of space was consumed.

This is an interesting idea, and I've been testing it on dumb storage while our new Data Domain comes online. One issue I've been running into is that rman recovers the fastcopy (a physical copy, in my setup) rather than the original copy. In other words, I'm seeing the following:

1. I run the initial backup incremental level 1 for recover of copy... and rman creates the first set of datafile copies in destination X with tag Y
2. I copy that set of datafile copies from X to destination Z
3. I catalog the files in destination Z
4. I run backup incremental level 1 for recover of copy as in step 1.
5. I run recover copy of database with tag Y and notice RMAN recovers the set of files in destination Z, rather than those in destination X. In effect, it recovers the set of datafiles I want to preserve.

Not sure how to get around this; I suspect the problem is that the files from the step 3 copy are also tagged Y (RMAN's CATALOG START WITH doesn't seem to allow one to change the tag). Any thoughts (other than falling back to user-managed backups)?

Thanks for the useful information. I have a question. We have set up two NFS mount points on what is basically one Data Domain system, to get more visibility. One, /proj/fra, is for the Oracle FRA and is the fastcopy source; the other, /proj/fradumps/, is the fastcopy destination. Oracle sends backups to FRA subdirectories such as /proj/fra/databasename/backupset/DATE, datafile, etc. Basically I intend to dump daily backups from there to /proj/fradumps/database/backupset/DATE. By the way, I figured out that the fastcopy command requires the Data Domain relative path rather than the OS path for source and destination. What would be the source and destination paths for the fastcopy command in my case?

I had the same issue as Rob, in that RMAN merged into the backups in destination Z instead of destination X. My workaround was NOT to catalog the Z backups, as they are needed only in case of recovery, not during backup. My other issue was retention policy. My company policy is to keep backups for 14 days, but with any RMAN retention policy other than 'redundancy 1', RMAN keeps all backups since database creation when you use a merged incremental backup. To resolve it I had to add "until time 'sysdate - x'" to the 'recover copy' command so that RMAN marks backups older than 14 days as obsolete. This makes the whole fastcopy story unnecessary, in my opinion.
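For reference, the form of the command the commenter describes looks like this (the tag is illustrative, and 14 days stands in for the company's retention window):

```
# Do not roll the image copy forward past the recovery window;
# incrementals that would advance it beyond 14 days ago are held back,
# so older backups can still be marked obsolete under the retention policy.
RECOVER COPY OF DATABASE WITH TAG 'dd_merge' UNTIL TIME 'SYSDATE - 14';
```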

This code implies that your most recent full is the last FULL backup you took, not the last full you produced by recovering with the incremental merge. We should always update most_recent_full with the merged full we have just recovered, so that the next incremental is applied to it and NOT to the first FULL you took.

2. The other point concerns the discussion above about cataloging the backups and the way the incremental is applied to merge it into a full. Not cataloging your RMAN backups is, I believe, an ugly solution. To be able to catalog your backups, you can do the following.

Make two fastcopies of your most recent full: one with a DBNAME-retain.date extension and the other with a DBNAME-nextdaycopy.date extension. Catalog DBNAME-retain.date first and then DBNAME-nextdaycopy.date. The order here is very important: catalog the -retain copy first and then the -nextdaycopy. Now update your most_recent_full with the location of DBNAME-nextdaycopy.date. Please realize that the tag for both of these copies is still the same. Now that you have two fulls cataloged in your RMAN catalog, the next time you apply the incremental it will be applied to the last cataloged full backup, which means DBNAME-nextdaycopy. This way, you still have DBNAME-retain.date in your catalog.
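The sequence described above, sketched with illustrative paths and dates (the fastcopy lines use the Data Domain CLI form with DD-relative paths; the catalog lines use the OS-visible NFS paths; all names are assumed):

```
# On the Data Domain: two fastcopies of the current merged full.
filesys fastcopy source /backup/DBNAME destination /backup/DBNAME-retain.20100514
filesys fastcopy source /backup/DBNAME destination /backup/DBNAME-nextdaycopy.20100514

# In RMAN: catalog -retain first, then -nextdaycopy, so the -nextdaycopy
# files are the most recently cataloged copies under the shared tag and
# receive the next incremental merge.
CATALOG START WITH '/proj/fra/DBNAME-retain.20100514' NOPROMPT;
CATALOG START WITH '/proj/fra/DBNAME-nextdaycopy.20100514' NOPROMPT;
```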

disclaimer: The opinions expressed here are my personal opinions. I am a blogger who works at EMC, not an EMC blogger. This is my blog, and not EMC's. Content published here is not read or approved in advance by EMC and does not necessarily reflect the views and opinions of EMC.