To decide on backup strategies, start with your data recovery requirements and your data recovery strategy. Each type of data recovery will require that you take certain types of backup.

Failures can run the gamut from user error, datafile block corruption and media failure to situations like the complete loss of a data center. How quickly you can resume normal operation of your database is a function of what kinds of restore and recovery techniques you include in your planning. Each restore and recovery technique will impose requirements on your backup strategy, including which features of the Oracle database you use to take, store and manage your backups.

When thinking about recovery strategies, ask yourself questions like these:

If a logic error in an application or a user error caused the loss of important data from one or several tables or tablespaces, how could you recover that data, and what would happen to database updates since the error? Could you determine the cause of the error, to prevent it from happening again? As described in "Planning a Response to User Error: Point-in-Time Recovery and Flashback Features", techniques available to you include point-in-time recovery of the whole database or one or more tablespaces, importing data from earlier logical exports with one of the data import utilities, and using the Oracle database's flashback features.

If the instance alert log indicates that one or more tables contains corrupt blocks, how can you repair the corruption? Does the tablespace have to remain available during the repair? As described in "Planning a Response to Datafile Block Corruption: Block Media Recovery", the RMAN BLOCKRECOVER command can help you in this situation. Also, troubleshoot recovery with the SQL*Plus RECOVER...TEST command.

If the entire data center is destroyed, can you perform disaster recovery? Assume that all you have is an archive tape containing backups. How would you recover the database? How long would that recovery take?

If you were not available to recover your database, could someone else recover it in your absence? Are your recovery procedures sufficiently automated and documented?

With these needs in mind, decide how you can take advantage of features related to backup and recovery, and look at how each feature meets some requirement of your backup strategy. For example:

Using Recovery Manager simplifies most backup and recovery operations compared to user-managed backup and recovery. It automates management of most backup files, including the deletion of backups and archived redo logs from disk or tape when no longer needed to meet recovery goals. It provides detailed reporting on backup activities, can verify that your available backups can be used to recover your database. Finally, RMAN makes possible many recovery techniques not available if you are using user-managed backup and recovery, such as incremental backups.

Flashback Database will help you restore a database to a previous time much faster than media recovery. However, you must decide in advance to keep flashback logs, and keeping flashback logs requires that you configure a flash recovery area.

Block media recovery may be better than datafile media recovery if availability is critical. While block media recovery is possible even if you do not base your backup and recovery strategy on RMAN, RMAN-based block media recovery can be performed more quickly and with less effort.

Once you decide which features to use in your recovery strategy, you can plan your backup strategy, answering the following questions, among others:

How and where will you store your recovery-related files? Will you use a flash recovery area? Will you use an ASM disk group? Will you store backups on tape or other offline storage, or only on disk?

At what intervals will you take scheduled backups? And what form of physical backups will you take in each situation?

What situations require you to take a database backup outside of the regular schedule? Sometimes you must take an unscheduled backup to ensure that you can recover your data, such as after an OPEN RESETLOGS or after changes to your database such as NOLOGGING operations that do not appear in the redo log. You may also have business requirements that require backups for auditing purposes or other reasons not related to database recovery.

How can you validate your backups, to ensure that you can recover your database when necessary?

How do you manage records of your backups?

Do you have detailed recovery plans that cover each type of failure? How do your DBAs can execute these plans in a crisis? Can scripts be written to automate execution of these plans in a crisis?

Can you apply Oracle database availability technologies, such as Data Guard or Real Application Clusters, to improve availability during a database failure? How does using these availability technologies affect your backup and recovery strategy?

These are of course only a few of the considerations you should take into account. Available resources (hardware, media, staff, budget, and so on) will also be factors in your decision.

Your data recovery strategy should include responses to any number of database failure scenarios. The key to an effective, efficient strategy is envisioning failure modes, matching Oracle database recovery techniques and tools to the failure modes in which they are useful, and then making sure you incorporate the necessary backup types to support those recovery techniques.

To help match failure modes to recovery techniques that can help resolve them, refer to the following sections:

Planning a Response to User Error: Point-in-Time Recovery and Flashback Features

Your backup and recovery strategy should enable you to handle situations in which a user or application makes unwanted changes to database data, such as deleting the contents of a table or making incorrect updates during a batch run. The goal in such a case will be to restore the affected parts of your database to their state before the user error.

Depending on the situation, your appropriate response will be one of the following:

If you have performed a logical backup by exporting the contents of the affected tables, sometimes you can import the data back into the table. This technique presumes that you are regularly exporting logical backups of your data, and that any changes between exports are unimportant.

You can perform point-in-time recovery, bringing one tablespace or the whole database back to its state before the time of the error. In either case, you need backups from before the time of the error, plus the redo logs from the time of the backup to the time of the error.

Note:

Oracle's Flashback Technology provides faster and less disruptive alternatives to media recovery in many circumstances.

Oracle Flashback Database is a physical-level recovery mechanism similar to media recovery, but generally faster and not requiring the restore of data from backup.

Oracle Flashback Table and Oracle Flashback Drop work at the logical level, undoing unwanted changes to tables, including reversing the effects of DROP TABLE statements.

Oracle Flashback Query and Oracle Flashback Version Query are useful in viewing past contents of tables and investigating how and when logical corruptions affected your database.

Information about these features is collected in Oracle Database Backup and Recovery Advanced User's Guide. This document will allude to such features where they can be helpful and provide pointers for more information. Familiarize yourself with these features before planning your backup and recovery strategy, because you may find that they can be quite valuable and require limited advanced planning.

Planning a Response to Media Failure: Restore and Media Recovery

A media failure occurs when a problem external to the database prevents Oracle from reading from or writing to a file during database operations. Typical media failures include physical failures, such as head crashes, and the overwriting, deletion or corruption of a database file. Media failures are less common than user or application errors, but your backup and recovery strategy should prepare for them.

The type of media failure determines the recovery technique to use. For example, the strategy you use to recover from a corrupted datafile is different from the strategy for recovering from the loss of the control file.

Example: Online Redo Log Recovery

The method of recovery from loss of all members of an online log group depends on a number of factors, such as:

The state of the database (open, crashed, closed consistently, and so on)

Whether the lost redo log group was current

Whether the lost redo log group was archived

For example:

If you lose the current group, and the database is not closed consistently (either it is open, or it has crashed), then you will have to restore an old backup and perform point-in-time recovery, followed by OPENRESETLOGS. You will lose all transactions that were in the lost log. You should take a new full database backup immediately after the OPEN RESETLOGS. Backups from before the OPEN RESETLOGS will not be recoverable because of the lost log.

If you lose the current redo log group, and if the database is closed consistently, then you can perform OPENRESETLOGS with no transaction loss. However, you should take a new full database backup. Backups from before the OPEN RESETLOGS will not be recoverable because of the lost log.

If you lose a noncurrent redo log group, then you can use the ALTERDATABASECLEARLOGFILE statement to re-create all members in the group. No transactions are lost. If the lost redo log group was archived before it was lost, then nothing further is required. Otherwise, you should immediately take a new full backup of your database. Backups from before the log was lost will not be recoverable because of the lost log.

If a small number of blocks within one or more datafiles are corrupt, you can perform block media recovery instead of restoring the datafiles from backup and performing complete media recovery of those files. The Recovery Manager BLOCKRECOVER command can be used to restore and recover specified data blocks while the database is open and the corrupted datafile is online.

Your plans for data recovery strategies are the basis of your plans for backup strategy. This discussion describes general guidelines that can help you decide when to perform database backups, which parts of a database you should back up, what tools Oracle provides for those backups, and how to configure your database to improve its robustness and make backup and recovery easier. Of course, the specifics of your strategy must balance the needs of your restore strategy with questions of cost, resources, personnel and other factors.

Protecting Your Redundancy Set

The set of files needed to recover an Oracle database from the failure of any of its files--a datafile, control file, or online redo log--is called the redundancy set. The redundancy set should contain:

The last backup of the control file and all the datafiles

All archived redo logs generated after the last backup was taken

Duplicates of the online redo log files, generated by Oracle database multiplexing, operating system mirroring, or both

Duplicates of the current control file, generated by Oracle database multiplexing, operating system mirroring, or both

Copies of configuration files such as the server parameter file, tnsnames.ora, and listener.ora

The first rule of protecting your redundancy set is:

The set of disks or other media that contain the redundancy set for your database should be separate from the disks that contain the datafiles, online redo logs, and control files.

This practice ensures that the failure of a disk that contains a datafile does not also cause the loss of the backups or redo logs needed to recover the datafile. Consequently, a minimal production-level database requires at least two disk drives: one to hold the files in the redundancy set and one to hold the database files. Ideally, separate the redundancy set from the primary files in every way possible: on separate volumes, separate file systems, and separate RAID devices. Keeping the redundancy set separate from the primary files ensures that you will not lose committed transactions in a disk failure.

The simplest way to manage your redundancy set is to use a flash recovery area, on a separate device from the working set files. All recovery-related files will be stored in a single location on disk, disk space usage is managed automatically, backups required to meet your data recovery requirements are never deleted from disk while they are still needed, and recovery time is minimized without compromising the completeness of the redundancy set.

Whether or not you use a flash recovery area, Oracle Corporation recommends following these guidelines:

Multiplex the online redo log files and current control file at the database level. (For instance, configure the database to write its online logs to two or more destinations, so that each write is a separate operation carried out by the database, rather than by operating system-level or hardware-level redundancy.) If you multiplex at the database level, then an I/O failure or lost write should only corrupt one of the copies.

Ideally, the multiplexed files should be on different disks mounted under different disk controllers. The flash recovery area is an excellent location for one copy of these files.

You can also mirror the online redo logs and current control file at the operating system or hardware level, but this is not a substitute for multiplexing at the database level.

If running in ARCHIVELOG mode, archive the redo logs to multiple locations, ideally on different disks. If you are using a flash recovery area, use it as one of the archiving locations.

Use operating system or hardware mirroring for the control file. All copies of the control file multiplexed at the database level must be available at all times, or the instance will crash. If you use operating system or hardware mirroring for your control file, your database can continue to operate even if one copy of the control file mirrored at the operating system level is unavailable due to a disk failure.

Use operating system or hardware mirroring for the primary datafiles if possible, to avoid having to perform media recovery for simple disk failures.

Keep at least one copy of the entire redundancy set--including the most recent backup--on disk. The flash recovery area is the ideal location for the redundancy set.

If the target database is stored on a RAID device, then store the redundancy set on a set of disks that are not in the same RAID device.

If you store the redundancy set on tape, then maintain at least two copies of the data to protect against the risk of tape failure. Also, if you have more than one copy of the same data, then consider keeping backups from different points in time. In this way, if one backup or split mirror was done when the database was corrupted, then you have an older backup when the database was not corrupted.

Deciding Between ARCHIVELOG and NOARCHIVELOG Mode

The redo logs of your database provide a complete record of changes to the datafiles of your database (with a few exceptions, such as direct path loads).

You can run your database in one of two modes: ARCHIVELOG mode or NOARCHIVELOG mode. In ARCHIVELOG mode, a used online redo log group must be copied to one or more archive destinations before it can be reused. Archiving the redo log preserves all transactions stored in that log, so that they can be used in recovery operations later. In NOARCHIVELOG mode, the online redo log groups are simply overwritten when the log is re-used. All information about transactions recorded in that redo log group is lost.

If you are running in NOARCHIVELOG mode and you must recover from damage to datafiles due to disk failure, you have two main options for recovery:

Drop all objects that have any extents located in the affected files, and then drop the files. The remainder of the database is intact, but all data in the affected files is lost.

Restore the entire database from the most recent backup, and lose all changes to the database since the backup. (Recovering changes since the backup would require performing media recovery, which uses the archived redo logs.)

Implications of Running in ARCHIVELOG Mode

For most applications, running in ARCHIVELOG mode is preferable to running in NOARCHIVELOG mode because you have more flexible recovery options after a data loss. There are, however, associated costs of running in ARCHIVELOG mode:

Space must be set aside for archiving destinations, locations on disk where the archived redo logs will be stored. These can become quite large in databases with large numbers of updates.

The stored archived redo logs must be managed. To limit the disk space used by archived redo logs, archived redo logs can be moved to tape for longer-term storage, and older logs no longer needed to meet your recoverability goals should be deleted. (RMAN can automate most of the management of archived redo logs, by recording the location and contents of all archived redo logs, making it easy to move archived logs to tape, and identifying and deleting redo logs no longer required to meet your recoverability objectives.)

Some performance overhead is associated with the background processes ARC0 through ARCn which copy filled online redo logs to the archiving destinations.

When performance requirements are extreme or disk space limitations are severe, it may be preferable to run in NOARCHIVELOG mode in spite of the restrictions imposed.

Deciding Whether to Use a Flash Recovery Area

It is recommended that you take advantage of the flash recovery area to store as many backup and recovery-related fileas as possible, including disk backups and archived redo logs.

Some features of Oracle database backup and recovery, such as Oracle Flashback Database, require the use of a flash recovery area. In such cases, you must create a flash recovery area, though you do not have to use it to store all recovery-related files.

Even when its use is not required, however, the flash recovery area offers a number of advantages over other on-disk backup storage methods. Backups moved to tape from the flash recovery area are retained on disk until space is needed for other required files, reducing the need to restore backups from tape. At the same time, obsolete files no longer needed to meet your recoverability goals and files backed up to tape become eligible for deletion and are deleted when space is needed, eliminating the need for DBA intervention to clear out old files.

Choosing a Backup Retention Policy

Your backup retention policy is the rule you set regarding which backups must be retained (whether on disk or other backup media) to meet your recovery and other requirements.

Backup retention policy can be based on redundancy or a recovery window. In a redundancy-based retention policy, you specify a number n such that you always keep at least n distinct backups of each file in your database. In a recovery window-based retention policy, you specify a time interval in the past (for example, one week, or one month) and keep all backups required to let you perform point-in-time recovery to any point during that window.

A backup no longer needed to satisfy the backup retention policy is said to be obsolete.

Implementing Backup Retention Policy with RMAN

RMAN automates the implementation of a backup retention policy, using the following commands:

CONFIGURE RETENTION POLICY command lets you set the retention policy that will apply to all of your database files by default.

REPORTOBSOLETE command lets you list backups currently on disk that are obsolete under the retention policy. You can also specify parameters to see which files would be obsolete under different retention policies.

DELETEOBSOLETE command deletes the files which REPORT OBSOLETE would list as obsolete.

CHANGE... KEEP lets you set a separate retention policy for specific backups, such as long-term backups kept for archival purposes. You can specify that a given backup must be kept until a future time, or even specify that a backup be kept forever. CHANGE... NOKEEP is used to let the retention policy apply to a backup previously protected by CHANGE... KEEP.

If you use a flash recovery area to store your backups, the database will delete obsolete backups automatically as disk space is needed for newer backups, archived logs and other files. For backups stored on disk outside a flash recovery area and for backups stored on tape, you should periodically run the DELETE OBSOLETE command to remove obsolete backups.

Recovery Window-Based Backup Retention Policy

A recovery window-based retention policy lets you guarantee that you can perform point-in-time recovery to any point in the past, up to a number of days that you specify. The earliest point in time to which you can recover your database under your retention policy is known as the point of recoverability. All backups required for recovery or point-in-time recovery back to that time will be retained.

Note that this will generally require that you keep backups older than the beginning of the recovery window. A point-in-time recovery to the beginning of the recovery window would require a restore from this backup, and then applying all changes between the backup time and the point of recoverability. For example, you might configure a recovery window of three days:

RMAN> CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 3 DAYS;

If your last full database backup was six days ago, RMAN will keep the six-day-old backup, and all redo logs required to roll the database forward to the beginning of the recovery window three days ago, in addition to any backups and redo logs needed to recover the database to all points in time within the three day window.

A recovery window-based backup retention policy provides the most certain recoverability for your data. The disadvantage is that more careful disk space planning is required, since it may not be obvious how many backups of datafiles and archived logs must be retained to guarantee the recovery window.

Redundancy-Based Backup Retention Policy

A redundancy-based backup retention policy determines whether a backup is obsolete based on how many backups of a file are currently on disk. You might configure a redundancy level of 3:

RMAN> CONFIGURE RETENTION POLICY TO REDUNDANCY 3;

In this case, RMAN keeps three backups of each database file, and all redo logs required to recover all retained datafile backups to the current time. Any older backups will be considered obsolete.

Assume, for instance, that you make backups of a datafile every day, starting on a Monday. On Thursday, you make your fourth backup of the datafile, and the backup from Monday becomes obsolete because you have the backups from Tuesday, Wednesday and Thursday. On Friday, the backup from Tuesday becomes obsolete, because you have the backups from Wednesday, Thursday and Friday.

Archiving Older Backups

There are several reasons to keep older backups of datafiles and archived logs:

An older backup of datafiles and archived logs is necessary for performing point-in-time recovery to a time before your most recent backup.

If your most recent backup is corrupt, you can still recover your database using an older backup and the complete set of archived logs since that older backup.

You may want to keep a copy of the database for archival purposes.

To perform point-in-time recovery to a given target time earlier than your current point of recoverability, then you need a database backup that completed before the target time, as well as all of the archived logs created between the time the backup was started and the target time. For example, if you take full database backups starting at 1:00 AM on February 1 (at SCN 10000) and on February 14 (at SCN 20000), and if you decide on February 28 to use point-in-time recovery to bring your database to its state at 9:00AM February 7 (SCN 13500), then you must use the February 1 backup, plus all redo logs containing changes from between the beginning of the creation of the backup (SCN 10000) and 9:00AM February 7 (SCN 13500).

Note that point-in-time recovery to a time between backups is not an option for a database operating in NOARCHIVELOG mode. You can only restore your entire database from a consistent whole database backup, and re-open the database as of the time of that backup. You will lose all changes since the backup was taken.

Determining Backup Frequency

Frequent backups are essential for any recovery scheme. Base the frequency of backups on the rate or frequency of database changes such as:

If database updates are relatively infrequent, then you can make whole database backups infrequently and supplement them with incremental backups (which will be relatively small because few blocks have changed). The scenario in "Backup Scripts When Few Data Blocks Change" describes how to develop a backup strategy based on a single whole database backup.

Performing Backups Before and After You Make Structural Changes

There are times when you will need to take a backup of your database independent of your regular backup schedule. If you make any of the following structural changes, then perform a backup of the appropriate portion of your database immediately before and after completing the following changes:

Create or drop a tablespace.

Add or rename a datafile in an existing tablespace.

Add, rename, or drop an online redo log group or member.

The part of the database that you should back up depends on your archiving mode.

Mode

Action

ARCHIVELOG

Make a control file backup (using RMAN or using the SQL ALTERDATABASE statement with the BACKUPCONTROLFILE option) after a structural alteration. Of course, you can back up other parts of the database as well.

NOARCHIVELOG

Make a consistent whole database backup immediately after the modification.

Backing Up Frequently Used Tablespaces

If you run in ARCHIVELOG mode, then you can back up an individual tablespace or even a single datafile. You might want to do this for one or more tablespaces that are updated much more often than the rest of your database, as is sometimes the case for the SYSTEM tablespace and automatic undo tablespaces.

More frequent backups of heavily-used datafiles can shorten recovery times in some situations. You may have a database where most updates are restricted to a small set of tablespaces. If you take a full database backup each Sunday, then recovery from a media failure affecting the frequently updated tablespaces on Friday requires re-applying large amounts of redo. Daily backups of the frequently-updated tablespaces reduces the amount of redo to apply without requiring a daily full database backup.

Backing Up after NOLOGGING Operations

When a direct path load is performed to populate a database, no redo data is logged for those database changes. You cannot recover these changes after a restore from backup using conventional media recovery. Likewise, when tables and indexes are created as NOLOGGING, the database does not log redo data for these objects, which means that you cannot recover these objects from existing backups. Therefore, you should back up your datafiles after operations for which no redo data is logged.

Note:

You can use either a full backup of your datafiles or an incremental backup. Either one will capture all changed blocks, including blocks changed by unrecoverable operations.

Exporting Data for Added Protection and Flexibility

Oracle database import and export utilities are used to export database objects (tables, stored procedures, and so forth) from databases to be stored as files, and re-import objects from those files. An export provides a logical-level snapshot of the exported objects at the time of the export, as a binary file that can be imported back into the source database or some other database. Consider exporting portions or all of a database for supplemental protection and flexibility in a database's backup strategy.

While useful, database exports are not a substitute for whole database backups. They cannot provide the same complete recovery advantages of physical-level backups. For example, you cannot apply archived logs to logical backups in order to update lost changes.

Preventing the Backup of Online Redo Logs

Online redo logs, unlike archived logs, should never be backed up. The chief danger associated by having backups of online redo logs is that you may accidentally restore those backups without meaning to, and corrupt your database.

Online redo log backups are also not particularly useful, for the following reasons:

If your database is in ARCHIVELOG mode, then the archiver is already archiving the filled redo logs automatically.

If your database is in NOARCHIVELOG mode, then the only type of physical backups that you can perform are closed, consistent, whole database backups. The files in this type of backup are all consistent and do not need recovery, so the online logs are not useful after a restore from backup.

The best method for protecting the online logs against media failure is to multiplex them, with multiple log members in each group, on different disks attached to different disk controllers.

Note:

RMAN does not permit you to back up online redo logs. You must archive a redo log before backing it up.

Keeping Records of the Hardware and Software Configuration of the Server

During the stress of a recovery situation, it is important that you have all necessary information at your disposal. This is especially true if for some reason you need to contact Oracle Support because you run into a problem that you do not understand. You should have the following documentation about the hardware configuration:

The name, make, and model of the machine that hosts the database

The version and patch of the operating system

The number of disks and disk controllers

The disk capacity and free space

The names of all datafiles

The name and version of the media management software (if you use a third-party media manager)

You should also keep the following documentation about the software configuration:

The name of the database instance (SID)

The database identifier (DBID)

The version and patch release of the Oracle database server

The version and patch release of the networking software

The method (RMAN or user-managed) and frequency of database backups

The method of restore and recovery (RMAN or user-managed)

You should keep this information both in electronic and hardcopy form. For example, if you save this information in a text file on the network or in an email message, then if the entire system goes down, you may not have access to this data.

It is especially important to keep a record of the DBID. If you have to restore and recover your database including the loss of the SPFILE and control file, you will need the DBID during the recovery process. See "Basic Database Restore and Recovery Scenarios" for details on how the DBID is used during recovery.

Practice backup and recovery techniques in a test environment before and after you move to a production system. In this way, you can measure the thoroughness of your strategies and minimize problems before they occur in a real situation. Performing test recoveries regularly ensures that your archiving, backup, and recovery procedures work. It also helps you stay familiar with recovery procedures, so that you are less likely to make a mistake in a crisis.

If you use RMAN, then run the DUPLICATE command to create a test database using backups of your production database. If you perform user-managed backup and recovery, then you can either create a new database, a standby database, or a copy of an existing database by using a combination of operating system and SQL*Plus commands.

Validating RMAN Backups: BACKUP VALIDATE and RESTORE VALIDATE

The RMAN BACKUP VALIDATE and RESTORE VALIDATE commands can be a useful part of your recovery plan testing. BACKUP VALIDATE reads all of the specified files but does not produce any output files. All of the data blocks in the input files are validated, exactly as they are when a real backup takes place. RESTORE VALIDATE reads all of the backup files that would be needed to restore the specified objects, but the objects are not actually restored to disk. All of the data blocks in the backup files are validated, exactly as they are when a real restore takes place. Just as in a real restore, RESTORE VALIDATE automatically chooses which backup files to restore from. For example, the command RESTORE VALIDATE DATABASE ensures that, for every file in the database, a valid backup exists, can be read, and contains valid data.