Recovering from database corruption during switch version to 8.5(1) SU3

From DocWiki

Problem Summary

Many customer cases have been caused by database corruption during switch version. The root cause is usually that the customer believes the switch version is not progressing and restarts the CCX server while it is still running. This typically corrupts CCX database tables, which then take TAC and DE considerable time and effort to recover.

To address this issue, changes have been made to the switch version code in 8.5(1) SU3 to detect this corruption during switch version and to make recovery easier. Traditionally, the first step of the database switch version script takes a database backup. The scenario in which the database gets corrupted during switch version is as follows:

1. The customer initiates the first switch version. The db switch version script takes a good database backup. The customer then restarts the server before the switch version completes.
2. The server comes up with a corrupt database because of the restart in Step 1.
3. The customer initiates a second switch version. The db switch version script takes a new database backup, which overwrites the good backup taken in Step 1 and rules out any chance of recovering from it.

As part of the changes introduced in 8.5(1) SU3, the db switch version script checks the CCX database for corruption before taking a database backup. If the database is found to be corrupt, the switch version is aborted without taking a new backup, so the good database backup taken in Step 1 is not overwritten.
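To make the difference concrete, the following Python sketch models the three-step scenario with and without the pre-backup corruption check. It is only an illustration under simplifying assumptions: the Server class and the take_backup_step and run_scenario functions are hypothetical and do not correspond to the actual db switch version script, only one backup is assumed to be kept (so a new backup overwrites the old one), and the restart in Step 1 is modeled as simply setting a corruption flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Server:
    """Hypothetical model: only the DB state and the single stored backup."""
    db_corrupt: bool = False
    backup: Optional[str] = None  # label of the backup currently stored


def take_backup_step(server: Server, label: str, check_first: bool) -> bool:
    """Hypothetical first step of the db switch version script.

    Returns True if a backup was written, False if the step aborted.
    """
    if check_first and server.db_corrupt:
        # Models the 8.5(1) SU3 behavior: abort and leave the existing backup untouched.
        return False
    state = "corrupt" if server.db_corrupt else "good"
    # Assumption: only one backup is kept, so this overwrites any earlier one.
    server.backup = f"{label} ({state})"
    return True


def run_scenario(check_first: bool) -> Server:
    server = Server()
    # Step 1: the first switch version backs up the healthy database...
    take_backup_step(server, "backup-1", check_first)
    # ...and is then interrupted by a restart, which corrupts the database (Step 2).
    server.db_corrupt = True
    # Step 3: the second switch version attempt.
    take_backup_step(server, "backup-2", check_first)
    return server


if __name__ == "__main__":
    print(run_scenario(check_first=False).backup)  # backup-2 (corrupt)
    print(run_scenario(check_first=True).backup)   # backup-1 (good)
```

Without the check, the second attempt replaces the good backup with a copy of the corrupt database; with the check, the good backup from Step 1 survives.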

Error Message

If the customer has rebooted during switch version and the database has become corrupt, subsequent switch versions are not allowed to proceed until the database has been recovered from the good backup. When a switch version fails because of database corruption, this can be confirmed by checking the install log (/var/log/install/uccx-install.log) for the following message:

Cisco Unified CCX DB appears to be corrupt.Please use the CLI command \"utils uccx switch-version [db-check | db-recover]\" to recover the CCX DB before retrying the switch versionIf this CLI command is not available in your release of CCX software, please contact Cisco Technical Assistance. for this problem...
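If you have a copy of the install log, a quick scan for the first sentence of this message can confirm the cause. The Python sketch below is a convenience assumption, not a Cisco-provided tool: the function names are hypothetical, and it matches only the leading text of the message because spacing and punctuation later in the logged line may differ from what is shown here.

```python
from pathlib import Path

# Log path and message text are taken from this article; function names are hypothetical.
INSTALL_LOG = Path("/var/log/install/uccx-install.log")
CORRUPTION_MARKER = "Cisco Unified CCX DB appears to be corrupt"


def corruption_lines(log_text: str) -> list[str]:
    """Return every log line that contains the DB-corruption message."""
    return [line for line in log_text.splitlines() if CORRUPTION_MARKER in line]


def switch_version_blocked_by_corruption(path: Path = INSTALL_LOG) -> bool:
    """True if the given copy of the install log reports DB corruption."""
    # errors="replace" keeps stray non-UTF-8 bytes from aborting the scan.
    text = path.read_text(encoding="utf-8", errors="replace")
    return bool(corruption_lines(text))


if __name__ == "__main__":
    sample = (
        "switch version started\n"
        "Cisco Unified CCX DB appears to be corrupt.Please use the CLI command ...\n"
    )
    print(corruption_lines(sample))
```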

Possible Cause

The root cause of this issue is the customer restarting the CCX server while the switch version is in progress, which corrupts the database.

Recommended Action

The CLI command utils uccx switch-version [db-check | db-recover] has been introduced to recover the database in this scenario, but it can be used only for future upgrades from 8.5(1) SU3 to later releases.

To recover from this issue when it occurs during an upgrade from an earlier release to 8.5(1) SU3, TAC is being provided with a recovery script (tac_sv_recover_db.sh).

The script can be used only in the following specific scenario:

- The switch version failure happened when migrating to 8.5(1) SU3.
- The install logs indicate that the switch version failed because database corruption was detected.
- The last switch version attempt was left incomplete by a restart in the middle (that is, it proceeded to neither success nor failure).
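The three conditions above can be expressed as a simple pre-check before engaging TAC for the script. The following Python sketch is hypothetical: the UpgradeHistory fields are an assumed way of recording what was found while reviewing the case, and the function neither calls nor replaces tac_sv_recover_db.sh.

```python
from dataclasses import dataclass


@dataclass
class UpgradeHistory:
    """Hypothetical record of what was found while reviewing the case."""
    target_release: str
    corruption_detected_in_install_log: bool
    last_attempt_interrupted_by_restart: bool


def recovery_script_applies(history: UpgradeHistory) -> bool:
    """True only when all three documented conditions hold."""
    return (
        history.target_release == "8.5(1) SU3"
        and history.corruption_detected_in_install_log
        and history.last_attempt_interrupted_by_restart
    )


if __name__ == "__main__":
    case = UpgradeHistory("8.5(1) SU3", True, True)
    print(recovery_script_applies(case))  # True
```

Requiring all three conditions together reflects the article's wording that the script applies only to this specific scenario.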

The script displays the timestamp of the good database backup taken in Step 1 and offers to restore this backup.

Once the script has completed the recovery of the database, a new switch version can be attempted.

If this recovery is performed immediately upon detection of the database corruption, there will be no loss of data.

If the system is put back into operation without immediate recovery and the script is later used to restore the database backup taken during switch version, any changes made to the database after that backup was taken will be lost.
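The data-loss rule is a timestamp comparison: anything changed after the backup was taken is not in the backup. The Python sketch below illustrates that rule with hypothetical change records; it does not read the CCX database, and the dates and descriptions are made-up example values.

```python
from datetime import datetime


def changes_lost_on_restore(
    backup_taken: datetime, changes: list[tuple[datetime, str]]
) -> list[str]:
    """Return descriptions of changes made after the backup, which a restore discards."""
    return [desc for when, desc in changes if when > backup_taken]


if __name__ == "__main__":
    backup_time = datetime(2011, 6, 1, 22, 0)  # example value only
    changes = [
        (datetime(2011, 6, 1, 21, 30), "agent added before switch version"),
        (datetime(2011, 6, 2, 9, 15), "CSQ modified after going operational"),
    ]
    print(changes_lost_on_restore(backup_time, changes))
    # ['CSQ modified after going operational']
```

Recovering immediately after the corruption is detected corresponds to the case where no changes have been made since the backup, so the returned list is empty.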