Wednesday, April 05, 2006

How to check for and repair block corruption with RMAN in Oracle 9i and Oracle 10g

Problem: the application encounters an ORA-01578 runtime error because there are one or more corrupt blocks in a table it is reading.

How can corrupt blocks be caused?First of all we have two diffent kinds of block corruption:- physical corruption (media corrupt)- logical corruption (soft corrupt)Physical corruption can be caused by defected memory boards, controllers or broken sectors on a hard disk;Logical corrution can amoung other reasons be caused by an attempt to recover through a NOLOGGING action.There are two initialization parameters for dealing with block corruption:- DB_BOCK_CHECKSUM (calculates a checksum for each block before it is written to disk, every time)causes 1-2% performance overhead- DB_BLOCK_CHECKING (serverprocess checks block for internal consistency after every DML)causes 1-10% performance overheadIf performance is not a big issue then you should use these!

Normally RMAN checks only for physically corrupt blockswith every backup it takes and every image copy it makes.This is a common misunderstanding amoung a lot of DBAs.RMAN doesn not automatically detect logical corruption by default!We have to tell it to do so by usingCHECK LOGICAL!The info about corruptions can be found in the following views:SYS @ orcl AS SYSDBA SQL > select * from v$backup_corruption;RECID STAMP SET_STAMP SET_COUNT PIECE# FILE# BLOCK#---------- ---------- ---------- ---------- ---------- ---------- ----------BLOCKS CORRUPTION_CHANGE# MAR CORRUPTIO---------- ------------------ --- ---------1 586945441 586945402 3 1 5 814 0 YES CORRUPT-- SYS @ orcl AS SYSDBA SQL > select * from v$copy_corruption;

I recommend you to check your database for corrupt blockswith RMAN on a regular basis, proactively.If you do so you RMAN finds out about block corruptionsbefore your application runs into an ORA-01578 andbefore you find out that you have backed up the corrupt blocks again and again.

There have been incidents when DBAs found outthat they did not have a backup with the un-corruted block any more,because you have deleted the last one with a not corrupted version.They could not recover the block any more!