Block Corruption II

As mentioned before, we’ve had a block corruption that hadn’t yet caused any problems in production but was causing issues for our 11g upgrade rehearsals with the datapump export/import stage.

Because it wasn’t causing any problems in production, the approach to addressing it was somewhat relaxed!

But what this issue did demonstrate was big concerns over the amount of redo that this system generates in a day – 500 GB – and how that volume of redo restricts the practicalities and ability to respond for a physical recovery of the problem, particularly as we only thing we knew was the recovery had happened at some point since 1st January 2012.

Fortunately we were able to identify that:

Only one block in one partition of one table was affected

And we could say with a high degree of confidence that:

That two rows in this one block were affected by the corruption and

We could get the values from these two row from one of the uat databases and that the data hadn’t changed since.

In other words, we could resolve this corruption with a manual, logical approach.

For example, cross-referencing a block dump to see how the block is corrupted, I could get all the rows in this one block that haven’t been affected using something like this:

4. Insert good versions of the corrupted rows extracted from an older UAT database.
5. Create mirror of local indexes on copy table

6. Partition exchange without validation including indexes

7. Rebuild global indexes on original table

8. Do stuff with stats if you needed to (we recovered stats from a stats backup table)

At this point, table REPAIRED_TRADE now has the corrupt block allocated.

Even if we drop the table, DBVERIFY still reports the corrupt block as it has not been formatted.

This is normal as the block will not be formatted until it is taken off the freelist and used by a new segment.

For completeness, we wanted to get this block formatted.

Because we did not drop table REPAIRED_TRADE we know where this block is.

So:

1. TRUNCATE TABLE repaired_trade REUSE STORAGE;

2. Drop the indexes on this table that we wanted for ease of partition exchange

3. Create trigger to tell us when we format the block in question:

CREATE OR REPLACE TRIGGER corrupt_trigger
AFTER INSERT ON repaired_trade
REFERENCING OLD AS p_old NEW AS new_p
FOR EACH ROW
DECLARE
e_corrupt EXCEPTION;
BEGIN
IF (DBMS_ROWID.ROWID_BLOCK_NUMBER(:new_p.rowid)=1093595)
AND (DBMS_ROWID.ROWID_RELATIVE_FNO(:new_p.rowid)=24)
THEN
RAISE e_corrupt;
END IF;
EXCEPTION
WHEN e_corrupt THEN
RAISE_APPLICATION_ERROR(-20000, 'Corrupt block has been formatted');
END;
/

4. Reinsert data into this table from original TRADE table

INSERT
INTO repaired_trade
SELECT * FROM trade;

5. Repeat 4 if necessary, i.e. until we get the exception then

6. DROP TABLE repaired_trade;

At this point, DBVERIFY will no longer report that there is a corrupt block.