Here you will find posts that are mostly about my work as an Oracle DBA. There may occasionally be other topics posted, but by and large this blog will be about Oracle and other geeky IT topics. Perl will likely be mentioned from time to time.

Thursday, August 06, 2009

One suggestion was made to use the exp utility to export the database, thereby determining if there are corrupt blocks in the database due to disk failure. I didn't give it much thought at first, but fellow Oak Table member Mark Farnham got me thinking about it.

Using exp to detect corrupt blocks, or rather, the absence of corrupt blocks may work, but then again, it may not. It is entirely possible to do a full table scan on a table successfully, as would happen during an export, even though the blocks on disk have been corrupted.

This can be demonstrated by building a table, ensuring the contents are cached, then destroying the data in the data file, followed by a successful export of the table.

Granted, there are a lot of mitigating factors that could be taken into consideration as to whether or not this would happen in a production database. That's not the point: the point is that it could happen, so exp is not a reliable indicator of the state of the data files on disk.

This test was performed on Oracle 10.2.0.4 EE on RH Linux ES 4. Both are 32 bit.

Next the table LOST_WRITE is created in the tablespace of the same name. This will be used to test the assertion that a successful export of the table can be done even though the data on disk is corrupt.

Now swap the bytes in the file, skipping the first 2 oracle blocks
Caveat: I don't know if that was the correct # of blocks, and I didn't spend any time trying to find out
Also, I belatedly saw that count probably should have been 22 rather than 16, but the results still served the purpose of corrupting the datafile, as we shall see in a bit.

What this dd command is doing is using the same file for both input and output, and rewriting blocks 3-18, swapping each pair of bytes.

So, even though the data on disk has been corrupted, the export succeeded. That is due to the table being created with the CACHE option, and all the blocks being cached at the time of export. It may not be necessary to use the CACHE option, but I used it to ensure the test would succeed.

Now let's see what happens when trying to scan the table again. First the NOCACHE option will be set on the table, then a checkpoint.

If you do discover bad data blocks, you just might be able to do an export of the tables that are in the affected region before doing any recovery. This might be a belt and suspenders approach, but DBA's are not generally being known for taking unnecessary chances when possible data loss is on the line.