Let’s say a foreground process running in node 1 is trying to access a block using a SELECT statement, and that block is not in the local cache. To maintain read consistency, the foreground process requires a version of the block consistent with the query SCN. The sequence of operations is (simplified):

The foreground session calculates the master node of the block and requests an LMS process running in the master node for access to the block.

Let’s assume the block is resident in the master node’s buffer cache. If the block is in a consistent state (meaning the block version SCN is lower than or equal to the query SCN), the LMS process can send the block to the foreground process immediately. Life is not that simple, though, so let’s assume the requested block has an uncommitted transaction.

Since the block has uncommitted changes, the LMS process cannot send it immediately. The LMS process must create a CR (Consistent Read) version of the block: it clones the buffer and applies undo records to the cloned buffer, rolling the block back to an SCN consistent with the query SCN.

Then the CR block is sent to the foreground process.

LMS is a lightweight process

Global cache operations must complete quickly, in the order of milliseconds, to maintain the overall performance of a RAC database. LMS is a critical process and does not perform heavy-lifting tasks such as disk I/O. If the LMS process would have to initiate I/O, then instead of initiating the I/O, LMS downgrades the block mode and sends the block to the requesting foreground process (this is known as the Light Works rule). The foreground process then applies undo records to the block to construct the CR version of the block.

Now, the foreground process might not find the undo blocks in the local cache, since the transaction happened in the remote instance. A request is sent to the remote LMS process to access the undo block. If the undo block is not in the remote cache either, the remote LMS process sends a grant to the foreground process to read the undo block from disk. The foreground process accounts the wait time for these undo segment block grants to the ‘gc cr disk read’ wait event.
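As a quick sanity check (a sketch; the view and columns are standard, but you must supply the sid of the session you are investigating), you can see whether a session is accumulating time on this path by comparing ‘gc cr disk read’ against the other global cache CR events:

SQL> select event, total_waits, time_waited_micro
       from v$session_event
      where sid = :sid
        and event in ('gc cr block 2-way', 'gc cr block 3-way', 'gc cr disk read')
      order by time_waited_micro desc;

A large share of time in ‘gc cr disk read’ relative to the 2-way/3-way events points at undo block grants rather than ordinary block transfers.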

There are other reasons why the FG process might have to read undo blocks. One of them is a fairness down-convert triggered by the LMS process. Essentially, if a block is requested too many times, leading to much CR block fabrication, then instead of LMS doing more work, the LMS process simply down-converts the block mode, sends the block, and grants the requester access to the block. The FG process then applies undo to construct the CR block itself.

gv$cr_block_server can be used to review the number of down-converts, light works, etc. But it is probably not possible to identify the reason for a particular block down-convert after the event.

Why do we need this new event?

There is a very good reason why this event was introduced. Prior to 11g, waits for single-block CR grants were accounted to wait events such as ‘gc cr block 2-way’, ‘gc cr block 3-way’, etc. Waits for grants on remote-instance undo blocks for CR fabrication are special in the sense that this is additional, unnecessary work from the application point of view. We need to be able to differentiate the time spent waiting for undo block grants for CR fabrication from other types of grants (such as those for data blocks). So Oracle introduced this new event, and I do think it will be very useful for debugging performance issues.

Prior to 11g, you could still differentiate waits for single-block grants on undo using ASH data or trace files. But you would have to use the obj# field for this differentiation: obj# is set to 0 or -1 in the case of undo blocks and undo header blocks.
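For example (a rough sketch against ASH; exact column availability depends on the version, and an Active Session History license is required), you could count global cache CR waits with current_obj# of 0 or -1 to infer undo-related grants:

SQL> select event, current_obj#, count(*)
       from v$active_session_history
      where event like 'gc cr%'
      group by event, current_obj#
      order by count(*) desc;

Rows with current_obj# of 0 or -1 against the gc cr events are the candidates for undo/undo header block grants.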

Test case

Of course, a test case would be nice. Any regular table will do; my table has just two columns (number, varchar2(255)) with 1000 rows or so.
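For reference, a minimal setup along those lines might look like this (a sketch; the table and column names match the ones used later in this post, and the row data is arbitrary):

SQL> create table rs.t_1000 (n1 number, v1 varchar2(255));
SQL> insert into rs.t_1000
       select level, lpad('x', 200, 'x')
         from dual connect by level <= 1000;
SQL> commit;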

At step 3 of our test case, I flushed the buffer cache in node 2. When I reread the block in node 1, here is the approximate sequence of operations that occurred:

For the SELECT statement in step 4, the foreground process sent a block request to the LMS process in node 2; the LMS process in node 2 did not find the block in the buffer cache (since we flushed the buffer cache).

So, the LMS process in node 2 sent a grant to the foreground process to read the data block from disk.

The foreground process read the block from disk and found that the block version SCN is higher than the query environment SCN and that there is a pending transaction in an ITL entry of the block.

The foreground process clones the buffer and tries to apply undo records to reconstruct a CR version of the block matching the query environment SCN.

But that pending transaction was initiated in node 2, and the undo segment is mastered by node 2. So the FG process sends a request for the undo block to the node 2 LMS process. The LMS process does not find the undo segment block in the node 2 cache and sends back a grant for the FG process to read the block from disk. Meanwhile, the FG process is still waiting, and the time it spends waiting to receive the grant is accounted to ‘gc cr disk read’.

Let’s review a few lines from the trace file. Block 2758 holds the row physically. After reading that block, a ‘gc cr disk read’ wait event is encountered. Essentially, the FG process identified that there is a pending transaction and sent a request to LMS, accounting the wait time to the ‘gc cr disk read’ event. For this event, p1 is the file_id, p2 is the block_id, and p3 seems to be a counter increasing by 1 for each of these waits. For example, for the next ‘gc cr disk read’, p3 is set to 44. obj#=0 indicates that this is an undo block or undo header block.
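You can confirm the documented meaning of p1/p2/p3 for this event in your own release (the parameter descriptions come from v$event_name and may differ slightly between versions):

SQL> select name, parameter1, parameter2, parameter3
       from v$event_name
      where name = 'gc cr disk read';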

Notice that the next line indicates a physical read of that undo header block occurred, with obj# set to 0. I verified it was the undo header block by dumping the block too.

Commit cleanouts are another reason why you would encounter this wait event. When a session commits, it revisits the modified blocks to clean out their ITL entries. But this cleanout does not happen in all scenarios. For example, if the number of block changes exceeds the list of blocks the session maintains in the SGA, the session will mark the transaction table as committed without cleaning out the ITL entries in the actual blocks. A commit cleanout also does not happen if the modified block is no longer in the buffer cache (we will use this idea to trigger a commit cleanout in the other node).

The next session reading that block will close out the ITL entry in the block with an upper-bound SCN. Let’s tweak our test case a little to simulate commit cleanouts.

One notable difference in the test case below is that, after flushing the buffer cache, we switch back to the session updating the block, commit the transaction, and flush the buffer cache again. With these two flushes we guarantee that the committing session does not perform a commit cleanout and that the undo block is flushed from the cache. The SELECT statement is executed after the commit. When we read the same block from a different node, the reader session cleans out the ITL entry. The cleanout operation needs to identify the transaction state and the upper-bound SCN. To identify whether the transaction is committed, the session in node 2 must read the undo header block. Waits to receive grants for the undo/undo header blocks are accounted to the ‘gc cr disk read’ wait event.

node 1: update rs.t_1000 set v1=v1 where n1=100

node 2: select * from rs.t_1000 where n1=100 (just to get parsing details out of the way)
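Putting the steps together, the tweaked test case looks roughly like this (a sketch; which node each statement runs in is what matters, and the flushes happen on the node holding the modified block):

-- node 1: open a transaction, leave it uncommitted
SQL> update rs.t_1000 set v1 = v1 where n1 = 100;

-- node 2: run the query once, just to get parsing out of the way
SQL> select * from rs.t_1000 where n1 = 100;

-- node 1: flush, then commit, then flush again, so that no commit
-- cleanout happens and the undo block leaves the cache
SQL> alter system flush buffer_cache;
SQL> commit;
SQL> alter system flush buffer_cache;

-- node 2: this read must clean out the ITL entry; the wait for the
-- undo header block grant shows up as 'gc cr disk read'
SQL> select * from rs.t_1000 where n1 = 100;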

Is there anything you can do to improve the performance of the application and reduce ‘gc cr disk read’ wait events? There are a few options you can consider, though some are long-term.

Consider application affinity. If the application runs in the same node where transactions are aggressively modifying the objects, you can reduce or eliminate the grants. Application affinity is important even though cache fusion is in play.

Of course, tune the statement in such a way that the number of block visits is reduced.

Reduce commit cleanouts. For example, if a program excessively modifies blocks in node 1 and a subsequent program reads the same blocks in node 2, you may have to clean out the blocks manually from node 1 by reading them. By the way, parallel queries do not perform commit cleanouts; only serial queries do. Another operation that can get stuck is index creation. If the table has numerous blocks without commit cleanouts, the index build might be slower and can suffer from ‘gc cr disk read’ events. This is magnified by the fact that these reads are single-block reads, which are much slower for bulk operations such as an index rebuild. Reading through all blocks of the table with serial SELECT statements might be enough to complete the commit cleanouts.
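To trigger those delayed cleanouts yourself, a serial full scan from the modifying node is usually enough (a sketch; the hints simply force a serial full table scan, since parallel queries skip commit cleanouts as noted above):

SQL> select /*+ full(t) no_parallel(t) */ count(*) from rs.t_1000 t;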

Don’t keep long-pending transactions on highly active tables in a RAC environment. This typically happens for applications that use tables as queues to pick up jobs, e.g. the fnd_concurrent_requests table in E-Business Suite. Proper configuration of the concurrent manager, with optimal values for sleep and cache, might help here.

For a read-only tablespace, you might want to make sure there are no blocks left needing commit cleanout before making it read-only, since such a block can never be cleaned out and so every session reading it will try to clean it out.

Summary

In summary, this is a useful event for differentiating CR-fabrication performance issues from other performance issues. Using the techniques mentioned here, you can reduce the impact of this event.

6 Responses to “gc cr disk read”

Most excellent posting, my friend. Clarity and technical depth winding up in 5 solid recommendations that have great potential to increase throughput and scalability and little risk that someone will be able to twist your well explained reasoning into anything harmful.

A good whole chapter could be written about preparation of a tablespace to be made read only for a substantial period of time.

In extreme cases someone might also need to hermitize* sequences in support of application affinity so that blocks of seq$ are spared from excessive instance migration as well.

*hermitize meaning to isolate rows one per block (like hermits, one per dwelling place). Oracle normally stores about 75 sequence definitions per 8K block. The block grouping tends to be by order of definition and so may cross application affinity boundaries. Listing out the sequence names grouped by block can be quite illuminating.

Here you can see that only UNDO blocks are read and there is no request for a data block. Is this due to commit cleanout? This is a three-node 11gR1 RAC. If we kill the session and rerun the job, then it goes through.

We find this issue happening whenever this select query runs from either node 1 or node 2 and the UNDO blocks are from node 3 (file#=5 is the datafile for node 3’s UNDO tablespace).

Here is some more information on traffic between the nodes.

SQL> select INST_ID,CR_REQUESTS,CURRENT_REQUESTS,UNDO_REQUESTS,DISK_READ_RESULTS,FAIRNESS_DOWN_CONVERTS,LIGHT_WORKS,ERRORS from gv$cr_block_server order by 1;

For commit cleanouts: undo header blocks (transaction table entries) and undo blocks associated with the transaction table control block changes must be read. The transaction table in the undo header block indicates the transaction state. But if the transaction table slot has been overwritten for that transaction (so the commit SCN is not immediately available), the database code follows the undo chain of the transaction table control block, recursively creating prior versions of the transaction table, in a quest to find the commit SCN of the transaction.

For consistent reads: the undo chain of a transaction (one that is either active or whose commit SCN is greater than the query SCN) must be followed for consistent-read buffer generation.

The first thing is to differentiate between two statistics: does the session suffer from a ‘data blocks consistent reads – undo records applied’ or a ‘transaction table consistent reads – undo records applied’ issue? (Statistic names can differ slightly between versions.) Check the session-level statistics to identify this:
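A sketch of such a session-level check (join v$sesstat to v$statname for the session in question; supply the sid yourself, and note the statistic names may vary slightly by version):

SQL> select n.name, s.value
       from v$sesstat s, v$statname n
      where s.statistic# = n.statistic#
        and s.sid = :sid
        and n.name in ('data blocks consistent reads - undo records applied',
                       'transaction table consistent reads - undo records applied');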

As an additional check, review the block contents by dumping the blocks, and check whether the undo records are indeed application table changes. The syntax is: alter system dump datafile &lt;file#&gt; block min &lt;block#&gt; block max &lt;block#&gt;.
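For example (the file and block numbers here are purely illustrative, taken from the discussion above; the dump is written to the session’s trace file):

SQL> alter system dump datafile 5 block min 2758 block max 2758;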

(1) If the blocks are undo header blocks and the statistics indicate high ‘transaction table consistent reads – undo records applied’, then it is possible you are running into a side effect of the automatic undo retention tuning feature. Essentially, auto-tuning of undo retention tries to reduce the probability of ORA-1555 errors by keeping as much undo as possible, tuning undo_retention upwards. But this can negatively affect commit cleanouts, since the code seemingly tries to find the exact commit SCN by rolling back the transaction table header changes numerous times.

In this scenario, you can cap undo retention to reduce the effects of the auto-tuned undo retention feature. You can also set up a job to select from the table, serially, to perform commit cleanouts in the same node.

(2) If the changes are for CR generation (not undo header blocks), then these blocks are being read for consistent-read fabrication. Meaning, a few processes have updated objects heavily in one node, and sessions in other nodes are reading the blocks but cannot use them as-is, since there is a pending transaction (or the commit SCN is later than the query environment SCN), and so they roll back the changes to create consistent-read copies.

You could approach this problem in two ways if node 3 is modifying blocks heavily: (a) schedule those intensive updates during a less active period, so that you avoid these CR storms, or (b) modify the application affinity in such a way that the programs modifying the objects aggressively and the programs reading those blocks aggressively execute in the same node. Features such as in-memory undo do not work well with RAC, so sessions in other nodes might have to read the blocks physically.

I do see that light_works in node 3 is much higher than in the other nodes; this would indicate that node 3 is performing massive changes and the other nodes are reading the blocks from node 3.

kulkarni said:

Hello Riyaj,
Thanks a lot for your wonderful explanation. I will run the query you provided to check session-level statistics when this problem occurs again. This will confirm whether the waits are for data block consistent reads or for transaction table consistent reads.

However, from the following lines in the trace file, it appears that the session is suffering from ‘transaction table consistent reads – undo records applied’.