Blogroll

SCN – What, why, and how?

In this blog entry, we will explore the wonderful world of SCNs and how Oracle database uses SCN internally. We will also explore few new bugs and clarify few misconceptions about SCN itself.

What is SCN?

SCN (System Change Number) is a primary mechanism to maintain data consistency in Oracle database. SCN is used primarily in the following areas, of course, this is not a complete list:

Every redo record has an SCN version of the redo record in the redo header (and redo records can have non-unique SCN). Given redo records from two threads (as in the case of RAC), Recovery will order them in SCN order, essentially maintaining a strict sequential order. As explained in my paper, every redo record has multiple change vectors too.

Every data block also has block SCN (aka block version). In addition to that, a change vector in a redo record also has expected block SCN. This means that a change vector can be applied to one and only version of the block. Code checks if the target SCN in a change vector is matching with the block SCN before applying the redo record. If there is a mismatch, corruption errors are thrown.

Read consistency also uses SCN. Every query has query environment which includes an SCN at the start of the query. A session can see the transactional changes only if that transaction commit SCN is lower then the query environment SCN.

Commit. Every commit will generate SCN, aka commit SCN, that marks a transaction boundary. Group commits are possible too.

SCN format

SCN is a huge number with two components to it: Base and wrap. Wrap is a 16 bit number and base is a 32 bit number. It is of the format wrap.base. When the base exceeds 4 billion, then the wrap is incremented by 1. Essentially, wrap counts the number of times base wrapped around 4 billion. Few simple SQL script will enumerate this better:

In the SQL statement below, we use dbms_flashback package call to get the current system change number, we also convert that number to hex format to breakdown the SCN.

Here, hex value of the SCN is 0x280000371 and decimal format is 10737419121. Let’s review the hex value 0x280000371, this value can be split in to two components, better written as 0x2.80000371, where 0x2 is the wrap and 0x80000371 is the hex representation of base. To verify the base and wrap, we can put them back together to get the SCN value. Essentially, multiply wrap by 4 billion and add base to get the SCN in number format. Script shows the output and see that these two numbers are matching.

Even though there were 1000 changes to the table, just 9 SCNs increased. If we dump the redo record using the script dump_last_log.sql (script is inline at the end of this post) then we can see redo records have both SCN and SUBSCN below too. Many REDO records are having same SCN and SUBSCN combo.

Database link based transactions can cause SCN increases too. For example, let’s say that, three databases db1, db2, and db3 participate in a distributed transaction and let’s say that their current SCN is 1000, 2000, 5000 respectively in these databases. At commit time, a co-ordinated SCN is needed for the distributed transaction and maximum SCN value from all participating databases is chosen; SCN value of these three databases will be increased to 5000.

Can you run out of SCN?

As you saw earlier, maximum SCN hard limit is 281 trillion. In addition to that, there is also a soft limit imposed by Oracle code as a protection mechanism. If the next SCN is more than the soft limit, ORA-600[2252] is emitted and the operation cancelled. For example, in the case of database link based distributed transaction, if the co-ordinated SCN is greater than the soft limit ORA-600 emitted.

This soft limit is calculated using the formula (number of seconds from 1/1/1988) * 16384. As the number of seconds from 1/1/1988 is continuously increasing, soft limit is increasing at the rate of 16K per second continuously. Unless, your database is running full steam generating over 16K SCNs, you won’t run in to that soft limit that easily. [ But, you could create ORA-600[2252] by resetting your server clock to 1/1/1988].

Problem comes if many interconnected databases each generating at higher rate in kind of round-robin fashion.DB1 generates 20K SCNs per second in the first 5 minutes, DB2 generates 20K SCNs per second in the next 5 minutes, DB3 generates 20K SCNs per second in the next 5 minutes etc. In this case, all three Databases will have a sustained 20K SCNs per second rate. Database is slowly catching up to soft limit (1 second per every 4 second exactly) and again, it will take many years for them to catch up to the soft limit assuming the databases are active, continuously. But, there is that infamous, hated by my client, hot backup bug.

(BTW, To reach hard limit, it will take 544 years to run out of SCN at 16K rate normally (65536*4*1024*1024*1024 / 16384 / 60/60/24/365)).

Here is an example of ORA-600 [2252] error. In this example lines printed below, 2838 is the SCN wrap and 395527372 is the SCN base. If we convert this to decimal SCN it is in the 12 Trillion range. Database link based connection was trying to increase the SCN over 12 Trillion value, but it was rejected by the database as the SCN was exceeding the soft limit.

BTW, in 10g, this 16K per second was hard coded. But, 11gR2, this limit is controlled by an underscore parameter _max_reasonable_scn_rate defaulting to 32K.

Hot backup bug

Most DBAs use RMAN to do backup. But, still, there are few databases that use hot backup mode, primarily because of disk mirror based backups. It is a common behavior to see higher SCN rate if the database is altered to hot backup mode. A SGA variable array keeps track of the backup mode at file level. When you alter the database out of backup mode, SGA variables are reset and the higher SCN rate goes back to normal. Due to a bug (12371955), that SGA variable is not reset leaving the database to think that it is still in hot backup mode. Database generates SCN at higher rate. (if you recycle the database later, of course, the variable is reset to normal rate). There is way to dump the SGA variable to check if the database currently thinks if it is in hot backup mode or not.

Due to this bug, an highly active database can create increased SCN rate over 16K. Over a long period of time (in fact, it probably will take many years) the SCN catches up to the soft limit. Once soft limit is reached, next SCN update will throw ORA-660[2252] errors. Of course, this SCN growth is propagated to other databases over database link. As the soft limit calculation is time based, time zone of the server is also important. For example, if the values are close enough to soft limit, then the databases running in US Eastern time zone will have an higher soft limit by (4*60*60*16384 =235 million ) then the databases running in Pacific Time Zone.

Salient points of the bug are:

There is no corruption danger, sessions might die or the databases might throw ORA-600 errors. In rare cases, databases have to be kept down for few hours or distributed transaction removed from the database so that the head room between the soft limit and the current SCN is widen.

This bug affects only if you use ‘ALTER DATABASE’ command. If you use, ‘ALTER TABLESPACE’ command for backup, you are not affected by this bug.

SCN rate is also directly relevant to activity. If the database has lower activity, SCN rate is also lower, even when the database is altered to backup mode with this bug.

There is a script released by Oracle that can tell you how close your database is to the soft limit,aka SCN headroom. So, first check if your database is having any SCN issue or not, that script is available as bug 13498243 and tells you how many days of SCN headroom you have.

How to check SCN rate?

There are multiple ways to check SCN rate in your database.

Method 1: smon_scn_time keeps track of the mapping between time and SCN at approximately 5 minutes granularity. That can be used to measure SCN rate, see code below. Although, this is easier to check, remember that there is no easy way to identify if the SCN increase is due to intrinsic activity in the database or is it due to an external database increasing the SCN by a distributed transaction activity. We will discuss this differentiation later.

v$log_history also can be used to check the SCN rate of the database. In this code below, you can see the SCN rate per second queried from v$log_history. Even if you are running in RAC, query against v$log_history is sufficient as it holds the archive logs from all threads. If there is a SCN spike, say from a remote database, then you will see a SCN spike in the output of this query below.

In the output above, there was a SCN jump by 10 Billion between 14:27 and 14:05. You can’t differentiate if that increase came from external systems or is it due to intrinsic activity easily. In this specific case, because this is an extreme SCN increase, and I would guess that it came from external systems. ( But usually this level of SCN increase will not happen in your production site and my example is to just explain the concept).

What happens in RAC?

In RAC, instance that receive the update from external system will increase the SCN of the database SCN to the new higher SCN. When other instances query for next SCN, immediately that SCN increase will be propagated to other instances too.

Can two threads get same SCN?

Obvious answer is No. Correct answer is yes. For example, redo records from two threads shows that they have exact same SCN and subSCN. This is not a problem or concern, as the buffer changes are protected by GCS layer code, and the row changes are protected by locking mechanism.

There is a statistic that can also guide us to determine if the SCN increase is intrinsic or extrinsic or not. Statistics ‘calls to kcmgas’ gives an approximate number of calls to allocate SCNs. This statistics is an estimate only, not an absolute count of generated SCNs. We will understand this stats with a script and an helper function.

From the output, we can see that 857 SCN differences vs 826 kcmgas calls form this session. There could be other background processes generating SCN which would explain this difference. Even at instance level, it doesn’t match exactly, but multiplying ‘kcmgas calls’ statistics by 1.1 gives you better estimate. This method can be used to identify if the SCN growth is intrinsic or extrinsic in a database. It can be also used to identify the instance generating more SCNs in a RAC cluster or the database generating more SCNs in a complex interconnected environment.

SCN Vulnerability issue

I am not going to discuss details about this vulnerability issue at all. But, this vulnerability require access to production database. DBAs with security in mind, don’t allow production access that easily anyway. So, In my opinion, it is a problem that must be addressed, but you would need a malicious DBA with expert level knowledge to misuse this vulnerability. Follow Oracle support direction on this one as I usually stay away from talking about security vulnerability issues. Check here for details

How to check the hot backup state from SGA variables

In the past, many of you have asked me about details for checking the flag using SGA variables. Following is the method to do that.

SGA variable kcvblg is an array data type and keeps track of the status of hot backup at file level. Length of each array element is 8 bytes and so, by dumping the array for a length of (db_files*8), you can see backup status for each file.

From the new trace file, we can see that many array element value is set to 1 indicating that hot backup is Enabled for those data files. BTW, array kcvblg is fully allocated for the size of db_files*8. But, the flag is altered only if a datafile is assigned for that array slot. So, If I have 100 data files with db_files=200, then only 100 elements are altered to 1 or 0.

Using this method, you can identify if the hot backup is enabled at data file level or not. Essentially, alter database or alter tablespace commands, identify all affected datafiles and alter the kcvblg array element associated with the data file. Command ‘alter database end backup’ or ‘alter tablespace end backup’ command will reset the flag to 0. Specific Bug I discussed in this blog was that alter database command forgets to reset the flag causing increased SCN usage.

Summary

I have been holding on publishing this blog entry for many months now. Since this issue is in the public knowledge domain, I can share the knowledge without any repercussions. In a nutshell, understanding SCN generation and intrinsic details about it is important. Armed with scripts, you can review your environment.

Oracle support has published a note describing this issue 1376995.1. You might want to refer to that note too.

update 1: Fixed formatting and typo.

update 2: Fixed to read “Essentially, multiply base by 4 billion and add wrap to get the SCN in number format”
update 3: updated a typo in a code fragment.
update 4: Adding a section about how to use SGA variable to check backup status (This was not in public domain when I posted it originally, but now it is and so, posting details here).
update 5: added MOS document id.

Hello Denis
Thanks for stopping by.
If I understand correctly, you want more details about SCN and how it is used in backup & recovery, instance crash, thread recovery, etc.
Well, I can’t cover that much details in the blog entry, it would require couple of chapters in a book. Even then, audience who will be interested in that level of depth is very small.

stephensaid

Riyaj, have you applied/tested the PSU5 patch? Patched databases will reject dblinks that will elevate the scn too close to the scn softlimit. My question is if you’ve played with it in this use case. Will a patched database reject a formerly good dblink that has already been established/connected but then some time later becomes a dblink that has a dangerous level of scn headroom? Wondering if the patch just protects against bad dblinks at connection time or is it any ongoing check throughout life of link?

Hello Stephen
Thank you for reading my blog.
Nope, I haven’t had opportunity to test this.
My understanding is that, at any time, there is a SCN increase from a database link, a sanity check is performed and exception raised if exceeding a threshold rate.

In your Query ‘maxscn’ ist calculated fix with 31 Days per month over timeperiod – is this the way oracle calculates the ‘scn softlimit’ ?
We have calculated with the real count of Days per month over timeperiod – is this wrong?

Hello Robert
Yes, that’s correct. That’s the infamous 31-day bug. Oracle code assumes 31 days while calculating number of seconds elapsed from 1-1-1989. This does not mean that Oracle Date calculation is incorrect, it is just an code optimization to calculate elapsed seconds quickly.

Hrishikeshsaid

Hi,
I want to alter below SCN parameters , but it’s not happening.Help is much appreciated
=============
_external_scn_logging_threshold_seconds= 600;
_external_scn_rejection_delta_threshold_minutes= 10080;
================
I used below alter command to set the parameters, but it’s not helping out
SQL> alter system set “_external_scn_logging_threshold_seconds”= 600 comment=’Set threshold on 31/05/2013 – See SR#497386-326095900′ scope=spfile;

[…] In 9i there was also a means of mapping an SCN to a TIMESTAMP. This involves the system view SYS.SMON_SCN_TIME. The view no longer exists in 12c. One limitation is that it only holds 1440 rows (well it does at 9i but I can see 2711 rows in an 11GR1 database and 2500 in an 11GR2 one) and therefore cannot hold more than 5 days worth of history. There is a good blog entry explaining this at https://orainternals.wordpress.com/2012/01/19/scn-what-why-and-how/ […]