Monday, December 14, 2009

In this post I want to share with you some scenarios about RAC One Node and client behaviour. In that part I’m using SQL*Plus from 10g so there is no FAN notification, next part will be with FAN enabled client. The main difference between using and not using FAN is that FAN client has a feedback from Oracle Cluster (formally Grid Infrastructure in 11R2 or Oracle ClusterWare in previous releases) and has information about current cluster status. Non FAN client has to wait for TCP/IP time-out for current session before switching to other node. More information about client failover and cluster configuration can be found here.

My server and client configuration has been presented in part 1 (entry from TNSNAMES.ORA) so this is only short recall:

One node in cluster is runningDue to failure or maintenance only one node in cluster is running. RAC One node service is starting on that node and it is opened for all sessions. Irrespective of which node was current previously instance is open on node which is running now. Current status can be checked via raconestatus

Active QueryNow first scenario when I’m repeating a select command in SQL*Plus and in that same time I’m turning off RAC1 server. In that case a active statement is cancelled and client is trying to reconnect at that time.

SQL> select to_char(sysdate,'hh24:mi') from dual;

TO_CH-----09:55

SQL> select to_char(sysdate,'hh24:mi') from dual;select to_char(sysdate,'hh24:mi') from dual*ERROR at line 1:ORA-03135: connection lost contact

SQL> select to_char(sysdate,'hh24:mi') from dual;ERROR:ORA-03114: not connected to ORACLE

As you can see command has been cancelled and session has been terminated. There is no other instance up and running yet to there is no target to failover a session. In client session network trace file you can find following order of errors:

ORA -12547 – TNS Lost Contact

ORA- 12151 – Bad packages

ORA-03115 – connection lost contact

After that client is going to reconnect session to other nodes like in normal RAC configuration. Client is calling SCAN address to get another VIP address but cluster is still in that same state – a reconfiguration has been just started – and Oracle Cluster is returning a rac1-vip once again as it is only one VIP configured for database service.Keep in mind that current, not committed transaction in that case will be rollback.

Idle sessionWhen a session is in an idle state a failover will take place when a first SQL query will be executed.See example

As you can see a connection has been migrated between servers without any error. This is because a before a last query a new instance was up and running. If we take a look into client network trace file we will see following errors

ORA- 12151

ORA- 3113

After that client is going to reconnect session to other nodes. Client is calling SCAN address to get another VIP address and cluster in returning a new VIP (rac2-vip) as a database service has been successfully migrated to other node.Keep in mind that current, not committed transaction has to be rollback after session migration.

Fixing configurationAfter unexpected crash it is possible that RAC One Node configuration need to be fixed.I have figure out two possible scenarios:

After node crash and migration of instance and service the crashed node has been rebooted and there are two running instances

If database is up, it will be checked and cleaned after a previous fail over.

Very similar situation like in previous point but only one instance is working and there is a “mess” in RAC service configuration. Oracle RAC One Node fix will clean up a configuration.

At the end I want to mention that I have hit some strange behavior related to migration of instance between servers and starting and stopping servers. I’m not sure if it a bug or not but after whole cluster restart instances have been migrated across.First configuration was

RAC1 – instance name : testone_2

RAC2 – instance name : testone_1

Yes I know it was my inconsequence in naming so Oracle decided to fix it. No, not at the beginning but after a week of testing and restarts. And now it looks like:

RAC1 – instance name : testone_1

RAC2 – instance name : testone_2

Unfortunately after that I was unable to play with RAC One Node and I have to clear a whole configuration and create it from scratch.

Next part will be about FAN enabled clients and more about transactions.

1 comments:

This is an excellent article. Still, if you want something more hands-on, try these:http://vgrigorian.com/11gsimulator/1_rac11gr2.htmhttp://vgrigorian.com/11gsimulator/2_rac11gr2rdbms1.htmhttp://vgrigorian.com/11gsimulator/3_rac11gasm.htmhttp://vgrigorian.com/11gsimulator/4_11gr2dbcreate.htm

You can find more demos (including dataguard, goldengate, streams) there at http://vgrigorian.com/