Tuesday, February 4, 2014

Start crs error CRS-4640 11gR2 clusterware, crsctl start cluster

I have faced the following problem while working on several Exadata and RAC systems.
The problem arises after a patch application or maintenance operation, exactly when I try to start CRS back up.
CRS cannot be started with the "crsctl start crs" command, as you can see below:
/u01/app/product/11.2/bin/crsctl start crs
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.

Note: crsctl is the command-line interface for controlling Oracle Clusterware objects.

The solution is to use the "crsctl start cluster" command. That's all, but I am writing this post to explain the reason behind this solution.

So, let's look at the difference between start crs and start cluster:

start crs : starts the entire Oracle Clusterware stack on a node, including the OHASD process, which is responsible for starting all the other clusterware processes. This command is to be used only on the local node.

start cluster : starts the Oracle Clusterware stack, by default on the local node (it also accepts -n node_name or -all for other nodes). It does not include the OHASD process.

Okay, now we know the difference, but it does not explain the CRS-4640 error produced when we used the start crs command.

Additional info: If your OCR and voting disks are in ASM, you shouldn't shut down the ASM instance alone. You need to stop the Oracle Clusterware stack, using crsctl stop cluster -n node_name or crsctl stop crs (on the local node).
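
As a rough illustration of the two stop commands mentioned above, here is a minimal sketch. stop_command is a hypothetical helper of mine, not part of crsctl, and node2 is just a placeholder node name. It only prints the command, so nothing is stopped until you run the printed line yourself as root.

```shell
#!/bin/sh
# Sketch: build the clusterware stop command instead of stopping ASM by itself.
# stop_command is a hypothetical helper; it only prints the command to run.
stop_command() {
    if [ -n "${1:-}" ]; then
        # A node name was given: stop the clusterware stack on that node
        echo "crsctl stop cluster -n $1"
    else
        # No node name: stop the whole stack, OHASD included, on the local node
        echo "crsctl stop crs"
    fi
}

stop_command          # prints: crsctl stop crs
stop_command node2    # prints: crsctl stop cluster -n node2
```

Either way, ASM is brought down as part of the stack rather than on its own.
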
I cannot reproduce it right now, but the likely reason for the error is that we shut down ASM alone, without using the crsctl stop cluster or crsctl stop crs commands. The voting disks and OCR need to be available for crsd to operate, because the OCR contains the cluster node list, services, database instances and node mappings. Oracle Clusterware uses this information to verify cluster node membership and status. On the other hand, the crsctl start crs command should start ASM, too. So why are the errors produced?

"crsctl start crs" tries to start OHASD and cannot, because it is already running. In my opinion, that is why it cannot continue and start CRS. The root cause seems to be stopping ASM without using the crsctl stop cluster or crsctl stop crs commands, as this leaves an improper environment for the crsctl start crs command.

In conclusion, we use crsctl start cluster in this situation because OHASD is already up and running, which is a prerequisite for the "crsctl start cluster" command. That is why crsctl start cluster is our solution.
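
The decision described in this post can be sketched as a small shell function. This is only a sketch under assumptions: suggest_start is a hypothetical name, the CRSCTL default is the path from my example above, and I am assuming that "crsctl check crs" reports CRS-4638 when Oracle High Availability Services is online and CRS-4537 when Cluster Ready Services is online, so verify those message codes on your own version. The function prints a suggestion and does not start anything.

```shell
#!/bin/sh
# Sketch: suggest which start command fits the current state of the local node.
# CRSCTL defaults to the path used in this post; point it at your own Grid home.
CRSCTL="${CRSCTL:-/u01/app/product/11.2/bin/crsctl}"

# Prints the suggested action; it does not start anything itself.
suggest_start() {
    # A non-zero exit is expected when parts of the stack are down, so ignore it
    status=$("$CRSCTL" check crs 2>&1) || true
    case "$status" in
        *CRS-4537*)
            # Assumed code for "Cluster Ready Services is online"
            echo "nothing to start" ;;
        *CRS-4638*)
            # Assumed code for "Oracle High Availability Services is online":
            # OHASD is up but CRS is not, which is the CRS-4640 situation
            echo "$CRSCTL start cluster" ;;
        *)
            # OHASD is not reported online, so start the whole stack
            echo "$CRSCTL start crs" ;;
    esac
}
```

Calling suggest_start as root on the affected node should print the "start cluster" line when only OHASD is running, which matches the situation in this post.
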

We have a similar issue: Node1 is working fine and Node2 throws an Oracleasm module error. Both nodes are in a RAC cluster with ASM. Our OS is Red Hat 7.0. We are able to access the database through Node1 but not through Node2. Could you please help us?