Fixing a problem with the ASM spfile preventing RAC 12c from starting

This is a little note to myself on how to fix a corrupt spfile in clustered ASM. I hope you find it useful, too.

Let’s assume you made a change to the ASM (server) parameter file that causes an issue. You are most likely to notice this once CRS is restarted but parts of the stack fail to come up. If “crsctl check crs” mentions any component not started you can try to find out where in the bootstrap process you are stuck. Here is the output from my system.

Another useful piece of information is the Clusterware alert.log. Unlike Oracle Clusterware version 11.2 where log information was in the $GRID_HOME the 12c CRS logs moved to the ADR. A quick look at the alert.log showed

2015-07-28 09:16:51.247 [OCSSD(11611)]CRS-8500: Oracle Clusterware OCSSD process is starting with operating system process ID 11611
2015-07-28 09:16:52.347 [OCSSD(11611)]CRS-1713: CSSD daemon is started in hub mode
2015-07-28 09:16:57.974 [OCSSD(11611)]CRS-1707: Lease acquisition for node rac12pri1 number 1 completed
2015-07-28 09:16:59.076 [OCSSD(11611)]CRS-1605: CSSD voting file is online: /dev/vdc1; details in /u01/app/oracle/diag/crs/rac12pri1/crs/trace/ocssd.trc.
2015-07-28 09:16:59.089 [OCSSD(11611)]CRS-1672: The number of voting files currently available 1 has fallen to the minimum number of voting files required 1.
2015-07-28 09:17:08.198 [OCSSD(11611)]CRS-1601: CSSD Reconfiguration complete. Active nodes are rac12pri1 .
2015-07-28 09:17:10.276 [OCTSSD(11694)]CRS-8500: Oracle Clusterware OCTSSD process is starting with operating system process ID 11694
2015-07-28 09:17:11.261 [OCTSSD(11694)]CRS-2403: The Cluster Time Synchronization Service on host rac12pri1 is in observer mode.
2015-07-28 09:17:11.469 [OCTSSD(11694)]CRS-2407: The new Cluster Time Synchronization Service reference node is host rac12pri1.
2015-07-28 09:17:11.469 [OCTSSD(11694)]CRS-2401: The Cluster Time Synchronization Service started on host rac12pri1.
2015-07-28 09:17:43.016 [ORAROOTAGENT(11376)]CRS-5019: All OCR locations are on ASM disk groups [CHM], and none of these disk groups are
mounted. Details are at "(:CLSN00140:)" in "/u01/app/oracle/diag/crs/rac12pri1/crs/trace/ohasd_orarootagent_root.trc".
2015-07-28 09:18:05.139 [OCSSD(11611)]CRS-1625: Node rac12pri2, number 2, was shut down
2015-07-28 09:18:05.139 [OCSSD(11611)]CRS-1625: Node rac12pri3, number 3, was shut down
2015-07-28 09:18:05.139 [OCSSD(11611)]CRS-1625: Node rac12pri4, number 4, was shut down

In other words, CSSD has found the block device I use for the voting files, and concludes its initial work. However, the oracle ROOT agent (orarootagent) cannot proceed since none of the OCR locations on ASM can be opened. Checking the log file at that particular time I can see where the problem is:

So there is not a single ASM instance that could serve the required diskgroup. Hmmm… So maybe I have to back out the change I just made. I have developed a habit of creating backups (pfiles) of spfiles prior to implementing changes. But even if there is no backup of the spfile I can still get the system back, and here are the steps I used. Just as with the database, I need to

Create a temporary pfile on the file system

Start ASM using this temporary pfile

Create a backup my (bad) spfile from the ASM disk group

Extract all parameters

Create a proper pfile that I use to start the cluster with

Convert that to a spfile in ASM

Fixing the problem

The first step is to create a temporary pfile. Using the ASM instance’s alert.log I can go scroll up to a point in time before the change I made to check which parameters are needed. These following are just an example, your settings are different!

The alert.log also records the location of the spfile-you should back this up now (using asmcmd or any other tool). Using the backup, you should be able to reconstruct your spfile, but make sure to take the offending parameter out.

I decided to create the spfile as spfileASM.ora in ASM. I amended my temporary pfile with the settings from the recovered spfile and put it back into the cluster.

This should be it-the correct values have been restored, the spfile is back on shared storage, and I should be able to start with this combination. After having issued the stop/start commands to CRS it was indeed all well: