This being said, this blogpost will walk you through the upgrade path I decided to take to bring my 12.1 Grid Infrastructure (GI) to 18c.
My homelab runs a 4-node Flex Cluster with 2 server pools and one policy-managed database. That one I will keep on 12.1. I know I should upgrade it to 12.2 as well, but hey … can’t do it all at once.

The main problem we will face is the Exadata feature check, which is currently enforced. The same happens when you try to start an 18c database on premises. The error you will get is:

Prechecks

As with every installation, the prerequisites should be met. That’s what we need our good friend cluvfy for.
Call me paranoid, but I usually first want the peace of mind that my currently running cluster is ok.
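That sanity check can look roughly like this (a sketch, run as the grid user with the 12.1 cluvfy; your node names will differ):

```
[grid@labvmr01n01 ~]$ cluvfy stage -post crsinst -n all -verbose
```

This verifies the currently running clusterware stack on all nodes before you touch anything.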

This will check our environment for potential issues that could hold you back from upgrading. You see I will be brave and attempt a rolling upgrade 🙂 For the rest it is similar to the cluvfy command you’re used to.
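The pre-upgrade check comes from the unzipped 18c media; a sketch (the staging directory is an assumption, adjust the homes to yours):

```
[grid@labvmr01n01 ~]$ /u01/swtmp/grid18c/runcluvfy.sh stage -pre crsinst -upgrade -rolling \
  -src_crshome /u01/app/12.1.0/grid \
  -dest_crshome /u01/app/18.0.0/grid \
  -dest_version 18.0.0.0.0 -fixup -verbose
```

The -fixup flag is what generates the runfixup.sh script used below.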

I do ignore the swap warning; I know about it, and you should not ignore it in production, but for a sandbox playground … you get the picture.
I HAVE to run the fixup scripts:

[root@labvmr01n01 ~]# /tmp/CVU_18.0.0.0.0_grid/runfixup.sh
All Fix-up operations were completed successfully.
[root@labvmr01n01 ~]#

Afterwards I have to patch my 12.1 cluster. The only patch a base-install 12.1 cluster needs is patch 21255373. It’s a fully rolling patch which is applied using opatchauto, and I did not have any issues in my environment, so I won’t cover that here.
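For reference, applying such a patch boils down to something like this (a sketch; the unzip location /u01/swtmp/21255373 is an assumption, and opatchauto runs as root):

```
[root@labvmr01n01 ~]# export PATH=$PATH:/u01/app/12.1.0/grid/OPatch
[root@labvmr01n01 ~]# opatchauto apply /u01/swtmp/21255373 -oh /u01/app/12.1.0/grid
```

Repeat per node for a rolling apply, as usual with opatchauto.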

After patching the system with the required patch (rolling, of course), I reran the precheck:

Make sure all your nodes are listed and test the ssh equivalence. It should be working already, but better safe than sorry.
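Testing the user equivalence is a one-liner with cluvfy (a sketch, run as the grid user):

```
[grid@labvmr01n01 ~]$ cluvfy comp admprv -n all -o user_equiv -verbose
```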

As it is just for testing and playing, I won’t register it in Cloud Control now.

I started it with a response file in which I had already filled in my Oracle home; that’s why I can’t change it here, I think. I like the idea of keeping the Oracle base and the Oracle home separated, but whether that is ok or not is another discussion.

I usually run my root scripts as part of the installer. I know you can do it manually, but the only script it runs, and it will ask you before starting, is rootupgrade.sh. So we know what it’s doing, and if it fails there is the retry button, because since 12c rootupgrade.sh is restartable. So no harm in doing it this way.

I like this idea! If you have a large number of nodes, you can separate them into batches. This also saved my a** a little, as between the batches you get a pop-up asking whether it is ok to continue with the next batch. I used this time in between to correct the missing underscore parameter in the ASM spfile, to make sure that at least two ASM instances were always available during the installation. This is not something I would do in production; it is just to get it running. Also, we know that the on-premises release is planned for July, so no fiddling around anymore by then, but for now it does help.

The very well known moment of truth.

It’s my lab … /u01 and swap are pretty small, so this is safe to ignore. The installer will complain with a dialog box that you chose to ignore this message, and you can confirm that you’re sure about it.

This is something I would definitely recommend. Always save your response files! You never know what you need them for. For re-running your configuration assistants for example 😉

And there the fun starts! After a while it pauses, and do not click anything yet!

During my installation I had node1 and node2 in one batch and node3 and node4 in the other. What happened during rootupgrade.sh was that the ASM instance indeed did not come up properly, due to the error

This wasn’t too much of a problem, as my database was still able to connect to ASM through the other surviving ASM instances. The moment I saw that I had hit this error, I started the instance using a pfile containing the spfile entry and the underscore parameter. When all was done, I recreated the spfile containing the _exadata_feature_on parameter. The proxy instances picked up their pfiles in $GRID_HOME/dbs and started up without any issue.
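Recreating the spfile from that temporary pfile is a minimal sketch like this, assuming the pfile lives in the 18c grid home and +DATA is the diskgroup holding your ASM spfile:

```
[grid@labvmr01n01 ~]$ sqlplus / as sysasm
SQL> create spfile='+DATA' from pfile='/u01/app/18.0.0/grid/dbs/init+ASM1.ora';
```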

If you have only 2 nodes, it can be an option to put each node in a separate batch. It seems a bit overkill at first, but it gives you a pause in which to make sure you always have an ASM instance available to connect to, so the assistants don’t fail. When both your ASM instances and their proxies are back online, click “Execute now” and the installer continues.

Then it’s time for the configuration assistants.

If for some reason or another you lose your session and you want to rerun the config assistants, you can rerun them using gridSetup.sh with the -executeConfigTools flag.
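A sketch of that rerun (the response file name is an assumption; use the one you saved during the interactive run):

```
[grid@labvmr01n01 ~]$ /u01/app/18.0.0/grid/gridSetup.sh -executeConfigTools \
  -responseFile /home/grid/grid_18c_upgrade.rsp
```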

This actually went pretty smoothly as soon as I found out how to get around the GIMR issue; check ISSUE 4 further down in this blogpost. Afterwards … all was done and I had a running 18c cluster.

The next steps were to

enable the GHCHKPT volume and its ACFS file system.

enable and start RHP (Rapid Home Provisioning).

In my case they were not enabled by default. You can choose: either you do it in the brand-new fancy asmca and click around (in the settings box you can enter the root password, which makes life a little easier), or you use the command line. It’s up to you.
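The command-line variant is a short sketch like this (assuming the volume lives in the DATA diskgroup; the volume device name is an example and will differ on your system):

```
[grid@labvmr01n01 ~]$ asmcmd volenable -G DATA GHCHKPT
[grid@labvmr01n01 ~]$ srvctl start filesystem -device /dev/asm/ghchkpt-123   # device name is an example
[grid@labvmr01n01 ~]$ srvctl enable rhpserver
[grid@labvmr01n01 ~]$ srvctl start rhpserver
```

You can find the actual device name with `asmcmd volinfo -G DATA GHCHKPT`.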

Issues and their workarounds

ISSUE 1

During one of the upgrade attempts, my installation kept complaining that I wasn’t on the first node. Afterwards I found out it had to do with DNS. I had installed my old cluster using shortnames and wanted the new nodes to have their fully qualified domain names. In the logs the installer then sees that the names don’t match exactly, and it tells you that you’re not on the first node. The logfile you’re looking for is cluutil2.log.

So you see that in my particular case labvmr01n01 is my master node and I will thus perform the upgrade from the master node.

ISSUE 2

ASM instances

Maybe you’re just like me, too stubborn to check some things upfront sometimes. Ok, I admit, this was on the first attempt, but I would highly recommend making sure your ASM proxy instances are running; I needed them to make sure the upgrade succeeded. Also, make a note of where your spfile is located in ASM:

[grid@labvmr01n01 ~]$ asmcmd spget
+DATA/labvmr01-clu/ASMPARAMETERFILE/registry.253.943891935
[grid@labvmr01n01 ~]$

WARNING: this is an example! In the rest of my journey the spfile might differ. What I also did is create a copy on the filesystem, “just in case”.
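Making that filesystem copy is a single asmcmd command (a sketch; the target path is an assumption):

```
[grid@labvmr01n01 ~]$ asmcmd spcopy +DATA/labvmr01-clu/ASMPARAMETERFILE/registry.253.943891935 /home/grid/spfileASM.bak
```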

If you find yourself in the same trouble as me, you will end up with an ASM instance that refuses to start and teases you with

We can get around this. If you’re stuck and you don’t have a copy of your spfile, find the parameters in the ASM alert log and construct one yourself with some creativity. In the ASM alert log you will see something like this:

This is the moment where the underscore parameter comes into play for the first time. Construct a pfile containing the spfile entry and the underscore parameter; then you can include the underscore parameter in the spfile and you’re good to go again (but only until the proxy instances pop up).

[grid@labvmr01n01 ~]$ cd /u01/app/18.0.0/grid/dbs/
[grid@labvmr01n01 dbs]$ ls
hc_+APX1.dat hc_+ASM1.dat init.ora
[grid@labvmr01n01 dbs]$ vi init+ASM1.ora
[grid@labvmr01n01 dbs]$ cat /u01/app/18.0.0/grid/dbs/init+ASM1.ora
*.spfile="+DATA/labvmr01-clu/ASMPARAMETERFILE/registry.253.943891935"
*._exadata_feature_on=true
[grid@labvmr01n01 dbs]$

Then start the ASM instance and get it online (shut it down again afterwards, because at this stage your upgrade assistant may be hanging, and then you can just retry the operation).
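Starting it from that pfile looks roughly like this (a sketch; set your SID to the local ASM instance first):

```
[grid@labvmr01n01 ~]$ export ORACLE_SID=+ASM1
[grid@labvmr01n01 ~]$ sqlplus / as sysasm
SQL> startup pfile='/u01/app/18.0.0/grid/dbs/init+ASM1.ora';
SQL> shutdown immediate;
```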

Oh, something nice to know: don’t try to be smarter than Oracle and set the parameter upfront; 12.1 doesn’t recognise it and will not start due to invalid parameters. At this point it’s in the spfile of the 18c version, so all good now.

Proxy instances

The proxy instances are a little different. The easiest workaround I found to get them started and keep things consistent during the process is to give them a pfile in /u01/app/18.0.0/grid/dbs. If you do this upfront, you only need to add the files on node 1, as during gridSetup.sh the home is copied over.

So in the end I have these 4 files on all 4 nodes, just in case some instance is not on its normal node, which can happen in a Flex Cluster.

The content is the same for every file:

[root@labvmr01n01 ~]# cat /u01/app/18.0.0/grid/dbs/init+APX1.ora
*._exadata_feature_on=true
*.instance_type=asmproxy
[root@labvmr01n01 ~]#

ISSUE 3

This one is completely my own fault: during a rebuild I ran the 12.1 installer manually instead of using my scripts, and I ended up with different groups. It is normal that these MUST match. So what I did was copy my 12.1 response file to 18c and then start gridSetup.sh with the -responseFile option. That way you can convince the installer to use some other variables.
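That boils down to something like this (the response file names are assumptions):

```
[grid@labvmr01n01 ~]$ cp /home/grid/grid_12c.rsp /home/grid/grid_18c.rsp
[grid@labvmr01n01 ~]$ vi /home/grid/grid_18c.rsp      # adjust homes, version and keep the groups as-is
[grid@labvmr01n01 ~]$ /u01/app/18.0.0/grid/gridSetup.sh -responseFile /home/grid/grid_18c.rsp
```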

ISSUE 4

The GIMR (Grid Infrastructure Management Repository) puzzled me the first time I tried to upgrade. I admit, it was a bit late already, but it looked like the pfile kept coming back or being regenerated. After some digging and reading scripts, it turned out to be actually pretty simple: the assistant for the GIMR first starts it using its own pfile, which it has backed up in the old $GRID_HOME/dbs, and tries to drop it in order to recreate it.

In my, and I repeat: my particular case, I had already screwed up my GIMR before, and as I have limited resources I had already deleted it. I know this is not healthy and I would strongly advise against doing so, especially for production or real-use clusters. That’s why the /u01/app/18.0.0/grid/crs/sbs/dropdb.sbs script failed. If you follow it carefully, you can remove the GIMR manually and edit the script so it returns 0; the installer then accepts the retry. If you decide to do this, make sure you know what you’re doing and understand what is happening, because if you leave behind some things Oracle doesn’t expect, the rest might fail as well, and we don’t want that. For creating the GIMR, the following command is used internally: