Follow me on Twitter

Introduction:

A couple of months ago I had the issue that in an 11.2.0.2 environment with Grid Infra structure and ASM on Linux , without any specific reason at a specific point and time would see messages about communication error between the databases and the ASM instance and I was not able to connect to the ASM instance with a sqlplus / as sysasm. I have opened a tar back then and even got myself a fresh bug 14767353 number but no answers. So But practically this remained unsolved. Friday 4th I had same issue again, so it was time to get back to arms and investigate this.

Below you will find the steps , what i saw and how I solved it with the help of my friends(Google and Metalink).

By the way I could not connect to any of the instances on the box locally (without connect string). Connecting using the connect string however that did work:

oracle@mynode01hr:/opt/oracle []# sqlplus system@mydb11

SQL*Plus: Release 11.2.0.2.0 Production on Fri Jan 4 15:19:39 2013

Copyright (c) 1982, 2010, Oracle. All rights reserved.

Enter password:

Connected to:

So after round one in this quest , after setting up remote listener I still had that same issue. Though it was better practice in place it only solved the issues partly.

On metalink (which sometimes feels more like an Oracle of Delphi) I saw two workarounds , one would be change primary os-group to oinstall (not preferred because with us still old fashioned but working with dba as primary group so changing that in a running production box did not sound like a cunning plan). Second workaround gave me a good clue , connect via connect string which i tested with the normal databases and the ASM instance.

So after reading and test that was a good thing half the puzzle was solved since I was able to communicate with both databases and the ASM instance as soon as I Added a connect string to my sqlplus session. I performed an extra test to see if the communication between the Databases Instance and the ASM instance was ok by resizing a datafile.

So that did prove that the database and the ASM instance did work together properly. Which makes the issues at hand a little less dramatic.

As I wrote on Metalink I had seen a suggestion to change to primary os- group of oracle (my environments are all oracle:dba) to oracle:oinstall). Even though I did not like it in the first or second place it gave me a thought I should check about os-groups on the Linux level. During that check the good thing was that then I noticed a delta in /etc/group on all 3 other boxes.

oracle@mynode02hr:/opt/oracle []# cat /etc/group|grep oracle

oinstall:x:47374:oracle <—- not present on mynode01hr , 04hr , 05hr

oracle@mynode02hr:/opt/oracle [MYDB11]# id -a oracle

uid=101(oracle) gid=101(dba) groups=47374(oinstall),101(dba)

This showed that oracle at least had the group oinstall available !!!! on the second node (02hr) where everything was working properly. The oinstall group was lacking on all other 3 nodes including on the first node with the issues . Now that made me wonder. What if Oracle still want to have the group oinstall present even if was not the primary os group.. I contacted Linux guru and he was kind enough to help me , he added the group ( locally ). Typically to be tested with:

id -a oracle

I tested it on the first Box 01hr which was having issues and it worked! I can connect without ORA-01031: insufficient privileges directly again to ASM without password and without connect string. And also no more messages in the Alerts of the instances that are connected to this Instance:

Conclusion:

The Linux boxes were not set-up in a similar way and fortunately the correct set-up (with both the os-groups oinstall and dba being present) was in place on one box. By comparing that to the other boxes I found the missing piece in the puzzle after which my connect errors and the messages in the alerts of the databases vanished. It looks very much that the oinstall group simply has to be out there with the user oracle at least being a member of that group.