Re: solaris 10 + 10gr2 +10.2.0.3p NOT a success story

On May 8, 5:11 pm, "pmik" <p..._at_qnr.com.gr> wrote:
> Hello everyone,
>
> We have recently installed 10gR2 RAC + ASM on two Sun servers running
> Solaris 10. Since we started using the system (before applying patch
> 10.2.0.3) we have been observing rather strange behaviour.
>
> We start the CRS, nodeapps, ASM and finally the database. crs_stat -t
> reports everything to be fine. Then, after a random period of time, one
> of the nodes' VIP is lost, resulting in the eviction of that node.
> After applying the patch, the node is no longer evicted: the VIP is
> reassigned to the other node and the instance keeps functioning
> properly. BUT, the VIP continues to fail at random intervals.
>
> Searching through the OS logs, in /var/adm/messages we see a message
> that the qfe interface (the public-IP NIC) is turned off and restored
> within a second at the exact moments the VIP service is lost. After
> enough experimentation, we discovered that if we keep pinging the
> public IPs of the two nodes endlessly, the qfe interfaces on both
> servers never get turned off and on. This way the RAC performs without
> any problems.
>
> Can someone explain this behaviour? Is there something we are missing
> during or after installation? Is this a Solaris problem, an Oracle
> problem or a hardware problem? We have almost eliminated the
> possibility of a NIC problem, since we swapped the qfe NIC for the ce
> NIC (previously used for the interconnect) and we get the same
> behaviour, this time on the ce card.
>
> Any help and/or guidance is extremely welcome, since we seem to be
> running out of options and directions. If anyone is interested, we
> will gladly provide any logs or elaborate on the matter.
>
> Thank you very much for your time and interest.
>
> Petros Mikos

It would also be a good idea to post the relevant pieces of the CRS log
files - maybe somebody will see something you missed(?!).
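
In the meantime, the "endless ping" workaround you describe could be
scripted roughly as below. This is just a sketch, not an Oracle-supplied
tool: the IP addresses and the 5-second interval are placeholders, and
note that Solaris 10 /usr/sbin/ping takes "ping host [timeout]" while
Linux/BSD use "ping -c 1 host".

```shell
#!/bin/sh
# keepalive.sh - crude sketch of the endless-ping workaround:
# periodically probe the public IP of each RAC node so the qfe/ce
# NICs never sit idle. All addresses below are placeholders.

keepalive() {
    ip1=$1        # public IP of node 1 (placeholder)
    ip2=$2        # public IP of node 2 (placeholder)
    interval=$3   # seconds to sleep between rounds
    rounds=$4     # number of rounds to run; 0 means loop forever
    i=0
    while [ "$rounds" -eq 0 ] || [ "$i" -lt "$rounds" ]; do
        # One probe per node per round. We only care about generating
        # traffic on the interface, so ping failures are ignored.
        ping "$ip1" 2 >/dev/null 2>&1 || true
        ping "$ip2" 2 >/dev/null 2>&1 || true
        i=$((i + 1))
        sleep "$interval"
    done
}

# Run forever against both nodes, e.g. from an init script:
# keepalive 192.168.1.101 192.168.1.102 5 0
```

Of course this only masks the symptom; worth checking whether something
on the switch or in the Solaris network driver is putting the idle
interface to sleep.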
Received on Tue May 08 2007 - 16:57:31 CDT