The Solaris group is a forum where peers share technical expertise, solve problems, and discuss issues related to the Solaris operating system, including OS-related malfunctions, security issues, and network performance.

HP ProLiant DL585 G2 Rebooting Unexpectedly

I have an HP ProLiant DL585 G2 server with Oracle RAC 10gR2 installed on it. The server keeps rebooting frequently without any message in the Oracle log, the Windows Event Viewer, the HP IML, or iLO. We recently ran a hardware test using SmartStart and no errors were found. Can you please advise?

Constant server reboots occur mostly because of memory problems: the system running out of memory, for example, or hardware failures in the memory modules. I guess it is worth a try to monitor performance with top or something similar.
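If you do log memory usage over time, a quick way to see whether it climbs steadily before each reboot is to fit a straight line through the samples. The following is only a minimal sketch: the function name `memory_growth_per_hour` and the sample values are made up for illustration, and it assumes you already have (hours elapsed, used memory in MB) pairs from whatever monitoring tool you use.

```python
from typing import Sequence, Tuple


def memory_growth_per_hour(samples: Sequence[Tuple[float, float]]) -> float:
    """Return the least-squares slope of used memory (MB) against time (hours).

    A clearly positive slope over a long window hints at steady growth,
    which is worth a closer look; it does not prove a leak by itself.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must cover more than one point in time")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t


# Hypothetical samples: (hours since start, used memory in MB).
samples = [(0.0, 4000.0), (1.0, 4100.0), (2.0, 4200.0), (3.0, 4300.0)]
print(memory_growth_per_hour(samples))  # 100.0 MB per hour for this made-up data
```

A slope near zero across several days would point away from memory exhaustion and back toward hardware or firmware.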

Sorry, I missed that this is good old Windows; I thought it was Solaris. Either way, I would suspect a RAM failure or a firmware issue. What version of iLO are you running? Usually an upgrade of the iLO firmware and iLO drivers will sort that out. Check that before you start opening the box and getting your hands dirty.

We use Cluster Ready Services (CRS). On top of that, we have one voting disk which is down. Can this cause the issue? And how can the system run smoothly and then reboot unexpectedly like this, sometimes several times a day and sometimes only after 3 weeks of smooth running?
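To put numbers on an irregular pattern like this, it can help to list the gaps between consecutive reboots and compare them against other events such as backups or load peaks. This is only a rough sketch: `reboot_gaps` is a hypothetical helper and the timestamps below are invented, so substitute the boot times you actually collect.

```python
from datetime import datetime, timedelta
from typing import List


def reboot_gaps(boot_times: List[datetime]) -> List[timedelta]:
    """Return the time elapsed between each pair of consecutive reboots."""
    ordered = sorted(boot_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


# Invented boot times: two reboots on one day, then three quiet weeks.
boots = [
    datetime(2010, 1, 1, 8, 0),
    datetime(2010, 1, 1, 14, 0),
    datetime(2010, 1, 22, 14, 0),
]
gaps = reboot_gaps(boots)
print("shortest gap:", min(gaps))
print("longest gap:", max(gaps))
```

If the short gaps cluster around particular times of day, that would suggest a load-related trigger rather than a random hardware fault.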

I know upgrading can resolve many issues. However, this is our production environment and we have to go through loads of tests and procedures when upgrading. The server has been up for more than one year and it's been more than 6 months since we last had an issue with the machine. We ran a thorough test of our hardware and no errors were found.

A memory leak can cause CSS to trigger a reboot, because it is a resource issue that can prevent any part of the clusterware stack from functioning properly. Such cases should be referred to the sysadmins and OS vendor support, who can provide a memory dump, confirm the cause, and provide a complete analysis if Oracle was the source of the memory leak … First, though, we suggest that you upgrade only CRS to the latest version, 10.2.0.4, and apply the latest Patch 8708078 on top of it. These releases contain fixes for known issues, including similar ones that cause node reboots. If the issue reoccurs after that, we will need the memory dump analysis from the sysadmins and OS vendor support so that we can pass it to development for investigation.
