For a couple of weeks, we have observed two scenarios/issues:
1) Nodes get rebooted (possibly fenced) for no apparent reason. The event logs (System/Application/Matrix) don't say much, apart from the "matrix server terminated" message.
2) While the restarted node is coming back online, all the other nodes in the cluster go into a hung state for some reason, and failover doesn't happen until that node is fully back. The rebooted node takes a considerable amount of time to return to a fully operational state. Once it does, the other nodes are fine too.

Has anyone here come across a situation like this? I'd appreciate any help/suggestions :)

Re: Polyserve Cluster unexpected reboots!

If the node is fenced and you are using iLO fencing (rather than SAN fencing), it will be rebooted. That is expected behavior, meant to protect the integrity of the data. What is not expected is that every other node hangs.

If this were my cluster, I would contact HP Support and have them review the HPS reports (our data/log collector) from each node. Otherwise we are just guessing. Yes, blades take a long time to reboot; however, if a node is fenced, that will NOT affect the other nodes.

Some guesses:
1) Is the underlying hardware firmware on all your blades and the enclosure current, or at most one revision back? That includes the HBAs, Broadcom NICs, ProCurve switches, etc. I recommend using the HP Firmware Maintenance CD and keeping the other firmware requirements for the BL460c G6 close to current (see the drivers section at www.hp.com).

2) Are your HP drivers (PSP) current, or at most one revision back? I recommend downloading the current PSP from www.hp.com (Support and Drivers, BL460c G6).

Re: Polyserve Cluster unexpected reboots!

I had a similar issue on my 3.6.1 cluster. I verified the NIC settings, flow control, etc. per the documentation and HP Support. It turned out to be a self-inflicted problem: before we moved to Polyserve, we had implemented a nightly DOS script to gzip our SQL backups (we get charged per GB). On Polyserve, a node would occasionally fence when the DOS script tried to compress a file or LUN that was currently in use. I'm not sure of the specifics there, only that the DOS script was the common denominator. On a side note, we now use Litespeed for compression and have had no issues.


The opinions expressed above are the personal opinions of the authors, not of Hewlett Packard Enterprise. By using this site, you accept the Terms of Use and Rules of Participation.