User Mailing

ICHEC mail #19

Posted: 2006-03-27

Dear ICHEC users,

We would like to inform you that user service on walton has now resumed.

This interruption in service was caused by a failure of the DS4500 storage controller, which stopped exporting the RAID containers on which GPFS stores its data. The problem was reported at 11:40am on Saturday 25th March.

An IBM engineer was brought on site over the weekend for problem determination. The problem was traced to an older version of the firmware installed on one of our disk trays. The necessary upgrades have now been carried out, and the GPFS filesystems are operational again.

Unfortunately, this hardware failure also cleared all jobs from the queueing system. In other words, all jobs that were running or pending at the time of the failure will need to be re-submitted.