On Sun, 10 Oct 2004 12:10 pm, Jason Dixon wrote:
> My Dell 2550 became unresponsive to console or socket input a short
> time ago. Accessing the DRAC-II via web console, I saw the following
> errors:
>
> scsi : aborting command due to timeout : pid 5939736, scsi2, channel 0,
> id 0, lun 0 Write (10) 00 01 21 9a e1 00 00 0a 00
> ( ... more similar ... )
> SCSI host 2 reset (pid 5939739) timed out again -
> probably an unrecoverable SCSI bus or device hang.
> ( ... more aborted commands ... )
>
> At this point, I had no choice but to power cycle it.
I had similar messages in the logs of a PE2650
with a PERC 3/DC and a 7-disk RAID 5 array, running SuSE Pro 8.1.
Luckily the hardware was due for upgrading anyway
so I immediately copied everything across and put
the replacement machine and array into production.
I was glad I did because in the course of debugging it
under instruction from the Dell tech we lost the lot.
By way of an apology, Dell replaced the card, SCSI cable
and 2 disks, so I don't know just where the problem was.
That machine is back in production as spinning backup
and all arrays now have hot spares in them,
but I would love to know how to monitor
both the logical volume (degraded, disk swapped out, etc)
and the individual disk SMART info
without dropping down to BIOS.
michaelj
--
Michael James michael.james at csiro.au
System Administrator voice: 02 6246 5040
CSIRO Bioinformatics Facility fax: 02 6246 5166