DL380 disk fail continuously

Hi there,

I have a server with 6 disks running off a Smart Array 6402. One channel for 2 disks running C: - the other channel running 4 disks for other partitions. RAID 5 for the one logical drive that contains the various partitions.

I had a hardware fail on one of the c: drives - bunged in a new hard drive which rebuilt and, once the rebuild had finished, failed again. I did this twice - first with a working disk from an old server and then with a brand new spare (HP U320 146.4 GB).

I have tried using imaging software to grab the existing partitions of data off, but each time the app starts to read the partition data (when it is about to offer me a choice of what I would like to back up) it fails and the server restarts.

I've tried SmartStart - but that hangs as it's loading.

So, I have a situation where I need to get the existing partitions (E: / F: / etc.) back online somehow. I can lose the C: drive and rebuild it, but the physical disks are not staying alive in that physical slot.

(I managed to run ADU but have no floppy port to save the report to.)

I am wondering whether it could be the backplane instead of the disks, seeing as I've put in 2 disks and both failed to rebuild.

I have an older DL380 G4 - would it be possible to build a fresh C drive in there - then add the existing other 4 disks - would they rebuild themselves / present themselves to the OS for use?

I can either move the existing 6402 to the older chassis or use another 6402 that I have spare.

I don't have enough experience in array technology to know how to keep the data safe.

Re: DL380 disk fail continuously

It looks like booting fails in general, both from the C: drive and from the imaging software (I assume this is a self-booting CD/DVD). It could be the backplane or even the system board.

The RAID configuration is stored on the disks. If you hook the disks up to another server with the same or a different 6402 (preferably a different one, to rule out any problem with the old 6402), the controller will read the RAID configuration from the disks, bind them in the last saved configuration, and you will see the partitions.

The usual recommendation is to take a backup before moving any RAID disks, but in your case you can't run one.

Re: DL380 disk fail continuously

Thanks for that - I was hoping someone could reassure me that I can try dropping the disks into another chassis.

I think that it is a combo of the drive not being bootable any more and either the backplane being duff or some bad data on one of the disks that is stopping the rebuild from completing safely. Guess the red light on the failing disk could be a red herring...

Re: DL380 disk fail continuously

> One channel for 2 disks running C: - the other channel running 4 disks for other partitions

> to recap - 6 disks configured RAID5, with 4 partitions.
> One channel had 2 disks with c drive on it, the other channel had the remaining 4 drives with 3 data partitions on.

This cannot be. Either:

A. You have two RAID arrays: one RAID 1 (a two-disk mirrored pair) for drive C: and one 4-disk RAID 5 array for data.

B. Or all six disks are in RAID 5. In this case drive C: is spread across all six disks, and the rest of the partitions are on all six disks as well.
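To make the difference between the two setups concrete, here is a rough Python sketch of which volumes survive a given set of failed disks. The disk numbering (0-5) and the array names are made up for illustration, and the only rule modelled is that a two-disk mirror and a RAID 5 array each tolerate one failed member.

```python
# Hypothetical numbering: disks 0-1 on one channel, disks 2-5 on the other.
LAYOUT_A = {
    "C: (RAID 1)": {0, 1},
    "data (RAID 5)": {2, 3, 4, 5},
}
LAYOUT_B = {
    "C: + data (RAID 5)": {0, 1, 2, 3, 4, 5},
}


def array_status(members, failed):
    """Return the state of one array given the set of failed disk numbers."""
    lost = len(members & failed)
    if lost == 0:
        return "ok"
    if lost == 1:
        return "degraded"  # still online, running without redundancy
    return "failed"        # two or more members gone: volume is offline


def layout_status(layout, failed):
    """Map each array in a layout to its state."""
    failed = set(failed)
    return {name: array_status(members, failed) for name, members in layout.items()}


for failed in ([0], [0, 3]):
    print("failed disks:", failed)
    print("  setup A:", layout_status(LAYOUT_A, failed))
    print("  setup B:", layout_status(LAYOUT_B, failed))
```

With one failed disk, both setups stay online. With two failed disks on different channels, setup A is degraded but alive, while setup B loses everything, which is why it matters which one you actually have.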

> I had a hardware fail on one of the c: drives

From the behavior of the failure of drive C: only, it looks like you have setup "A" outlined above. You need to verify this.

Re: DL380 disk fail continuously

Maybe I am using the wrong terminology - the OS would not boot - hence my reference to the C drive. I managed to Acronis an older image onto the C partition a couple of days ago, and the server was up again. It tanked after around 4 hours though.

Thanks for going through this with me. I can see now that my logic was a bit off: a disk failing and the server not booting led me to think that the failed drive was "C", so to speak.

I thought putting a fresh disk in would be the solution, but now that the fresh disk has failed too, I am not sure what to do.

The first time I put the fresh disk in, I hit one of the F-key combos to drop into the ROM-based utilities and then left it to rebuild. I didn't actually use the utilities; it was just to keep disk activity to a minimum. The drive stopped blinking green after a few hours, looking like it was rebuilt.

Re: DL380 disk fail continuously

In this case you would need all six disks to transfer the RAID volume to another server. It definitely sounds like it is isolated to ID 0 on SCSI bus 1 (SCSI port 2); it could be the disk cage slot or it could still be the disks.

There may be something else wrong here. Even with a failed disk, the RAID 5 volume should have stayed up and running. Might there be another failed disk or a problem with the controller?
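For anyone wondering why a single failed disk should not take a RAID 5 volume down, here is a minimal Python sketch of the parity idea. It is a simplification: the block contents are made up, and a real controller rotates the parity across all the disks rather than keeping it on one.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across six disks: five data blocks plus one parity block.
data = [b"disk0", b"disk1", b"disk2", b"disk3", b"disk4"]  # made-up contents
parity = xor_blocks(data)

# Disk 0 fails: its block is recomputed from the five surviving blocks.
survivors = data[1:] + [parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[0])
```

The same trick does not work with two blocks missing from the stripe, so a second failed disk (or a controller that wrongly marks a second disk as failed) would explain the volume going offline.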

Re: DL380 disk fail continuously

Thanks for the clarification re the 6 disks, I realised that after your last reply, cheers.

Could be the firmware, as I upgraded the RAID card and hard drives a day before. Sometimes upon restart I get SCSI devices 3 and 0 reported as having failed and now being online.

I can't keep a clear line of process - too many random errors - sometimes the eternal disk 0, randomly disk 3 (this seems to occur when I try Acronis, which now always fails upon getting to the screen that shows what you want to back up, i.e. when reading the disks / partitions).

I'm thinking, as a last attempt, of taking all 6 disks and putting them into a DL380 G2 and seeing what happens. That should show whether the backplane is at fault. Not sure whether to use a different controller, as you said, or to keep the existing one so that only the backplane changes.

I was thinking about trying to downgrade the firmware on the RAID controller (if I can find out how to) as another thing to try.

I'm hating that all this troubleshooting is muddying the waters, i.e. trying so many separate paths is not really the best logical way to troubleshoot.