Linux - Server
This forum is for the discussion of Linux Software used in a server related context.


Hmmmm ... messy. I lost a hardware RAID1 on a Promise controller (yes, I know, I should have known better ... hindsight ... blah!). Burned hand teaches best, so they say - I did quite a bit of testing on soft RAID1 and RAID5. I could not get soft RAID5 to behave acceptably after pulling power cables out to check the results. RAID5 doesn't like it very much and has a tendency to refuse to mount the volume because the filesystem isn't clean and the RAID is still critical. I'd go for soft RAID1 or hardware RAID5.

If you've lost two drives, then the RAID set is dead. There are some specialist tools which can attempt to recover some data from a multiple disk RAID5 failure, but given the way the data is written, I wouldn't be too hopeful about what that would get back.

A multiple drive failure at one time is quite rare. Not unheard of though. Assuming that hdg and hdh are master and slave on a single IDE bus, it is possible that the failure of one of the drives is causing some weird bus errors and making it look like the other drive has problems too - I've seen that before. Could try removing each drive from the bus and trying to boot the system to see if either of the drives miraculously recovers, then replace the failed drive and rebuild the RAID.

Thanks for your recommendations. As you mentioned: burned hand teaches best. Your assumptions regarding the drives were correct. There are four drives connected. Two as masters and two as slaves on two IDE busses. Unplugging each device one by one and trying to start the array didn't succeed.

Also, though multiple drive failures are uncommon, if you purchased all the drives at the same time, when one fails the others are sure to follow. I've seen RAID 5 on hardware controllers die twice during rebuilds. Generally, when one drive fails, go through the cycle and swap every drive in the array, or you may be sorry.

Or get enough disks that RAID 6 makes sense. That is all we use at work now, it is RAID 5 with an additional hot spare. Most hardware controllers will allow the hot spare to be any of the physical drives in the array, so when one goes bad the hot spare takes its place, then you pull the bad drive out, put a blank drive in, and set it as the new hot spare. Much safer. I've yet to see a RAID 6 failure.

Software RAID 5 sounds like a very bad idea to me. I am aware that it is possible, but any data important enough to be on a RAID 5 array is also important enough that the additional $300 or so is spent on a hardware controller.

I had (am having) a similar problem with a Silicon Image 3124 PCI-X Serial ATA controller on a Norco DS-500 storage array. The sata_sil24 driver with port multiplier support is still a bit experimental, and the only way I've been able to get it to work is with a patch on a 2.17.4 kernel. The controller timed out and two of the 5 drives in a RAID5 were lost. The drives themselves were still good, but re-adding the "failed" drives would not work. I was, however, able to recover the array by re-creating it ( mdadm --create /dev/md0 --level=5 --raid-devices=5 /dev/sd[d-h]1 ). If cabling was the problem and the drives did not really "fail" (i.e., they were only marked failed in the array), this might work for you.

I got the following messages during the process:
mdadm: /dev/sdd1 appears to contain an ext2fs file system
size=1953535744K mtime=Thu May 17 22:24:08 2007
mdadm: /dev/sdd1 appears to be part of a raid array:
level=raid5 devices=5 ctime=Sat May 5 15:19:07 2007
mdadm: /dev/sde1 appears to be part of a raid array:
level=raid5 devices=5 ctime=Sat May 5 15:19:07 2007
mdadm: /dev/sdf1 appears to be part of a raid array:
level=raid5 devices=5 ctime=Sat May 5 15:19:07 2007
mdadm: /dev/sdg1 appears to be part of a raid array:
level=raid5 devices=5 ctime=Sat May 5 15:19:07 2007
mdadm: /dev/sdh1 appears to contain an ext2fs file system
size=2005702402K mtime=Wed Nov 28 02:32:38 2007
mdadm: /dev/sdh1 appears to be part of a raid array:
level=raid5 devices=5 ctime=Sat May 5 15:19:07 2007
Continue creating array? y
mdadm: array /dev/md0 started.
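For anyone attempting the same recovery, here is a hedged sketch of the re-create approach described above. The device names, level and member count are assumptions taken from this post; yours will differ, and re-creating with the wrong order or parameters can destroy data, so check the superblocks first:

```shell
# Inspect each member's superblock and note the slot order, chunk size
# and event counts before touching anything.
mdadm --examine /dev/sd[d-h]1

# Stop any half-assembled array before re-creating over it.
mdadm --stop /dev/md0

# Re-create with the SAME level, device count and device order as the
# original array. --assume-clean skips the initial resync so you can
# verify the data before any parity gets rewritten.
mdadm --create /dev/md0 --level=5 --raid-devices=5 --assume-clean /dev/sd[d-h]1

# Mount read-only first to confirm the filesystem survived.
mount -o ro /dev/md0 /mnt
```

If the read-only mount looks sane, remount read-write and let a resync run; if it doesn't, stop the array again rather than fsck-ing blind.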

I had identical problems this evening with my Fedora 7 server, which has 4 drives hanging off the Nvidia SATA controller in a RAID 5 array. The console started spitting out "ATA: Abnormal Status" errors, and after rebooting the RAID 5 array would not mount. After attempting the maintenance/reassemble options with mdadm I had no success, so I turned to Google and stumbled across this thread.

Thanks to fsbooks's reply I have successfully re-created the array and mounted it without any problems.

I am not sure what caused this problem. I am running Fedora 7 with kernel: 2.6.21-1.3228.fc7.

Would be interesting to see how many others have come across this. I had the same issues as fakeroot with 2 drives showing the "removed, faulty removed" status when examined with mdadm.

Echoing the previous comment: I have a media server with a lot of large files on it - too much to back up effectively until BluRay comes down in price a lot. So I had this bright idea about using software RAID. I purchased 2 more 500G SATA drives and created a RAID 5 array on them, with my original drive "missing". Then I copied my files onto it.

I verified that it survived rebooting. So far, so good. So I repartitioned my original drive and added it to the array. I left for work with it syncing nicely.

I came home to find that two drives had failed - probably a loose power cable (off a splitter from 1 IDE to 2 SATA) because reseating everything brought the drives back to life - but not the array.

Followed fsbooks's advice using the two drives I'd originally set up, in case the sync hadn't completed, then added the third drive. My files are there and the drives are once again syncing.

If a /dev/md0 RAID partition has already been created, is it possible to create /dev/md1 with other disks or partitions? If so, why am I getting the error below?
[root@cjpunjabiradio ~]# mdadm -C /dev/md1 --level=5 --raid-devices=2 /dev/hda{14,15}
mdadm: error opening /dev/md1: No such file or directory

I've been having a problem with my RAID arrays (WD 250GB SATA drives) failing frequently (once per month?), mostly when under a large load (they max at about 5MB/sec). Two disks will simultaneously fail, but every time they fail, I'm able to recover by recreating the array. Any idea what this could be? Anyone else have this happen?

On another note, I've also been able to successfully move the array from one computer to another (this is a data RAID5 array; the OS is installed on a separate single drive) by using the create command. mdadm will recognize the drives as already being part of an array and will recover them with my data intact.
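Worth noting that moving a software array between machines usually doesn't even need the create command; assemble is the safer first attempt. A sketch, with array and device names assumed:

```shell
# Scan all partitions for md superblocks and assemble whatever is found.
mdadm --assemble --scan

# Or assemble a specific array from named members on the new host.
mdadm --assemble /dev/md0 /dev/sd[b-f]1

# Optionally record the array so it assembles automatically at boot.
mdadm --examine --scan >> /etc/mdadm.conf
```

The config path varies by distribution (e.g. /etc/mdadm/mdadm.conf on Debian-based systems); --create should stay a last resort because it rewrites superblocks.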

Quote:

Hmmmm ... messy. I lost a hardware RAID1 on a Promise controller (yes, I know, I should have known better ... hindsight ... blah!). Burned hand teaches best, so they say - I did quite a bit of testing on soft RAID1 and RAID5. I could not get soft RAID5 to behave acceptably after pulling power cables out to check the results. RAID5 doesn't like it very much and has a tendency to refuse to mount the volume because the filesystem isn't clean and the RAID is still critical. I'd go for soft RAID1 or hardware RAID5.

This is incorrect and misleading. Software RAID5 and the filesystem you choose to mount on it are two entirely separate things; if the *filesystem* won't mount after the RAID is rebuilt, then that's a filesystem issue, not a RAID one.

Generally, if the RAID has crashed then the filesystem will have a problem mounting, fsck the filesystem or switch to a journalled filesystem like ext3 to minimise that risk.
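A minimal sketch of that filesystem check, assuming the rebuilt array is /dev/md0 with ext3 on it (adjust names to your setup):

```shell
# Never fsck a mounted filesystem; make sure it is unmounted first.
umount /dev/md0

# Read-only pass first: -n answers "no" to every repair prompt, so you
# can see the damage without changing anything.
fsck.ext3 -n /dev/md0

# If the damage looks repairable, run a real pass. -p ("preen") fixes
# safe problems automatically; drop it to be prompted for each repair.
fsck.ext3 -p /dev/md0
```

On a badly degraded array it can be worth imaging the md device with dd before letting fsck write to it.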

In my experience it's hardware RAID systems which are harder to recover, as you're limited to the tools available in the controller vendor's BIOS. With software RAID you're not quite so limited, and in many cases you can recover from situations where you'd be stuck if you were running hardware RAID.

(I've successfully recovered from a 2 disk failure in a software RAID5 array of 7 drives without losing much data, so it's certainly possible)

It's also *much easier* to be able to pull the disks out of a machine and drop them into an entirely different system running a different linux distribution and even a different architecture, ie: PPC to x86 or Sparc. Doing that with a hardware RAID card can cause driver issues and all sorts.

Software RAID is incredibly flexible.

Either spend lots of money on a hardware RAID5 controller with battery backed cache and a well trusted chipset or stick with Software RAID5. Anything else is a false economy.

Quote:

Originally Posted by ajg

If you've lost two drives, then the RAID set is dead. There are some specialist tools which can attempt to recover some data from a multiple disk RAID5 failure, but given the way the data is written, I wouldn't be too hopeful about what that would get back.

Not true at all, mdadm and fsck are usually all you'll need to recover from any sort of Software RAID5 issue in Linux. You will lose data, but depending on the amount of activity on the filesystem it can be surprisingly little.

Quote:

Originally Posted by ajg

A multiple drive failure at one time is quite rare. Not unheard of though. Assuming that hdg and hdh are master and slave on a single IDE bus, it is possible that the failure of one of the drives is causing some weird bus errors and making it look like the other drive has problems too - I've seen that before. Could try removing each drive from the bus and trying to boot the system to see if either of the drives miraculously recovers, then replace the failed drive and rebuild the RAID.

Or get enough disks that RAID 6 makes sense. That is all we use at work now, it is RAID 5 with an additional hot spare. Most hardware controllers will allow the hot spare to be any of the physical drives in the array, so when one goes bad the hot spare takes its place, then you pull the bad drive out, put a blank drive in, and set it as the new hot spare. Much safer. I've yet to see a RAID 6 failure.

Jim, I think you're confused.

What you're describing is just RAID5 with a hot-spare. You can achieve this with Software RAID5 under linux by defining one or more hot-spares. If a drive fails in the RAID5 set then the hot spare is automatically brought into the array and the array is rebuilt onto the hot-spare.

RAID6 is RAID5 with two parity blocks, rather than 1.
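Both setups are easy to express with mdadm. A sketch under assumed device names (pick one layout or the other for a given set of disks):

```shell
# RAID5 across four disks plus one hot spare: the spare sits idle until
# a member fails, then the kernel rebuilds onto it automatically.
mdadm --create /dev/md0 --level=5 --raid-devices=4 \
      --spare-devices=1 /dev/sd[b-f]1

# RAID6 across the same five disks: two parity blocks per stripe, so
# any two simultaneous drive failures are survivable, with no window
# where the array runs unprotected during a rebuild.
mdadm --create /dev/md0 --level=6 --raid-devices=5 /dev/sd[b-f]1
```

The usable capacity is the same in both layouts (three disks' worth); the RAID6 version just closes the rebuild window the hot-spare version leaves open.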

Quote:

Originally Posted by JimBass

Software RAID 5 sounds like a very bad idea to me. I am aware that it is possible, but any data important enough to be on a RAID 5 array is also important enough that the additional $300 or so is spent on a hardware controller.

Peace,
JimBass

That just sounds like you're confusing having a good backup strategy with volume/disk management strategy!

RAID5 or any level of RAID is not a replacement for a good back-up strategy.

Hardware controllers don't give you any additional resilience or safety for the given raid level unless they utilise battery backed cache or similar.

There is nothing intrinsically wrong with Software RAID5 and in many cases it can be more flexible and resilient than a hardware controller. In almost all instances it's a better and safer bet than a hardware RAID5 controller without battery backup.

Echoing the previous comments, hardware RAID controllers don't offer anything that software RAID doesn't have. They just move it to a controller instead of letting the OS handle it. In practical terms, this means you've got an extra piece of hardware that can mess up, along with some special drivers that aren't exactly mass market items.

Software RAID on the other hand needs no extra hardware. And the software RAID drivers don't have to handle as much as the hardware RAID controllers.

Some hardware RAID controllers do have a battery backup to allow them to save unwritten data but this is not a substitute for a decent UPS and proper shutdown during a power outage.

I have a similar problem - a RAID 5 array with 11 drives, and one drive (sdh) encountered problems. I tried to rebuild, but sdb failed midway and my array was degraded.

I followed some advice to recover the data using mdadm -C /dev/md0 /dev/sd[efghiabcdkj]1, both from the command line and via webmin, but the original drive order (sde[0], sdf[2], sdg[3] ... sdk[9], sdj[10]) was messed up and the array was reordered sda[0] ... sdk[10]. I tried mounting and received a "VFS: ext3 filesystem not found" error.

I've tried for a week now to recover the data (which consists of personal data I saved over the last 20 years and work data I have spent the last 2 years working on) but to no avail. Any help is greatly appreciated. Thanks in advance.
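Before re-creating an array like this, the original slot order can usually be recovered from the member superblocks rather than guessed. A hedged sketch (device letters are assumptions; adjust to your drives):

```shell
# Each member records its own slot number in its superblock: 0.90-format
# metadata shows it on the "this" line, 1.x on the "Device Role" line.
for d in /dev/sd[a-k]1; do
    echo "== $d"
    mdadm --examine "$d" | grep -iE 'this|device role|events'
done

# Then re-create listing the devices in their ORIGINAL slot order,
# substituting the word "missing" for any dead member, for example:
# mdadm --create /dev/md0 --level=5 --raid-devices=11 --assume-clean \
#       /dev/sde1 /dev/sdf1 ... missing ... /dev/sdj1
```

A re-create in the wrong order writes new superblocks but does not immediately touch the data area (especially with --assume-clean), so as long as no resync or fsck has written to the array, re-creating again in the correct order may still bring the filesystem back.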