I'm building a Linux traffic-shaping/routing box for a leased-line connection. The box has to be reliable, so I'm planning to install the operating system and data files on a RAID 1 configuration. My question is twofold:

I was going to use Linux software RAID. I've heard that hardware RAID only gives any significant benefit once you shell out for a good-quality RAID card, one with a battery backup at least. I'm not too concerned with speed, since I won't be using the disks much, but I am concerned with data recovery in case of an accident. Is shelling out for a hardware RAID card worth the price? If so, can anyone recommend a card?

I've been in the situation before where hard disks bought at the same time failed at roughly the same time as well - i.e., within a week of each other. In order to avoid this, I'd like to populate my RAID array with disks from different manufacturers. As long as the disk sizes, speeds, and cache sizes are the same, can anyone see a problem with this?

5 Answers

1) Linux software RAID is very mature these days, and pulling the drives out of one machine and putting them in another will just work. With a hardware solution you need to keep a spare card around, because one chip's way of doing RAID may not be compatible with another's, and without the right card you may have lost your data. With modern CPUs, software RAID is safe to use and quick too; I'd trust it more than a hardware solution unless you've got the budget for a high-end RAID card. The benefit of those is the battery backup unit, which preserves cached writes through a power outage. But power outages typically aren't your real problem anyway: the drives themselves do their own caching, so you're going to lose some in-flight data either way. Just use Linux software RAID. Or ZFS: it's very nice, VERY safe, and full of useful features, but a different paradigm.
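For reference, a basic two-disk mirror with Linux software RAID looks something like this with mdadm (a sketch only; /dev/sda1 and /dev/sdb1 are placeholder partition names, and --create destroys any existing data on them):

```shell
# Build a RAID 1 array from two partitions, one on each physical disk.
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

# Watch the initial mirror sync.
cat /proc/mdstat

# Record the array so it is assembled automatically at boot
# (the file is /etc/mdadm/mdadm.conf on Debian-style distros).
mdadm --detail --scan >> /etc/mdadm.conf
```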

2) That'll be fine. As long as the drives are within about 1% of each other in size, you'll just get a RAID set the size of the smallest drive. I do the same; I tend to stick with the same manufacturer but buy from different production batches.

Don't create your RAID array as large as possible: knock off half a percent or so, so that if a drive dies and the replacement you buy turns out to be that little bit smaller, it won't matter.
– David Spillett Sep 11 '09 at 11:50
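One way to apply that "knock off a bit" advice with mdadm is to cap the per-device size explicitly with --size (given in KiB) instead of letting it use every block. A sketch with made-up numbers; device names are placeholders:

```shell
# With ~500 GB partitions, use slightly less than the full size so a
# marginally smaller replacement disk can still join the mirror.
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      --size=485000000 /dev/sda1 /dev/sdb1
```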


To increase availability further with Linux software RAID, you can add a hot spare to the setup. This keeps a spare drive attached to the array; if a mirror member fails, the array immediately starts rebuilding onto the spare.
– 3dinfluence Sep 11 '09 at 11:56
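With mdadm, a hot spare can be declared at creation time or added later; a sketch, with placeholder device names:

```shell
# Create the mirror with a third drive as a hot spare...
mdadm --create /dev/md0 --level=1 --raid-devices=2 --spare-devices=1 \
      /dev/sda1 /dev/sdb1 /dev/sdc1

# ...or add a spare to an existing, healthy array; a device added to a
# non-degraded array becomes a spare automatically.
mdadm --add /dev/md0 /dev/sdc1
```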


If you go the software RAID route, make sure the boot loader gets installed to both disks (you'll probably need to install it on the second disk by hand). A lot of distros only install it on the first drive, adding more pain to the mix if that's the drive that dies.
– David Sep 11 '09 at 12:16
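With GRUB, installing to both disks is just a matter of running grub-install twice; /dev/sda and /dev/sdb are placeholder device names:

```shell
# Put a working boot loader in the MBR of both mirror members so the
# machine can still boot if either drive dies.
grub-install /dev/sda
grub-install /dev/sdb
```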

"the drives themselves tend to do caching as well, so you're going to lose some data anyway" If you use a BBU turn OFF the drives cache. Otherwise why did you shell out the bucks for the BBU?
– toppledwagon Sep 11 '09 at 21:52
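On Linux, a drive's own volatile write cache can be checked and disabled with hdparm; a sketch, with /dev/sda as a placeholder:

```shell
# Show the current write-cache setting.
hdparm -W /dev/sda

# Turn the on-drive write cache off, so the battery-backed cache on the
# RAID card is the only write cache in the path.
hdparm -W0 /dev/sda
```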

There are a couple of advantages to hardware RAID, which may or may not be worth what the cards cost:

You have to be careful when setting up software RAID to make sure the system can boot off either drive and works the same way if the primary one fails. It's easy to forget to put a working MBR on the secondary drive at the beginning, and there's the potential for a non-RAID boot partition to get out of sync between the two drives if you're not careful. Hardware RAID cards make it much easier to get this right, so your system will always come up after a failure.

When drives fail, they can make the whole system go crazy in the process if they're spewing garbage. Your motherboard is probably not tested for how it acts in this situation. Hardware RAID controllers tend to act more sanely, walling off the bad drive and ignoring what it's doing. More than once I've had a failed drive in a Linux software RAID setup take out the whole system until it was removed; no data was lost, but the server had to go down for a bit until I could figure out which drive was the bad one. Hint: always write down all the drive serial numbers after you set up the array, so it's easier to figure out which one you lost when it stops working.
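Collecting the serial numbers is easy at setup time; for example (device names are placeholders, and smartctl comes from the smartmontools package):

```shell
# Print the serial number of each array member and save the output.
smartctl -i /dev/sda | grep -i serial
smartctl -i /dev/sdb | grep -i serial

# On newer systems, lsblk can list all drives and serials at once.
lsblk -o NAME,SERIAL
```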

Replacing a failed drive on a hardware RAID system will usually be sufficient to kick off rebuilding, whereas with Linux software RAID you have to add the new drive to the array yourself. In general, hardware RAID is easier to get going and has a gentler learning curve. One can argue that software RAID is more powerful as a result of its complexity, but sometimes people just want to replace the bad disk and move on.
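The manual replacement in Linux software RAID is a short sequence with mdadm; a sketch with placeholder names, assuming /dev/sdb1 is the failed member of /dev/md0:

```shell
# Mark the member failed (if the kernel hasn't already) and remove it.
mdadm --fail   /dev/md0 /dev/sdb1
mdadm --remove /dev/md0 /dev/sdb1

# Swap the physical disk, partition it to match the survivor, then add
# it back; the rebuild kicks off automatically.
mdadm --add    /dev/md0 /dev/sdb1
```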

If you have an application that calls fsync to force data out to the drives, a hardware RAID card can accelerate this with its cache in a way you can't safely do otherwise. The write gets handed to the battery-backed cache, the app moves on, and even if power is lost that write is still safe (within the boundaries of how long the battery lasts). Databases are the main applications that do this, though it can matter for mail or log data too. Being able to cache writes like this can dramatically improve performance, both by eliminating the wait for fsync and by reordering writes so there's less seeking over the physical disk. But if you don't have an app that requires it, this sort of thing isn't valuable: the OS caches writes and spools them out efficiently anyway if you don't force them, and most applications don't rely on writes making it to disk in all cases; you just lose the last bit of data and move on.
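To get a feel for what forced-to-disk writes cost without a battery-backed cache, you can compare buffered writes against synchronous ones with dd; a rough sketch against a throwaway temp file:

```shell
# Buffered: dd returns as soon as the data is in the page cache.
f=$(mktemp)
time dd if=/dev/zero of="$f" bs=4k count=1024 2>/dev/null

# Synchronous: oflag=dsync forces every 4k block to stable storage
# before the next write, similar to an fsync-heavy database workload.
time dd if=/dev/zero of="$f" bs=4k count=1024 oflag=dsync 2>/dev/null
rm -f "$f"
```

On a spinning disk the second run is typically orders of magnitude slower, which is exactly the gap a battery-backed write cache hides.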

The main disadvantage to hardware RAID, beyond cost, is that you can end up in the situation where taking a drive out of the server won't give you one you can use in another server that doesn't have the same RAID card. There's a full discussion of that issue in another answer here.

As for vendors, the RAID cards from 3ware have the best Linux support; just make sure the tw_cli program they provide is compatible with your system. I've never had a problem on mainstream Linux versions/hardware, but it's something to check. The cards from Areca are a bit faster, but their management software sucks; you need one of the models where the management interface is provided over a network connection before it's really usable.

Different-sized disks are no problem in hardware or software RAID, as long as you're careful to use the size of the smallest drive everywhere. You might want to round the size down a bit to improve the odds that a replacement of similar size will be usable.

All of the answers here are pretty good. One thing that you might want to consider, particularly if you have UPS power, is to use a minimalist install and load the whole system into RAM.

In that case, the hard disk is essentially just a place to persist configuration data. Check out Puppy Linux. A friend did this for a project a few years ago and actually kept a system with a failed hard disk running for about a year!

For #2, although Linux RAID will automatically drop to the lowest size for your set of disks, you may want to make your RAID partition a few GB smaller than that in case you buy a replacement disk which is a little smaller.

You can use the extra space on the disks as swap. (RAID the swap if you want the machine to survive a disk failure without crashing!)
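Setting up mirrored swap from the leftover space is the same mdadm pattern; a sketch, with /dev/sda2 and /dev/sdb2 as placeholder spare partitions:

```shell
# Mirror the leftover partitions and use the result as swap, so a disk
# failure doesn't take the machine down through lost swap pages.
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
mkswap /dev/md1
swapon /dev/md1
# Add an entry like "/dev/md1 none swap sw 0 0" to /etc/fstab.
```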