In any of the RAID levels that use striping, increasing the number of physical disks usually increases performance, but it also increases the chance of any one disk in the set failing. I have this idea that I shouldn't use more than about 6-8 disks in a given RAID set, but that's more passed-down knowledge than hard fact from experience. Can anyone give me good rules, with the reasons behind them, for the max number of disks in a set?

8 Answers

The recommended maximum number of disks in a RAID system varies a lot. It depends on a variety of things:

Disk technology: SATA tolerates smaller arrays than SAS/FC does, but this is changing.

RAID controller limits: The RAID controller itself may have fundamental maximums. If it is SCSI based and each visible disk is a LUN, the 7/14 rule holds true. If it is FibreChannel based, it can have up to 120 or more visible disks.

RAID controller processor: If you go with any kind of parity RAID, the CPU in the RAID card will be the limiter on how fast you can write data. There will be a fundamental maximum for the card. You'll see it when a drive fails in a RAID5/6 LUN, as the performance drop will affect all LUNs associated with the RAID card.

Bus bandwidth: U320 SCSI has its own limits, as does FibreChannel. For SCSI, keeping RAID members on different channels can enhance parallelism and improve performance, if the controller supports it.

For SATA-based RAID, you don't want to have more than about 6.5TB of raw disk if you're using RAID5. Go past that and RAID6 is a much better idea. This is due to the non-recoverable read error rate. The larger the array, the higher the chance of hitting a non-recoverable read error during the rebuild that follows a disk loss. If that happens on RAID5, there is no redundancy left to recover the affected data, and it's very bad. Having RAID6 greatly reduces this exposure. However, SATA drives have been improving in quality lately, so this may not hold true for much longer.
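To put a rough number on the rebuild risk, here is a minimal sketch. It assumes the often-quoted spec-sheet figure of one unrecoverable read error per 10^14 bits read; that rate, the function name, and the sample sizes are my assumptions for illustration, not figures from the answer above. It also treats errors as independent, which real drives don't strictly obey.

```python
import math

def p_ure_during_rebuild(read_bytes: float, ure_per_bit: float = 1e-14) -> float:
    """Probability of at least one unrecoverable read error while reading read_bytes.

    ure_per_bit is an assumed spec-sheet rate (1 error per 1e14 bits), not a measured value.
    """
    bits = read_bytes * 8
    # 1 - (1 - p)^bits, computed in a numerically stable way for tiny p
    return -math.expm1(bits * math.log1p(-ure_per_bit))

TB = 1e12  # decimal terabyte, as drive vendors count it
for tb in (2, 6.5, 12):
    print(f"{tb:>5} TB read during rebuild: {p_ure_during_rebuild(tb * TB):.1%}")
```

Under those assumptions, the chance of hitting an error climbs quickly with the amount of data that has to be read back, which is the effect that makes a second parity disk attractive on large SATA arrays.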

The number of spindles in an array doesn't really worry me overmuch, as it's pretty simple to get to 6.5TB with 500GB drives on U320. If doing that, it would be a good idea to put half of the drives on one channel and half on the other, just to reduce I/O contention on the bus side. SATA-2 speeds are such that even just two disks transferring at max rate can saturate a bus/channel.

SAS disks have a lower failure rate (that is, a higher MTBF) than SATA (again, this is beginning to change), so the rules are less firm there.

There are FC arrays that use SATA drives internally. The RAID controllers there are very sophisticated, which muddies the rules of thumb. For instance, the HP EVA line of arrays groups disks into 'disk groups' on which LUNs are laid out. The controllers purposefully place blocks for the LUNs in non-sequential locations, and perform load-leveling on the blocks behind the scenes to minimize hot-spotting. Which is a long way of saying that they do a lot of the heavy lifting for you with regards to multiple channel I/O, spindles involved in a LUN, and dealing with redundancy.

Summing up, failure rates for disks don't drive the rules for how many spindles are in a RAID group; performance does, for the most part.

If you are looking for performance, then it is important to understand the interconnect that you're using to attach the drives to the array. For SATA or IDE, you will be looking at 1 or 2 drives per channel, respectively (assuming that you are using a controller with independent channels). For SCSI, this depends heavily on the bus topology. Early SCSI had a limit of 8 device IDs per chain (aka per controller), one of which had to be the controller itself, so you would have 7 devices per SCSI chain. Newer SCSI technologies allow for roughly double that number, so you would be looking at up to 15. The key here is that the combined throughput of all drives can't exceed the capacity of the interconnect, otherwise your drives will be "idling" when they should be at peak performance.
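The interconnect point can be sketched as simple arithmetic. This is only a back-of-the-envelope estimate: the 320 MB/s figure is the nominal Ultra320 SCSI channel rate, while the per-drive streaming rates and the function name are assumptions I picked for illustration.

```python
def drives_before_saturation(bus_mb_s: float, drive_mb_s: float) -> int:
    """Largest number of drives whose combined streaming rate still fits within the bus."""
    return int(bus_mb_s // drive_mb_s)

U320_MB_S = 320  # nominal Ultra320 SCSI bandwidth per channel

# Per-drive sustained rates below are assumed values, not measurements.
for drive_rate in (60, 80, 100):
    n = drives_before_saturation(U320_MB_S, drive_rate)
    print(f"{drive_rate} MB/s drives: {n} per channel before the bus is the bottleneck")
```

Sequential streaming is the worst case for the bus; random workloads move far less data per drive, so more spindles per channel are usually fine there.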

Today, things have changed a wee bit. The drives haven't advanced a lot in terms of performance, but the advancement seen is significant enough that performance tends not to be an issue unless you are working with "drive farms", in which case you're talking about an entirely different infrastructure and this answer/conversation is moot. What you will probably worry about more is data redundancy. RAID 5 was good in its heyday because of several factors, but those factors have changed. I think you'll find that RAID 10 might be more to your liking, as it will provide additional redundancy against drive failures while increasing read performance. Write performance will suffer slightly, but that can be mitigated through an increase in active channels. I would take a 4-drive RAID 10 setup over a 5-drive RAID 5 setup any day, because the RAID 10 setup can survive a (specific case of) two-drive failure, whereas the RAID 5 array would simply roll over and die with a two-drive failure. In addition to providing slightly better redundancy, you can also mitigate the "controller as a single point of failure" situation by splitting the mirror into two equal parts, with each controller handling just the stripe. In the event of a controller failure, your stripe will not be lost, only the mirror effect.
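The "specific case of" two-drive failure can be counted by brute force. The sketch below assumes a 4-drive RAID 10 laid out as two mirrored pairs striped together; the drive numbering and helper names are hypothetical.

```python
from itertools import combinations

def raid10_survives(failed, mirror_pairs):
    """RAID 10 survives as long as no mirror pair has lost both of its members."""
    return not any(set(pair) <= set(failed) for pair in mirror_pairs)

def raid5_survives(failed):
    """RAID 5 tolerates at most one failed drive."""
    return len(failed) <= 1

pairs = [(0, 1), (2, 3)]  # assumed layout: drives 0+1 mirror each other, as do 2+3

two_failures = list(combinations(range(4), 2))
survived = sum(raid10_survives(f, pairs) for f in two_failures)
print(f"RAID 10, 4 drives: survives {survived} of {len(two_failures)} two-drive failures")

raid5_cases = list(combinations(range(5), 2))
raid5_survived = sum(raid5_survives(f) for f in raid5_cases)
print(f"RAID 5, 5 drives: survives {raid5_survived} of {len(raid5_cases)} two-drive failures")
```

The RAID 10 set only dies when both failures land in the same mirror pair, whereas every two-drive combination is fatal to the RAID 5 set.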

I use 7 as a "magic" maximum number. For me, it's a good compromise between space lost for redundancy (in this case, ~14%), time to rebuild or grow the set (even if the LUN stays available while rebuilding), and MTBF.

Obviously, this has worked great for me when working with SAN 14-disk enclosures. Two of our clients had 10-disk enclosures, and the magic number 7 was reduced to 5.

All-in-all, 5-7 has worked for me. Sorry, no scientific data from me either, just experience with RAID systems since 2001.
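For reference, the space side of that trade-off is just one disk's worth of parity spread over the set. A quick sketch, assuming single-parity RAID 5 (the function name and the default are mine):

```python
def parity_overhead(disks: int, parity_disks: int = 1) -> float:
    """Fraction of raw capacity given up to parity in a set of `disks` drives."""
    return parity_disks / disks

for n in (5, 7, 14):
    print(f"{n}-disk set: {parity_overhead(n):.1%} of raw space lost to parity")
```

Wider sets lose less space, but they take longer to rebuild and expose more disks to a second failure, which is why the number doesn't simply keep growing.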

The limit of disks in a RAID used to be determined by the number of devices on a SCSI bus. Up to 8 or 16 devices can be attached to a single bus, and the controller counts as one device, so it was 7 or 15 disks.

Hence a lot of RAIDs were 7 disks (one was a hot spare), so that meant 6 disks left, or 14 disks with 1 hot spare.

So the biggest thing about disks in a RAID group is probably how many IOPS you need.

For example, a 10k RPM SCSI disk may run around 200 IOPS. If you had 7 of them in a RAID 5, you would lose 1 disk's worth to parity but then have 6 disks for reads/writes and a theoretical maximum of 1200 IOPS. If you needed more IOPS, add more disks (200 IOPS per disk).

Faster 15k RPM SAS disks may go up to 250 IOPS or so.
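That rule of thumb is easy to turn into a quick estimator. This is only the naive ceiling described above: it ignores the RAID 5 small-write penalty and controller cache, and the function name is made up for illustration.

```python
def raid5_iops_estimate(disks: int, iops_per_disk: int) -> int:
    """Naive ceiling: one disk's worth of the set is counted as parity."""
    return (disks - 1) * iops_per_disk

print(raid5_iops_estimate(7, 200))   # 7 x 10k RPM disks at ~200 IOPS each
print(raid5_iops_estimate(7, 250))   # same set with ~250 IOPS 15k RPM disks
print(raid5_iops_estimate(14, 200))  # doubling the spindle count
```

Treat the result as an upper bound for sizing; write-heavy workloads on parity RAID will land well below it.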

And then there is always SSD (30,000 IOPS per disk), and they are RAID-able (albeit really expensive).

And I think SAS has a crazy maximum value for the number of devices, something like 16,000 drives.

With RAID6 and SATA, I've had good success with 11 disks, plus one hot spare (some bad controllers will need two hot spares to do a rebuild of RAID6). This is convenient since many JBODs come in groups of 12 disks, like the HP MSA60.