Suresh Kumar Pakalapati's Linux Administration: RAID 50

Tuesday, August 3, 2010

RAID 50

RAID 50 is an often overlooked RAID level that can bridge the gap when it comes to choosing between RAID 5, RAID 6, and RAID 10. Scott Lowe explains why he likes RAID 50.

———————————————————————————————————

RAID 50 is my favorite RAID level. Although RAID 50 support is not in every product (for example, my EMC AX4 at Westminster College does not support RAID 50), I find that RAID 50 provides a great balance between storage performance, storage capacity, and data integrity that’s not necessarily found in other RAID levels.

If you haven’t used RAID 50 before, you’re in for a treat. As one of the many multilevel RAID options that are out there, RAID 50 operates by striping (RAID 0) data across multiple RAID 5 sets (Figure A).

Figure A

RAID 50 diagram

As you can see in the diagram, there are three RAID 5 sets that span a total of 12 disks. Each RAID 5 set has four disks, with one disk’s worth of capacity dedicated to parity information. For the example above, this means that each RAID set will lose 25% of its total capacity to parity information, as would be the case if you were to deploy a single four-disk RAID 5 set. The beauty of RAID 50 lies in the “0” part of the RAID level; this is where information is striped across each of those underlying individual RAID 5 sets.
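To make the “0” part concrete, here is a minimal Python sketch of round-robin striping across the three RAID 5 sets in Figure A. The function name `locate_chunk` and the one-chunk-per-set rotation are assumptions made for illustration; real controllers choose their own chunk sizes and layouts, and the parity placement inside each RAID 5 set is not modeled here.

```python
from typing import List, Tuple


def locate_chunk(chunk_index: int, num_sets: int = 3) -> Tuple[int, int]:
    """Map a logical chunk to (RAID 5 set, chunk offset inside that set).

    Hypothetical helper: assumes plain round-robin RAID 0 striping,
    one chunk per set per pass.
    """
    if chunk_index < 0 or num_sets < 1:
        raise ValueError("chunk_index must be >= 0 and num_sets >= 1")
    return chunk_index % num_sets, chunk_index // num_sets


# The first six logical chunks, spread over the three sets from Figure A.
layout: List[Tuple[int, int]] = [locate_chunk(i) for i in range(6)]

for index, (raid5_set, offset) in enumerate(layout):
    print(f"chunk {index} -> RAID 5 set {raid5_set}, offset {offset}")
```

Consecutive chunks land on different RAID 5 sets, so a large sequential transfer keeps all three sets busy at once.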

There are a number of reasons why I like RAID 50, but there are also tradeoffs to using this RAID level. Here are some pros and cons about using RAID 50.

Disk space

RAID 5 dedicates one disk’s worth of capacity per array to parity; in other words, an array of N disks gives up 1/N of its raw space. In Figure A, this would mean that, if all 12 disks were in a single RAID 5 set, you’d be left with 11 disks’ worth of capacity. With RAID 50, you need to allocate one disk per underlying array for parity, so you’re left with less usable space than you would have if you simply used RAID 5: nine disks’ worth in Figure A.

However, if you compare RAID 50 and RAID 10, you’ll see a clear winner in RAID 50 from a capacity perspective. With RAID 10, you always lose 50% of your capacity due to mirroring. Since each underlying RAID 5 array requires a minimum of three disks (RAID 5 rules), and you lose the capacity of one disk per array to parity, you’ll never “lose” more than 33% of your total capacity when using RAID 50. As you make each RAID 5 set larger, this loss percentage goes down. In Figure A, with four disks used in each RAID 5 set, 25% of capacity is used for parity overhead; if you make that five disks per RAID 5 set, this percentage drops to 20%. As this percentage drops, however, your risk increases, because larger sets make it more likely that two failed disks end up in the same set.

RAID 50 requires an array with at least six disks: two RAID 5 arrays of three disks each. I like to use three- or four-disk RAID 5 sets in RAID 50 arrays.
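The capacity arithmetic above is easy to check with a few lines of Python. This is only a sketch: the function names are made up for illustration, all disks are assumed to be the same size, and capacity is counted in whole disks rather than bytes.

```python
def raid5_usable(disks: int) -> int:
    """Usable disks in a single RAID 5 array (one disk's worth goes to parity)."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return disks - 1


def raid50_usable(sets: int, disks_per_set: int) -> int:
    """Usable disks in RAID 50: each underlying RAID 5 set gives up one disk."""
    if sets < 2 or disks_per_set < 3:
        raise ValueError("RAID 50 needs at least two RAID 5 sets of three disks")
    return sets * (disks_per_set - 1)


def raid10_usable(disks: int) -> int:
    """Usable disks in RAID 10: half of the disks hold mirror copies."""
    if disks < 4 or disks % 2:
        raise ValueError("RAID 10 needs an even number of disks, at least four")
    return disks // 2


def overhead(total: int, usable: int) -> float:
    """Fraction of raw capacity given up to parity or mirroring."""
    return (total - usable) / total


# The 12 disks from Figure A, arranged three different ways.
for label, usable in [
    ("RAID 5  (1 x 12)", raid5_usable(12)),
    ("RAID 50 (3 x 4)", raid50_usable(3, 4)),
    ("RAID 10 (6 x 2)", raid10_usable(12)),
]:
    print(f"{label}: {usable} usable disks, {overhead(12, usable):.0%} overhead")
```

With 12 disks, this works out to 11 usable disks for a single RAID 5 set, 9 for the RAID 50 layout in Figure A, and 6 for RAID 10.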

Risk

With RAID 5, as you increase the number of disks in the array, you increase the likelihood of total array failure, because it becomes more likely that a second drive will fail before the first one has been replaced and rebuilt. As you move into RAID 50 territory, that additional disk space that you’re giving up translates directly into lowered risk, as RAID 50 systems can suffer multiple disk faults, as long as those disk faults happen in the right places.

With RAID 50, if you suffer multiple disk faults in any of the underlying RAID 5 arrays, the entire RAID 50 is toast; however, each individual RAID 5 array can withstand the loss of a single disk. You never want to have more than one disk go bad at a time regardless of RAID configuration, but at least with RAID 50, your chances are much better that a second disk failure will not happen in the same array as the first failure. This is one reason that keeping the individual RAID 5 arrays small (three or four disks at most) makes a lot of sense. The more disks you add to the individual RAID 5 arrays, the higher your risk for suffering a dual disk loss in one array.
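To put a rough number on “your chances are much better,” the sketch below computes the odds that a second failed disk lands in the same RAID 5 set as the first. It assumes, purely for illustration, that every surviving disk is equally likely to be the next one to fail; real failures are often correlated (same batch, same enclosure), so treat the result as a back-of-the-envelope figure. The function name is hypothetical.

```python
from fractions import Fraction


def same_set_probability(sets: int, disks_per_set: int) -> Fraction:
    """Chance that a second failure hits the set that is already degraded.

    Assumption: one disk has failed and each remaining disk is equally
    likely to fail next.
    """
    if sets < 1 or disks_per_set < 2:
        raise ValueError("need at least one set with two or more disks")
    remaining = sets * disks_per_set - 1   # disks still running
    same_set = disks_per_set - 1           # survivors in the degraded set
    return Fraction(same_set, remaining)


# 12 disks arranged as one big RAID 5, as Figure A, and as four small sets.
for sets, per_set in [(1, 12), (3, 4), (4, 3)]:
    p = same_set_probability(sets, per_set)
    print(f"{sets} set(s) of {per_set} disks: {p} ({float(p):.0%})")
```

Under that assumption, a second failure is always fatal for a single 12-disk RAID 5 set, fatal 3 times out of 11 for the layout in Figure A, and fatal 2 times out of 11 with four three-disk sets, which is the arithmetic behind keeping the individual sets small.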

Remember, the “0” part of RAID 50 offers no fault tolerance; all fault tolerance happens at the individual RAID 5 level. The RAID 0 part does help with performance.

Performance

RAID 50 does not perform as well as RAID 10 in a degraded state (i.e., during a rebuild), but, at least theoretically, it offers much better overall write performance than RAID 5. This places RAID 50 between RAID 10 (the winner in performance) and RAID 5 (sometimes lackluster performance, depending on workload) in the performance spectrum. Actual performance usually depends on the choice of RAID controller and the kind of workload being processed.

Like RAID 10 and RAID 5, RAID 50 provides excellent read performance.

Summary

When it comes to achieving a balance between storage cost, risk, and performance, few RAID levels go as far as RAID 50 for the following reasons:

Storage. Although RAID 50 uses more overhead space than RAID 5, it requires much less overhead than RAID 10, making it a nice in-between choice.
Risk. With RAID 5 alone, organizations run the risk of a second disk failure that could compromise the entire array. RAID 50 mitigates this issue since multiple disks can fail, as long as no two failures land in the same underlying RAID 5 set.
Performance. Although overall read/write performance is highly dependent on a number of factors, RAID 50 should provide better write performance than RAID 5 alone.