Higher reliability redundant disk arrays: Organization, operation, and coding

Authors:

Alexander Thomasian,

Mario Blaum

ACM Transactions on Storage (TOS), Volume 5, Issue 3

Article No.: 7, Pages 1 - 59

https://doi.org/10.1145/1629075.1629076

Published: 30 November 2009

Abstract

Parity is a popular form of data protection in redundant arrays of inexpensive/independent disks (RAID). RAID5 dedicates one out of N disks to parity to mask single disk failures, that is, the contents of a block on a failed disk can be reconstructed by exclusive-ORing the corresponding blocks on surviving disks. RAID5 can mask a single disk failure, and it is vulnerable to data loss if a second disk failure occurs. The RAID5 rebuild process systematically reconstructs the contents of a failed disk on a spare disk, returning the system to its original state, but the rebuild process may be unsuccessful due to unreadable sectors. This has led to two disk failure tolerant arrays (2DFTs), such as RAID6 based on Reed-Solomon (RS) codes. EVENODD, RDP (Row-Diagonal-Parity), the X-code, and RM2 (Row-Matrix) are 2DFTs with parity coding. RM2 incurs a higher level of redundancy than two disks, while the X-code is limited to a prime number of disks. RDP is optimal with respect to the number of XOR operations at the encoding, but not for short write operations. For small symbol sizes EVENODD and RDP have the same disk access pattern as RAID6, while RM2 and the X-code incur a high recovery cost with two failed disks. We describe variations to RAID5 and RAID6 organizations, including clustered RAID, different methods to update parities, rebuild processing, disk scrubbing to eliminate sector errors, and the intra-disk redundancy (IDR) method to deal with sector errors. We summarize the results of recent studies of failures in hard disk drives. We describe Markov chain reliability models to estimate RAID mean time to data loss (MTTDL) taking into account sector errors and the effect of disk scrubbing. Numerical results show that RAID5 plus IDR attains the same MTTDL level as RAID6, while incurring a lower performance penalty. We conclude with a survey of analytic and simulation studies of RAID performance and tools and benchmarks for RAID performance evaluation.
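The single-failure recovery described in the abstract (rebuilding a lost block by exclusive-ORing the corresponding blocks on the surviving disks) can be illustrated with a minimal sketch. The helper name `xor_blocks`, the block contents, and the three-data-disk layout are assumptions made for illustration only; they do not come from the article.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of a list of equal-length byte blocks."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical stripe: three data blocks plus one parity block.
data = [b"disk0blk", b"disk1blk", b"disk2blk"]
parity = xor_blocks(data)

# Simulate the loss of disk 1 and rebuild its block from the survivors.
surviving = [data[0], data[2], parity]
rebuilt = xor_blocks(surviving)
```

In this sketch `rebuilt` equals the lost block `data[1]`, because XORing every block of a stripe, parity included, yields all zeros. If a second block of the same stripe were also missing, the single parity equation would have two unknowns, which is the limitation that motivates the two disk failure tolerant codes surveyed in the article.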
