More Web Proxy on the site http://driver.im/

article

Coding techniques for handling failures in large disk arrays

Authors:

L. Hellerstein,

D. A. PattersonAuthors Info & Claims

Algorithmica, Volume 12, Issue 2-3

Pages 182 - 208

https://doi.org/10.1007/BF01185210

Published: 01 September 1994 Publication History

Abstract

A crucial issue in the design of very large disk arrays is the protection of data against catastrophic disk failures. Although today single disks are highly reliable, when a disk array consists of 100 or 1000 disks, the probability that at least one disk will fail within a day or a week is high. In this paper we address the problem of designing erasure-correcting binary linear codes that protect against the loss of data caused by disk failures in large disk arrays. We describe how such codes can be used to encode data in disk arrays, and give a simple method for data reconstruction. We discuss important reliability and performance constraints of these codes, and show how these constraints relate to properties of the parity check matrices of the codes. In so doing, we transform code design problems into combinatorial problems. Using this combinatorial framework, we present codes and prove they are optimal with respect to various reliability and performance constraints.

References

[1]

Array Technology Corporation, Product Description, RAID + Series Model RX, Revision 1.0, Array Technology Corporation, Boulder, CO 80301, February 1990.

[2]

Bates, K. H., Performance aspects of the HSC controller, Digital Tech. J., Vol. 8, February 1989, pp. 25-37.

[3]

Bell, C. G., Multis: a new class of multiprocessor computers, Science, Vol. 228, April 1985, pp. 462-467.

[4]

Bell, C. G., The future of high performance computers in science and engineering, Comm. ACM, Vol. 32, No. 9, September 1989, pp. 1091-1101.

Digital Library

[5]

Bitton, D., and J. Gray, Disk shadowing, Proc. 14th Internat. Conf. on Very Large Data Bases (VLDB), 1988, pp. 331-338.

Digital Library

[6]

Bollobas, B., Combinatorics, Set Systems, Hypergraphs, Families of Vectors, and Combinatorial Probability, Cambridge University Press, Cambridge, 1986.

Digital Library

[7]

Boral, H., and D. DeWitt, Database machines: an idea whose time has passed?, Database Machines, H. O. Leilich and M. Missikoff, eds., Springer-Verlag, New York, September 1983, pp. 166-187.

[8]

Brouwer, A. E., Wilson's theory, in Packing and Covering in Combinatorics, A. Schrijver, ed., Mathematical Centre Tracts 106, Mathematisch Centrum, Amsterdam, 1979, pp. 75-88.

[9]

Brouwer, A. E., and A. Schrijver, Uniform hypergraphs, in Packing and Covering in Combinatorics, A. Schrijver, ed., Mathematical Centre Tracts 106, Mathematisch Centrum, Amsterdam, 1979, pp. 39-73.

[10]

Chen, P. M., and D. A. Patterson, Maximizing performance in a striped disk array, Proc. 1990 ACM SIGARCH 17th Ann. Internat. Symp. on Computer Architecture, Seattle, WA, May 1990, pp. 322-331.

Digital Library

[11]

Gelsinger, P. P., P. A. Gargini, G. H. Parker, and A. Y. C. Yu, Microprocessors circa 2000, IEEE Spectrum, October 1989, pp. 43-74.

Digital Library

[12]

Gibson, G. A., Redundant Disk Arrays: Reliable, Parallel Secondary Storage, M.I.T. Press, Cambridge, MA, 1992.

Digital Library

[13]

Gibson, G. A., and D. A. Patterson, Designing disk arrays for high data reliability, J. Parallel Distrib. Comput., Vol. 17, No. 1, January 1993, pp. 4-27.

Digital Library

[14]

Gray, J., Why do computers stop and what can be done about it?, Tandem Technical Report 85.7, June 1985.

[15]

Hanani, H., The existence of construction of balanced incomplete block designs, Ann. of Math. Statist., Vol. 32, 1961, pp. 361-386.

[16]

Hanani, H., Balanced incomplete block desings and related designs, Discrete Math., Vol. 11, 1975, pp. 255-369.

Digital Library

[17]

Hoagland, A., Information storage technology: a look at the future, IEEE Trans. Computer, Vol. 18, July 1985, pp. 60-67.

Digital Library

[18]

Holland, M., and G. A. Gibson, Parity declustering for continuous operation in redundant disk arrays, Fifth Internat. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS V), Boston, MA, October 1992, pp. 23-25.

Digital Library

[19]

Jilke, W., Disk array mass storage sytems: the new opportunity, Amperif Corp., September 1986.

[20]

Kim, M. Y., Synchronously Interleaved Disk Systems with their Application to the Very Large FFT, Ph.D. Dissertation, Polytechnic University, January 1987.

Digital Library

[21]

Klietz, A., J. Turner, and T. C. Jacobson, TurboNFS: fast shared access for Cray disk storage, Proc. Cray User Group Convention, April 1988.

[22]

Kryder, M. H., Data storage in 2000--trends in data storage technologies, IEEE Trans. Magnetics, Vol. 25, No. 6, November 1989, pp. 4358-4363.

[23]

Lin, Ting-Ting Yao, Design and Evaluation of an On-Line Predictive Diagnostic System, Ph.D. Dissertation, Carnegie Mellon University, April 1988.

Digital Library

[24]

Livny, M., S. Khoshafian, and H. Boral, Multi-disk management algorithms, Proc. ACM SIGMETRICS, May 1987, pp. 69-77.

Digital Library

[25]

MacWilliams, F. J., and N. J. A. Sloane, The Theory of Error-Correcting Codes, North-Holland Mathematical Library, Vol. 16, Elsevier Science, New York, 1977.

[26]

Muntz, R. R., and J. C. S. Lui, Performance analysis of disk arrays under failure, Proc. 16th Internat. Conf. on Very Large Data Bases (VDLB), D. McLeod, R. Sacks-Davis, and H. Schek, eds., Morgan Kaufman, Los Altos, CA, August 1990, pp. 162-173.

Digital Library

[27]

Myers, G. J., A. Y. C. Yi, and D. L. House, Microprocessor technology trends, Proc. IEEE, Vol. 74, No. 12, December 1986, pp. 1605-1622.

[28]

Newberg, L., and D. Wolfe, String layouts for a redundant array of inexpensive disks, Algorithmica, this issue, pp. 209-224.

Digital Library

[29]

Patterson, D. A., G. A. Gibson, and R. H. Katz, A case for redundant arrays of inexpensive disks (RAID), ACM SIGMOD 88, Chicago, June 1988, pp. 109-116.

Digital Library

[30]

Peterson, W. W., and E. J. Weldon, Jr., Error-Correcting Codes, 2nd edn., M.I.T. Press, Cambridge, MA, 1972, pp. 131-136.

[31]

Rabin, M. O., Efficient dispersal of information at security, load balancing, and fault tolerance, J. Assoc. Comput. Mach., Vol. 36, 1989, pp. 335-348.

Digital Library

[32]

Rubinstein, R. Y., Simulation and the Monte Carlo Method, Wiley, New York, 1981.

Digital Library

[33]

Salem, K., and H. Garcia-Molina, Disk striping, Proc. IEEE 1986 lnternat. Conf. on Data Engineering, 1986, pp. 336-342.

Digital Library

Cited By

Zhang YYang JMemaripour ASwanson S(2015)MojimACM SIGARCH Computer Architecture News10.1145/2786763.269437043:1(3-18)Online publication date: 14-Mar-2015
https://dl.acm.org/doi/10.1145/2786763.2694370
Zhang YYang JMemaripour ASwanson S(2015)MojimACM SIGPLAN Notices10.1145/2775054.269437050:4(3-18)Online publication date: 14-Mar-2015
https://dl.acm.org/doi/10.1145/2775054.2694370
Zhang YYang JMemaripour ASwanson SOzturk OEbcioglu KDwarkadas S(2015)MojimProceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/2694344.2694370(3-18)Online publication date: 14-Mar-2015
https://dl.acm.org/doi/10.1145/2694344.2694370
Show More Cited By

Coding techniques for handling failures in large disk arrays
1. Applied computing
  1. Enterprise computing
    1. Enterprise information systems

Recommendations

Higher reliability redundant disk arrays: Organization, operation, and coding

Parity is a popular form of data protection in redundant arrays of inexpensive/independent disks (RAID). RAID5 dedicates one out of N disks to parity to mask single disk failures, that is, the contents of a block on a failed disk can be reconstructed by ...
Architectures and algorithms for on-line failure recovery in redundant disk arrays
Special issue on disk arrays
HPDA: A hybrid parity-based disk array for enhanced performance and reliability

Flash-based Solid State Drive (SSD) has been productively shipped and deployed in large scale storage systems. However, a single flash-based SSD cannot satisfy the capacity, performance and reliability requirements of the modern storage systems that ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Algorithmica

Algorithmica Volume 12, Issue 2-3

September 1994

176 pages

ISSN:0178-4617

Issue’s Table of Contents

Copyright © Copyright © 1994 Springer-Verlag New York Inc.

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 01 September 1994

Author Tags

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

26
Total Citations
View Citations
0
Total Downloads

Downloads (Last 12 months)0
Downloads (Last 6 weeks)0

Reflects downloads up to 17 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Zhang YYang JMemaripour ASwanson S(2015)MojimACM SIGARCH Computer Architecture News10.1145/2786763.269437043:1(3-18)Online publication date: 14-Mar-2015
https://dl.acm.org/doi/10.1145/2786763.2694370
Zhang YYang JMemaripour ASwanson S(2015)MojimACM SIGPLAN Notices10.1145/2775054.269437050:4(3-18)Online publication date: 14-Mar-2015
https://dl.acm.org/doi/10.1145/2775054.2694370
Zhang YYang JMemaripour ASwanson SOzturk OEbcioglu KDwarkadas S(2015)MojimProceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/2694344.2694370(3-18)Online publication date: 14-Mar-2015
https://dl.acm.org/doi/10.1145/2694344.2694370
Vandendriessche P(2015)On small line sets with few odd-pointsDesigns, Codes and Cryptography10.1007/s10623-014-9920-175:3(453-463)Online publication date: 1-Jun-2015
https://dl.acm.org/doi/10.1007/s10623-014-9920-1
Lin SWang GStones DLiu XLiu J(2010)T-codeIEEE Journal on Selected Areas in Communications10.5555/1734877.173489528:2(289-296)Online publication date: 1-Feb-2010
https://dl.acm.org/doi/10.5555/1734877.1734895
Moussa RCheung DSong IChu WHu XLin J(2009)HDDBrs middleware for implementing highly available distributed databasesProceedings of the 18th ACM conference on Information and knowledge management10.1145/1645953.1646308(2075-2076)Online publication date: 2-Nov-2009
https://dl.acm.org/doi/10.1145/1645953.1646308
Vitter J(2008)Algorithms and data structures for external memoryFoundations and Trends® in Theoretical Computer Science10.1561/04000000142:4(305-474)Online publication date: 1-Jan-2008
https://dl.acm.org/doi/10.1561/0400000014
Thomasian AFu GHan C(2007)Performance of Two-Disk Failure-Tolerant Disk ArraysIEEE Transactions on Computers10.1109/TC.2007.104156:6(799-814)Online publication date: 1-Jun-2007
https://dl.acm.org/doi/10.1109/TC.2007.1041
Sobe PPeter K(2006)Construction of efficient or-based deletion-tolerant coding schemesProceedings of the 20th international conference on Parallel and distributed processing10.5555/1898699.1898890(343-343)Online publication date: 25-Apr-2006
https://dl.acm.org/doi/10.5555/1898699.1898890
Litwin WMoussa RSchwarz T(2005)LH*RS---a highly-available scalable distributed data structureACM Transactions on Database Systems10.1145/1093382.109338630:3(769-811)Online publication date: 1-Sep-2005
https://dl.acm.org/doi/10.1145/1093382.1093386
Show More Cited By

View Options

View options

Media

Figures

Other

Tables

View Issue’s Table of Contents