[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Pyramid Codes: Flexible Schemes to Trade Space for Access Efficiency in Reliable Data Storage Systems

Published: 01 March 2013 Publication History

Abstract

We design flexible schemes to explore the tradeoffs between storage space and access efficiency in reliable data storage systems. Aiming at this goal, two new classes of erasure-resilient codes are introduced -- Basic Pyramid Codes (BPC) and Generalized Pyramid Codes (GPC). Both schemes require slightly more storage space than conventional schemes, but significantly improve the critical performance of read during failures and unavailability.
As a by-product, we establish a necessary matching condition to characterize the limit of failure recovery, that is, unless the matching condition is satisfied, a failure case is impossible to recover. In addition, we define a maximally recoverable (MR) property. For all ERC schemes holding the MR property, the matching condition becomes sufficient, that is, all failure cases satisfying the matching condition are indeed recoverable. We show that GPC is the first class of non-MDS schemes holding the MR property.

References

[1]
Abd-El-Malek, M., W. V. C. Ii, Cranor, C., Ganger, G. R., Hendricks, J., Klosterman, A. J., Mesnier, M., Prasad, M., Salmon, B., Sambasivan, R. R., Sinnamohideen, S., Strunk, J. D., Thereska, E., Wachs, M., and Wylie, J. J. 2005. Ursa minor: Versatile cluster-based storage. In Proceedings of USENIX Conference on File and Storage Technologies.
[2]
Aguilera, M. K., Janakiraman, R., and Xu, L. 2005. Using erasure codes for storage in a distributed system. In Proceedings of IEEE International Conference on Dependable Systems and Networks. IEEE, Los Alamitos, CA.
[3]
Blahut, R. E. 2003. Algebraic Codes for Data Transmission. Cambridge University Press.
[4]
Blaum, M. and Roth, R. M. 1999. On lowest-density MDS codes. IEEE Trans. Inf. Theory.
[5]
Blaum, M., Brady, J., Bruck, J., and Menon, J. 1995. EVENODD: An efficient scheme for tolerating double disk failures in RAID architectures. IEEE Trans.Computers.
[6]
Blomer, J., Kalfane, M., Karp, R., Karpinski, M., Luby, M., and Zuckerman, D. 1995. An XOR-based erasure-resilient coding scheme. Tech. rep. TR-95-048, ICSI, Berkeley, CA.
[7]
Borthakur, D., Schmidt, R., Vadali, R., Chen, S., and Kling, P. 2010. HDFS RAID. Hadoop User Group Meeting.
[8]
Cadambe, V. R., Huang, C., and Li, J. 2011. Permutation code: Optimal exact-repair of a single failed node in MDS code based distributed storage systems. In Proceedings of IEEE International Symposium on Information Theory. IEEE, Los Alamitos, CA.
[9]
Cadambe, V. R., Huang, C., Li, J., and Mehrotra, S. 2011. Polynomial length MDS codes with optimal repair in distributed storage. In Proceedings of the Asilomar Conference on Signals, Systems and Computers.
[10]
Calder, B., Wang, J., Ogus, A., Nilakantan, N., Skjolsvold, A., McKelvie, S., Xu, Y., Srivastav, S., Wu, J., Simitci, H., Haridas, J., Uddaraju, C., Khatri, H., Edwards, A., Bedekar, V., Mainali, S., Abbasi, R., Agarwal, A., Ul Haq, M. F., Ul Haq, M. I., Bhardwaj, D., Dayanand, S., Adusumilli, A., McNett, M., Sankaran, S., Manivannan, K., and Rigas, L. 2011. Windows azure storage: A highly available cloud storage service with strong consistency. In Proceedings of the ACM Symposium on Operating Systems Principles. ACM, New York.
[11]
Chang, F., Ji, M., Leung, S.-T., Maccormick, J., Perl, S., and Zhang, L. 2002. Myriad: Cost-effective disaster tolerance. In Proceedings of the USENIX Conference on File and Storage Technologies.
[12]
Chen, M., Huang, C., and Li, J. 2007. On the maximally recoverable property for multi-protection group codes. In Proceedings of the IEEE International Symposium on Information Theory.
[13]
Chen, P. M., Lee, E. K., Gibson, G. A., Katz, R. H., and Patterson, D. A. 1994. RAID -- High-performance, reliable secondary storage. ACM Comput. Surv.
[14]
Chen, Y., Edler, J., Goldberg, A., Gottlieb, A., Sobti, S., and Yianilos, P. 1999. A prototype implementation of archival intermemory. In Proceedings of the ACM Conference on Digital Libraries. ACM, New York.
[15]
Corbett, P., English, B., Goel, A., Grcanac, T., Kleiman, S., Leong, J., and Sankar, S. 2004. Row-diagonal parity for double disk failure correction. In Proceedings of the USENIX Conference on File and Storage Technologies.
[16]
Dimakis, A. G., Godfrey, P. B., Wu, Y., Wainwright, M., and Ramchandran, K. 2010. Network coding for distributed storage systems. IEEE Trans. Inf. Theory.
[17]
Dingledine, R., Freedman, M. J., and Molnar, D. 2000. The free haven project: Distributed anonymous storage service. In Proceedings of the Workshop on Design Issues in Anonymity and Unobservability.
[18]
Dubnicki, C., Gryz, L., Heldt, L., Kaczmarczyk, M., Kilian, W., Strzelczak, P., Szczepkowski, J., Ungureanu, C., and Welnicki, M. 2009. Hydrastor: A scalable secondary storage. In Proceedings of the USENIX Conference on File and Storage Technologies.
[19]
Fikes, A. 2010. Storage architecture and challenges. Google Faculty Summit.
[20]
Ford, D., Labelle, F., Popovici, F. I., Stokely, M., Truong, V.-A., Barroso, L., Grimes, C., and Quinlan, S. 2010. Availability in globally distributed storage systems. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation.
[21]
Gallager, R. G. 1963. Low-density parity-check codes. MIT Press, Cambridge, MA.
[22]
Gantenbein, D. 2012. A better way to store data. Microsoft Res. Featured Stories. http://research.microsoft.com/en-us/news/features/erasurecoding-090512.aspx.
[23]
Ghemawat, S., Gobioff, H., and Leung, S.-T. 2003. The Google file system. In Proceedings of the ACM Symposium on Operating Systems Principles. ACM, New York.
[24]
Gopalan, P., Huang, C., Simitci, H., and Yekhanin, S. 2011. On the locality of codeword symbols. In Proceedings of the Allerton Conference on Communication, Control, and Computing.
[25]
Greenan, K. M., Li, X., and Wylie, J. J. 2010. Flat XOR-based erasure codes in storage systems: Constructions, efficient recovery, and tradeoffs. In Proceedings of the IEEE Mass Storage Systems and Technologies. IEEE, Los Alamitos, CA.
[26]
Greenan, K. M., Long, D. E., Miller, E. L., Schwarz, T. J. E., and Wylie, J. J. 2008. A spin-up saved is energy earned: Achieving power-efficient, erasure-coded storage. In Proceedings of the USENIX Workshop on Hot Topics in System Dependability.
[27]
Greenan, K. M., Miller, E., and Wylie, J. J. 2008. Reliability of flat XOR-based erasure codes on heterogeneous devices. In Proceedings of the IEEE International Conference on Dependable Systems and Networks.
[28]
Grolimund, D. 2007. P2P online storage. In Proceedings of the Web 2.0 Expo.
[29]
Haeberlen, A., Mislove, A., and Druschel, P. 2005. Glacier: Highly durable, decentralized storage despite massive correlated failures. In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation.
[30]
Hafner, J. L. 2005. Weaver codes: Highly fault tolerant erasure codes for storage systems. In Proceedings of the USENIX Conference on File and Storage Technologies.
[31]
Hafner, J. L. and Rao, K. 2006. Notes on reliability models for non-MDS erasure codes. IBM Tech. rep. RJ10391.
[32]
Hafner, J. L., Deenadhayalan, V. W., Rao, K., and Tomlin, J. A. 2005. Matrix methods for lost data reconstruction in erasure codes. In Proceedings of the USENIX Conference on File and Storage Technologies.
[33]
Hamilton, J. 2007. An architecture for modular data centers. In Proceedings of the Conference on Innovative Data Systems Research.
[34]
Hosekote, D. K., He, D., and Hafner, J. L. 2007. REO: A generic RAID engine and optimizer. In Proceedings of the USENIX Conference on File and Storage Technologies.
[35]
Huang, C. and Xu, L. 2003. Fast software implementation of finite field operations. Tech. rep., Washington University, St. Louis, MO.
[36]
Huang, C. and Xu, L. 2008. Star: An efficient coding scheme for correcting triple storage node failures. IEEE Trans. Computers.
[37]
Huang, C., Chen, M., and Li, J. 2007. Pyramid codes: Flexible schemes to trade space for access efficiency in reliable data storage systems. In Proceedings of the IEEE International Symposium on Network Computing and Applications. IEEE, Los Alamitos, CA.
[38]
Huang, C., Li, J., and Chen, M. 2007. On optimizing XOR-based codes for fault-tolerant storage applications. In Proceedings of the IEEE Information Theory Workshop. IEEE, Los Alamitos, CA.
[39]
Huang, C., Simitci, H., Xu, Y., Ogus, A., Calder, B., Gopalan, P., Li, J., and Yekhanin, S. 2012. Erasure Coding in Windows Azure Storage. In Proceedings of the USENIX Annual Technical Conference.
[40]
Khan, O., Burns, R., Plank, J., and Huang, C. 2011. In search of I/O-optimal recovery from disk failures. In Proceedings of the USENIX Workshop on Hot Topics in Storage and File Systems.
[41]
Khan, O., Burns, R., Plank, J., Pierce, W., and Huang, C. 2012. Rethinking erasure codes for cloud file systems: Minimizing I/O for recovery and degraded reads. In Proceedings of the USENIX Conference on File and Storage Technologies.
[42]
Kubiatowicz, J., Bindel, D., Chen, Y., Czerwinski, S., Eaton, P., Geels, D., Gummadi, R., Rhea, S., Weatherspoon, H., Weimer, W., Wells, C., and Zhao, B. 2000. Oceanstore: An architecture for global-scale persistent storage. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems.
[43]
Luby, M. G., Mitzenmacher, M., Shokrollahi, A., and Spielman, D. A. 2001. Efficient erasure correcting codes. IEEE Trans. Inf. Theory.
[44]
Luo, J., Xu, L., and Plank, J. S. 2009. An efficient XOR-scheduling algorithm for erasure codes encoding. In Proceedings of the IEEE International Conference on Dependable Systems and Networks.
[45]
MacWilliams, F. J. and Sloane, N. J. A. 1977. The Theory of Error Correcting Codes. North-Holland, Amsterdam.
[46]
Maymounkov, P. and Mazieres, D. 2003. Rateless codes and big downloads. In Proceedings of the International Workshop on Peer-To-Peer Systems.
[47]
Papailiopoulos, D. S., Luo, J., Dimakis, A. G., Huang, C., and Li, J. 2012. Simple regenerating codes: Network coding for cloud storage. In Proceedings of the IEEE INFOCOM Mini-Conference. IEEE, Los Alamitos, CA.
[48]
Plank, J. S. 1997. A tutorial on reed-solomon coding for fault-tolerance in RAID-like systems. Softw. Pract. Exper.
[49]
Plank, J. S. 2008. The RAID-6 liberation codes. In Proceedings of the USENIX Conference on File and Storage Technologies.
[50]
Plank, J. S. and Xu, L. 2006. Optimizing cauchy reed-solomon codes for fault-tolerant network storage applications. In Proceedings of the IEEE International Symposium on Network Computing and Applications.
[51]
Reed, I. S. and Solomon, G. 1960. Polynomial codes over certain finite fields. J. Soc. Industrial Appl. Math.
[52]
Rhea, S., Eaton, P., Geels, D., Weatherspoon, H., Zhao, B., and Kubiatowicz, J. 2003. Pond: The oceanstore prototype. In Proceedings of the USENIX Conference on File and Storage Technologies.
[53]
Rowstron, A. and Druschel, P. 2001. Storage management and caching in past, a large-scale, persistent peer-to-peer storage utility. In Proceedings of the ACM Symposium on Operating Systems Principles. ACM, New York.
[54]
Saito, Y., Frolund, S., Veitch, A., Merchant, A., and Spence, S. 2004. FAB: Building distributed enterprise disk arrays from commodity components. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems.
[55]
Schrijver, A. 2003. Combinatorial optimization, polyhedra and efficiency. Alg. Combinatorics.
[56]
Schroeder, B. and Gibson, G. A. 2007. Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you? In Proceedings of the USENIX Conference on File and Storage Technologies.
[57]
Shvachko, K., Kuang, H., Radia, S., and Chansler, R. 2010. The hadoop distributed file system. In Proceedings of the IEEE Symposium on Massive Storage Systems and Technologies. IEEE, Los Alamitos, CA.
[58]
Suh, C. and Ramchandran, K. 2011. Exact regeneration codes for distributed storage repair using interference alignment. IEEE Trans. Inf. Theory.
[59]
Tanner, R. M. 1981. A recursive approach to low complexity codes. IEEE Trans. Inf. Theory.
[60]
Ungureanu, C., Atkin, B., Aranya, A., Gokhale, S., Rago, S., Calkowski, G., Dubnicki, C., and Bohra, A. 2010. Hydrafs: A high-throughput file system for the hydrastor content-addressable storage system. In Proceedings of the USENIX Conference on File and Storage Technologies.
[61]
Wang, Z., Tamo, I., and Bruck, J. 2011. On codes for optimal rebuilding access. Tech. rep. ETR111, Caltech.
[62]
Weatherspoon, H. and Kubiatowics, J. 2001. Erasure coding vs. replication: A quantitative comparison. In Proceedings of the International Workshop on Peer-To-Peer Systems.
[63]
Welch, B., Unangst, M., Abbasi, Z., Gibson, G., Mueller, B., Small, J., Zelenka, J., and Zhou, B. 2008. Scalable performance of the Panasas parallel file system. In Proceedings of the USENIX Conference on File and Storage Technologies.
[64]
Wildani, A., Schwarz, T. J. E., Miller, E. L., and Long, D. E. 2009. Protecting against rare event failures in archival systems. In Proceedings of the IEEE International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems.

Cited By

View all
  • (2024)FPGA-Accelerated Erasure Coding Encoding in Ceph Based on an Efficient Layered StrategyElectronics10.3390/electronics1303059313:3(593)Online publication date: 31-Jan-2024
  • (2024)BFT-DSN: A Byzantine Fault-Tolerant Decentralized Storage NetworkIEEE Transactions on Computers10.1109/TC.2024.336595373:5(1300-1312)Online publication date: 14-Feb-2024
  • (2024)Repairing Schemes for Tamo-Barg Codes2024 IEEE International Symposium on Information Theory (ISIT)10.1109/ISIT57864.2024.10619492(2140-2145)Online publication date: 7-Jul-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Storage
ACM Transactions on Storage  Volume 9, Issue 1
March 2013
84 pages
ISSN:1553-3077
EISSN:1553-3093
DOI:10.1145/2435204
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 March 2013
Accepted: 01 September 2012
Revised: 01 August 2012
Received: 01 October 2011
Published in TOS Volume 9, Issue 1

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Storage
  2. erasure codes
  3. fault tolerance
  4. reconstruction

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)74
  • Downloads (Last 6 weeks)9
Reflects downloads up to 16 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)FPGA-Accelerated Erasure Coding Encoding in Ceph Based on an Efficient Layered StrategyElectronics10.3390/electronics1303059313:3(593)Online publication date: 31-Jan-2024
  • (2024)BFT-DSN: A Byzantine Fault-Tolerant Decentralized Storage NetworkIEEE Transactions on Computers10.1109/TC.2024.336595373:5(1300-1312)Online publication date: 14-Feb-2024
  • (2024)Repairing Schemes for Tamo-Barg Codes2024 IEEE International Symposium on Information Theory (ISIT)10.1109/ISIT57864.2024.10619492(2140-2145)Online publication date: 7-Jul-2024
  • (2024)A New Framework for Designing Polynomial Codes for Private Information Retrieval2024 60th Annual Allerton Conference on Communication, Control, and Computing10.1109/Allerton63246.2024.10735266(1-6)Online publication date: 24-Sep-2024
  • (2024)Watermarking for Social Networks Images With Improved Robustness Through Polar CodesIEEE Access10.1109/ACCESS.2024.344648912(118154-118168)Online publication date: 2024
  • (2024)Reliability through an optimal SDS controller’s placement in a SDDC and smart cityCluster Computing10.1007/s10586-024-04325-627:6(7219-7240)Online publication date: 1-Sep-2024
  • (2023)Reaching Consensus in the Byzantine Empire: A Comprehensive Review of BFT Consensus AlgorithmsACM Computing Surveys10.1145/363655356:5(1-41)Online publication date: 9-Dec-2023
  • (2023)Practical Design Considerations for Wide Locally Recoverable Codes (LRCs)ACM Transactions on Storage10.1145/362619819:4(1-26)Online publication date: 14-Nov-2023
  • (2023)A Bound on the Minimal Field Size of LRCs, and Cyclic MR Codes That Attain ItIEEE Transactions on Information Theory10.1109/TIT.2022.322595369:4(2240-2260)Online publication date: 1-Apr-2023
  • (2023)A Generalization of Array Codes With Local Properties and Efficient Encoding/DecodingIEEE Transactions on Information Theory10.1109/TIT.2022.320214069:1(107-125)Online publication date: Jan-2023
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media