[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article
Open access

Achieving Tunable Erasure Coding with Cluster-Aware Redundancy Transitioning

Published: 14 September 2024 Publication History

Abstract

Erasure coding has been demonstrated as a storage-efficient means against failures, yet its tunability remains a challenging issue in data centers, which is prone to induce substantial cross-cluster traffic. In this article, we present ClusterRT, a cluster-aware redundancy transitioning approach that can dynamically tailor the redundancy degree of erasure coding in data centers. ClusterRT formulates the data relocation as the maximum flow problem to reduce cross-cluster data transfers. It then designs a parity-coordinated update algorithm, which gathers the parity chunks within the same cluster and leverages encoding dependency to further decrease the cross-cluster update traffic. ClusterRT finally rotates the parity chunks to balance the cross-cluster transitioning traffic across the data center. Large-scale simulation and Alibaba Cloud ECS experiments show that ClusterRT reduces 94.0% to 96.2% of transitioning traffic and reduces 70.4% to 88.4% of transitioning time.

References

[1]
Ceph. 2016. Erasure Coding in Ceph. Retrieved from https://docs.ceph.com/en/latest/rados/operations/erasure-code/
[2]
OpenStack. 2019. Erasure Code Support. Retrieved from https://docs.openstack.org/swift/latest/overview_erasure_code.html
[4]
Alibaba Cloud. 2023. Alibaba Cloud Elastic Compute Service. Retrieved from https://www.alibabacloud.com/product/ecs
[5]
Faraz Ahmad, Srimat T. Chakradhar, Anand Raghunathan, and T. N. Vijaykumar. 2014. ShuffleWatcher: Shuffle-aware scheduling in multi-tenant MapReduce clusters. In 2014 USENIX Annual Technical Conference (USENIX ATC ’14). 1–13.
[6]
Haibo Chen, Heng Zhang, Mingkai Dong, Zhaoguo Wang, Yubin Xia, Haibing Guan, and Binyu Zang. 2017. Efficient and available in-memory KV-store with hybrid erasure coding and replication. ACM Transactions on Storage 13, 3 (2017), 1–30.
[7]
Liangfeng Cheng, Yuchong Hu, and Patrick P. C. Lee. 2019. Coupling decentralized Key-value stores with erasure coding. In Proceedings of the ACM Symposium on Cloud Computing (SoCC).
[8]
Mosharaf Chowdhury, Srikanth Kandula, and Ion Stoica. 2013. Leveraging endpoint flexibility in data-intensive clusters. ACM SIGCOMM Computer Communication Review 43, 4 (2013), 231–242.
[9]
Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking cloud serving systems with YCSB. In Proceedings of the 1st ACM Symposium on Cloud Computing. 143–154.
[10]
Daniel Ford, François Labelle, Florentina I. Popovici, Murray Stokely, Van-Anh Truong, Luiz Barroso, Carrie Grimes, and Sean Quinlan. 2010. Availability in globally distributed storage systems. In 9th USENIX Symposium on Operating Systems Design and Implementation (OSDI’10).
[11]
Guowen Gong, Zhirong Shen, Suzhen Wu, Xiaolu Li, and Patrick P. C. Lee. 2021. Optimal rack-coordinated updates in erasure-coded data centers. In IEEE Conference on Computer Communications (IEEE INFOCOM ’21). IEEE, 1–10.
[12]
Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel, and Sudipta Sengupta. 2009. VL2: A scalable and flexible data center network. In Proceedings of the ACM SIGCOMM 2009 Conference on Data Communication. 51–62.
[13]
Yuchong Hu, Liangfeng Cheng, Qiaori Yao, Patrick P. C. Lee, Weichun Wang, and Wei Chen. 2021. Exploiting combined locality for wide-stripe erasure coding in distributed storage. In 19th USENIX Conference on File and Storage Technologies (FAST ’21). 233–248.
[14]
Yuchong Hu, Xiaolu Li, Mi Zhang, Patrick P. C. Lee, Xiaoyang Zhang, Pan Zhou, and Dan Feng. 2017. Optimal repair layering for erasure-coded data centers: From theory to practice. ACM Transactions on Storage (TOS) 13, 4 (2017), 1–24.
[15]
Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin. 2012. Erasure coding in windows azure storage. In 2012 USENIX Annual Technical Conference (USENIX ATC’12). 15–26.
[16]
Jianzhong Huang, Xianhai Liang, Xiao Qin, Ping Xie, and Changsheng Xie. 2014. Scale-RS: An efficient scaling scheme for RS-coded storage clusters. IEEE Transactions on Parallel and Distributed Systems 26, 6 (2014), 1704–1717.
[17]
Alon Itai, Yehoshua Perl, and Yossi Shiloach. 1982. The complexity of finding maximum disjoint paths with length constraints. Networks 12, 3 (1982), 277–286.
[18]
Weihang Jiang, Chongfeng Hu, Yuanyuan Zhou, and Arkady Kanevsky. 2008. Are disks the dominant contributor for storage failures? A comprehensive study of storage subsystem failure characteristics. ACM Transactions on Storage 4, 3 (2008), 1–25.
[19]
Saurabh Kadekodi, Francisco Maturana, Sanjith Athlur, Arif Merchant, K. V. Rashmi, and Gregory R. Ganger. 2022. Tiger: Disk-adaptive redundancy without placement restrictions. In 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’22). 413–429.
[20]
Saurabh Kadekodi, K. V. Rashmi, and Gregory R. Ganger. 2019. Cluster storage systems gotta have HeART: improving storage efficiency by exploiting disk-reliability heterogeneity. In 17th USENIX Conference on File and Storage Technologies (FAST’19). 345–358.
[21]
Jérome Lacan and Jérome Fimes. 2004. Systematic MDS erasure codes based on Vandermonde matrices. IEEE Communications Letters 8, 9 (2004), 570–572.
[22]
Francisco Maturana, V. S. Chaitanya Mukka, and K. V. Rashmi. 2020. Access-optimal linear MDS convertible codes for all parameters. In IEEE International Symposium on Information Theory (ISIT’20), IEEE, 577–582.
[23]
Francisco Maturana and K. V. Rashmi. 2020. Convertible codes: New class of codes for efficient conversion of coded data in distributed storage. In Proceedings of the 11th Innovations in Theoretical Computer Science Conference (ITCS’20), Vol. 151. 66.
[24]
Subramanian Muralidhar, Wyatt Lloyd, Sabyasachi Roy, Cory Hill, Ernest Lin, Weiwen Liu, Satadru Pan, Shiva Shankar, Viswanath Sivakumar, Linpeng Tang, and Sanjeev Kumar. 2014. f4: Facebook’s warm BLOB storage system. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI’14), USENIX Association, 383–398.
[25]
Michael Ovsiannikov, Silvius Rus, Damian Reeves, Paul Sutter, Sriram Rao, and Jim Kelly. 2013. The quantcast file system. Proceedings of the VLDB Endowment 6, 11 (2013), 1092–1101.
[26]
Dimitris S. Papailiopoulos and Alexandros G. Dimakis. 2014. Locally repairable codes. IEEE Transactions on Information Theory 60, 10 (2014), 5843–5855.
[27]
James S. Plank, Scott Simmerman, and Catherine D. Schuman. 2008. Jerasure: A Library in C/C++ Facilitating Erasure Coding for Storage Applications-Version 1.2. University of Tennessee, Tech. Rep. CS-08-627 23 (2008).
[28]
Irving S. Reed and Gustave Solomon. 1960. Polynomial codes over certain finite fields. Journal of the Society for Industrial and Applied Mathematics 8, 2 (1960), 300–304.
[29]
Zhirong Shen and Patrick P. C. Lee. 2018. Cross-rack-aware updates in erasure-coded data centers. In Proceedings of the 47th International Conference on Parallel Processing. 1–10.
[30]
Zhirong Shen, Jiwu Shu, Zhijie Huang, and Yingxun Fu. 2020. ClusterSR: Cluster-aware scattered repair in erasure-coded storage. In 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS’20). IEEE, 42–51.
[31]
Zhirong Shen, Jiwu Shu, and Patrick P. C. Lee. 2016. Reconsidering single failure recovery in clustered file systems. In 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN’16), IEEE, 323–334.
[32]
Konstantin Taranov, Gustavo Alonso, and Torsten Hoefler. 2018. Fast and strongly-consistent per-item resilience in key-value stores. In Proceedings of the Thirteenth European Conference on Computer Systems (EuroSys). 1–14.
[33]
Robert Endre Tarjan. 1983. Data Structures and Network Algorithms. SIAM.
[34]
Jiguang Wan, Peng Xu, Xubin He, Jibin Wang, Junyao Li, and Changsheng Xie. 2016. H-scale: A fast approach to scale disk arrays via hybrid stripe deployment. ACM Transactions on Storage 12, 3 (2016), 1–30.
[35]
Zizhong Wang, Haixia Wang, Airan Shao, and Dongsheng Wang. 2020. An adaptive erasure-coded storage scheme with an efficient code-switching algorithm. In Proceedings of the 49th International Conference on Parallel Processing. 1–11.
[36]
Hakim Weatherspoon and John D. Kubiatowicz. 2002. Erasure coding vs. replication: A quantitative comparison. In International Workshop on Peer-to-peer Systems. 328–337.
[37]
Chentao Wu, Xubin He, Jizhong Han, Huailiang Tan, and Changsheng Xie. 2012. SDM: A stripe-based data migration scheme to improve the scalability of RAID-6. In 2012 IEEE International Conference on Cluster Computing, IEEE, 284–292.
[38]
Si Wu, Qingpeng Du, Patrick P. C. Lee, Yongkun Li, and Yinlong Xu. 2022. Optimal data placement for stripe merging in locally repairable codes. In IEEE Conference on Computer Communications (IEEE INFOCOM’22). IEEE, 1669–1678.
[39]
Si Wu, Zhirong Shen, and Patrick P. C. Lee. 2020. Enabling I/O-efficient redundancy transitioning in erasure-coded KV stores via elastic Reed-Solomon codes. In International Symposium on Reliable Distributed Systems (SRDS’20), IEEE, 246–255.
[40]
Si Wu, Zhirong Shen, and Patrick P. C. Lee. 2020. On the optimal repair-scaling trade-off in locally repairable codes. In IEEE INFOCOM 2020-IEEE Conference on Computer Communications, IEEE, 2155–2164.
[41]
Mingyuan Xia, Mohit Saxena, Mario Blaum, and David A. Pease. 2015. A tale of two erasure codes in HDFS. In 13th USENIX conference on file and storage technologies (FAST’15). 213–226.
[42]
Erci Xu, Mai Zheng, Feng Qin, Yikang Xu, and Jiesheng Wu. 2019. Lessons and actions: What we learned from 10k SSD-Related storage system failures. In 2019 USENIX Annual Technical Conference (USENIX ATC’19). 961–976.
[43]
Qiaori Yao, Yuchong Hu, Liangfeng Cheng, Patrick P. C. Lee, Dan Feng, Weichun Wang, and Wei Chen. 2021. Stripemerge: Efficient wide-stripe generation for large-scale erasure-coded storage. In 2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS’21), IEEE, 483–493.
[44]
Guangyan Zhang, Jiwu Shu, Wei Xue, and Weimin Zheng. 2007. SLAS: An efficient approach to scaling round-robin striped volumes. ACM Transactions on Storage 3, 1 (2007), 3–es.
[45]
Guangyan Zhang, Weiman Zheng, and Jiwu Shu. 2009. ALV: A new data redistribution approach to RAID-5 scaling. IEEE Transactions on Computers 59, 3 (2009), 345–357.
[46]
Xiaoyang Zhang, Yuchong Hu, Patrick P. C. Lee, and Pan Zhou. 2018. Toward optimal storage scaling via network coding: From theory to practice. In IEEE INFOCOM 2018-IEEE Conference on Computer Communications, IEEE, 1808–1816.
[47]
Weimin Zheng and Guangyan Zhang. 2011. FastScale: Accelerate RAID scaling by minimizing data migration. In 9th USENIX Conference on File and Storage Technologies (FAST’11).

Index Terms

  1. Achieving Tunable Erasure Coding with Cluster-Aware Redundancy Transitioning

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image ACM Transactions on Architecture and Code Optimization
      ACM Transactions on Architecture and Code Optimization  Volume 21, Issue 3
      September 2024
      592 pages
      EISSN:1544-3973
      DOI:10.1145/3613629
      Issue’s Table of Contents
      This work is licensed under a Creative Commons Attribution International 4.0 License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 14 September 2024
      Online AM: 10 June 2024
      Accepted: 22 May 2024
      Revised: 29 March 2024
      Received: 06 December 2023
      Published in TACO Volume 21, Issue 3

      Check for updates

      Author Tags

      1. Erasure codes
      2. storage system
      3. fault tolerance
      4. redundancy transitioning

      Qualifiers

      • Research-article

      Funding Sources

      • National Key R&D Program of China
      • Major Research Plan of the National Natural Science Foundation of China
      • Natural Science Foundation of China
      • Natural Science Foundation of Fujian Province of China

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • 0
        Total Citations
      • 468
        Total Downloads
      • Downloads (Last 12 months)468
      • Downloads (Last 6 weeks)173
      Reflects downloads up to 12 Dec 2024

      Other Metrics

      Citations

      View Options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Full Text

      View this article in Full Text.

      Full Text

      Login options

      Full Access

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media