Boosting Cross-rack Multi-stripe Repair in Heterogeneous Erasure-coded Clusters

Research article
DOI: 10.1145/3545008.3545029
Published: 13 January 2023

Abstract

Large-scale distributed storage systems adopt erasure coding to guarantee high data reliability, but at the expense of high repair costs. In practice, storage nodes are grouped into racks, and the data blocks they hold are organized into multiple stripes, each encoded independently. Because cross-rack bandwidth is scarce and heterogeneous, cross-rack network transmission dominates the overall repair cost. We argue that, when erasure coding is deployed in a rack architecture, existing repair techniques are limited in several respects: they neglect the heterogeneity of cross-rack bandwidth, give little consideration to multi-stripe failures, apply no special treatment to repair link scheduling, and target only specific erasure code constructions.
In this paper, we present CMRepair, an efficient Cross-rack Multi-stripe Repair technique that aims to reduce the repair time of multi-stripe failures in heterogeneous erasure-coded clusters. CMRepair carefully chooses the nodes for reading and repairing blocks, and greedily searches for a near-optimal multi-stripe repair solution that reduces the cross-rack repair time while introducing only negligible computational overhead. Furthermore, it selectively schedules the execution order of cross-rack links, with the primary objective of saturating unused upload/download bandwidth and avoiding network congestion. CMRepair can also be extended to handle full-node repair and multi-failure repair, and can adapt to different erasure codes. Experiments show that CMRepair reduces the cross-rack repair time by 6.42%-62.50% and improves the repair throughput by 24.94%-53.91%.
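
As a rough illustration of the greedy idea described above, the following Python sketch assigns helper racks to several stripes so that the slowest cross-rack upload finishes as early as possible. It is a toy model under stated assumptions, not CMRepair's actual algorithm: the function name `greedy_multi_stripe_plan`, the per-rack upload bandwidths, and the rule that each stripe fetches one block from each chosen rack are all hypothetical, and download bandwidth and link ordering are ignored.

```python
def greedy_multi_stripe_plan(stripes, upload_bw, block_size=1.0):
    """Greedily pick helper racks for each stripe (toy model, hypothetical API).

    stripes:    list of (candidate_racks, blocks_needed) tuples; each chosen
                rack is assumed to send exactly one block for that stripe.
    upload_bw:  dict mapping rack name -> cross-rack upload bandwidth.
    block_size: size of one block, in units consistent with upload_bw.

    Returns (plan, makespan): plan[i] lists the racks chosen for stripe i,
    and makespan is the busiest rack's total upload time.
    """
    # Accumulated upload time already assigned to each rack.
    busy = {rack: 0.0 for rack in upload_bw}
    plan = []
    for candidates, needed in stripes:
        if needed > len(set(candidates)):
            raise ValueError("not enough candidate racks for this stripe")
        chosen = []
        for _ in range(needed):
            # Pick the unused candidate whose upload would finish earliest,
            # taking the heterogeneous bandwidth and existing load into account.
            best = min(
                (r for r in candidates if r not in chosen),
                key=lambda r: busy[r] + block_size / upload_bw[r],
            )
            busy[best] += block_size / upload_bw[best]
            chosen.append(best)
        plan.append(chosen)
    return plan, max(busy.values(), default=0.0)


# Example: three racks with heterogeneous upload bandwidth, two failed stripes.
bandwidth = {"r1": 4.0, "r2": 2.0, "r3": 0.5}
stripes = [(["r1", "r2", "r3"], 2), (["r1", "r2", "r3"], 2)]
plan, makespan = greedy_multi_stripe_plan(stripes, bandwidth, block_size=4.0)
```

In this example both stripes read from `r1` and `r2`, and the slow rack `r3` is never used, because sending a single block from `r3` would take longer than letting `r2` serve both stripes. A bandwidth-oblivious choice could easily pick `r3` and double the transfer time.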

Cited By

  • (2023) Boosting Erasure-Coded Multi-Stripe Repair in Rack Architecture and Heterogeneous Clusters: Design and Analysis. IEEE Transactions on Parallel and Distributed Systems 34:8 (2251-2264). DOI: 10.1109/TPDS.2023.3282180. Online publication date: 1-Aug-2023
  • (2023) MDTUpdate: A Multi-Block Double Tree Update Technique in Heterogeneous Erasure-Coded Clusters. IEEE Transactions on Computers 72:10 (2808-2821). DOI: 10.1109/TC.2023.3271064. Online publication date: 1-Oct-2023

    Information

    Published In

    ICPP '22: Proceedings of the 51st International Conference on Parallel Processing
    August 2022
    976 pages
    ISBN: 9781450397339
    DOI: 10.1145/3545008
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Erasure Code
    2. Heterogeneous Network
    3. Multiple Stripes
    4. Rack Architecture
    5. Repair Time

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • National Key R&D Program of China

    Conference

    ICPP '22: 51st International Conference on Parallel Processing
    August 29 - September 1, 2022
    Bordeaux, France

    Acceptance Rates

    Overall Acceptance Rate 91 of 313 submissions, 29%
