[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3545008.3545038acmotherconferencesArticle/Chapter ViewAbstractPublication PagesicppConference Proceedingsconference-collections
research-article

Repair-Optimal Data Placement for Locally Repairable Codes with Optimal Minimum Hamming Distance

Published: 13 January 2023 Publication History

Abstract

Modern clustered storage systems increasingly adopt erasure coding to realize reliable data storage at low storage redundancy. Locally Repairable Codes (LRC) are a family of practical erasure codes with high repair efficiency. Among various LRC constructions, Optimal-LRC is a recently proposed LRC approach that achieves the optimal Minimum Hamming Distance with low theoretical repair costs. In this paper, we consider the repair performance of Optimal-LRC in clustered storage systems. We show that the conventional flat data placement and random data placement incur substantial cross-cluster repair traffic, which impairs the repair performance. To this end, we design an optimal data placement scheme that provably minimizes the cross-cluster repair traffic, by carefully placing each group of blocks in Optimal-LRC into a minimum number of clusters subject to single-cluster fault tolerance. We implement our optimal data placement scheme on a key-value store prototype atop Memcached, and show via LAN testbed experiments that the optimal data placement significantly improves the repair performance compared to the conventional data placements.

References

[1]
Faraz Ahmad, Srimat T. Chakradhar, Anand Raghunathan, and T. N. Vijaykumar. 2014. ShuffleWatcher: Shuffle-aware Scheduling in Multi-tenant MapReduce Clusters. In Proc. of USENIX ATC.
[2]
Han Cai, Ying Miao, Moshe Schwartz, and Xiaohu Tang. 2020. On optimal Locally Repairable Codes with Super-Linear length. IEEE Trans. on Information Theory 66, 8 (2020), 4853–4868.
[3]
Haibo Chen, Heng Zhang, Mingkai Dong, Zhaoguo Wang, Yubin Xia, Haibing Guan, and Binyu Zang. 2017. Efficient and available in-memory kv-store with hybrid erasure coding and replication. ACM Trans. on Storage (TOS) 13, 3 (2017), 25.
[4]
Mosharaf Chowdhury, Srikanth Kandula, and Ion Stoica. 2013. Leveraging endpoint flexibility in data-intensive clusters. In Proc. of ACM SIGCOMM.
[5]
cisco. 2015. Cisco Systems. Oversubscription and density best practices.https://www.cisco.com/c/en/us/solutions/collateral/data-centervirtualization/storage-networkingsolution/net_implementation_white_paper0900aecd800f592f.html.
[6]
A. Fikes.2010. Storage architecture and challenges.https://cloud.google.com/files/storage_architecture_and_challenges.pdf.
[7]
Daniel Ford, François Labelle, Florentina Popovici, Murray Stokely, Van-Anh Truong, Luiz Barroso, Carrie Grimes, and Sean Quinlan. 2010. Availability in globally distributed storage systems. In Proc. of USENIX OSDI.
[8]
Parikshit Gopalan, Cheng Huang, Huseyin Simitci, and Sergey Yekhanin. 2012. On the locality of codeword symbols. IEEE Trans. on Information Theory 58, 10 (2012), 6925–6934.
[9]
Hanxu Hou, Patrick P. C. Lee, Kenneth W. Shum, and Yuchong Hu. 2019. Rack-aware regenerating codes for data centers. IEEE Trans. on Information Theory 65, 8 (2019), 4730–4745.
[10]
Yuchong Hu, Liangfeng Cheng, Qiaori Yao, Patrick P. C. Lee, Weichun Wang, and Wei Chen. 2021. Exploiting combined locality for Wide-Stripe Erasure Coding in distributed storage. In Proc. of USENIX FAST.
[11]
Yuchong Hu, Xiaolu Li, Mi Zhang, Patrick P. C. Lee, Xiaoyang Zhang, Pan Zhou, and Dan Feng. 2017. Optimal repair layering for erasure-coded data centers: from theory to practice. ACM Trans. on Storage 13, 4 (2017), 24.
[12]
Cheng Huang, Minghua Chen, and Jin Li. 2013. Pyramid Codes: flexible schemes to trade space for access efficiency in reliable data storage systems. ACM Trans. on Storage 9, 1 (2013), 1–28.
[13]
Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin. 2012. Erasure coding in Windows Azure Storage. In Proc. of USENIX ATC.
[14]
Bert Hubert, Jacco Geul, and Simon Séhier. 2017. The wonder shaper 1.4. https://github.com/magnific0/wondershaper.
[15]
Oleg Kolosov, Alexander Barg, Itzhak Tamo, and Gala Yadgar. 2018. Optimal LRC codes for all lenghts n ≤ q. CoRR abs/1802.00157(2018).
[16]
Oleg Kolosov, Gala Yadgar, Matan Liram, Itzhak Tamo, and Alexander Barg. 2020. On fault tolerance, locality, and optimality in Locally Repairable Codes. ACM Trans. on Storage 16, 18 (2020), 865–877.
[17]
Xiaolu Li, Zuoru Yang, Jinhong Li, Runhui Li, Patrick P. C. Lee, Qun Huang, and Yuchong Hu. 2021. Repair pipelining for erasure-coded storage: algorithms and evaluation.IEEE Trans. on Storage 17, 2 (2021), 29.
[18]
libmemcached. 2014. Libmemcached. https://libmemcached.org/libMemcached.html.
[19]
Jian Liu, Sihem Mesnager, and Lusheng Chen. 2018. New constructions of optimal Locally Recoverable Codes via good polynomials. IEEE Trans. on Information Theory 64, 2 (2018), 889–899.
[20]
M.G. Luby, M. Mitzenmacher, M.A. Shokrollahi, and D.A. Spielman. 2001. Efficient erasure correcting codes. IEEE Trans. on Information Theory 47, 2 (2001), 569–584.
[21]
Mehrtash Mehrabi, Mostafa Shahabinejad, Masoud Ardakani, and Majid Khabbazian. 2018. Improving the update complexity of Locally Repairable Codes. IEEE Trans. on Communications 66, 9 (2018), 3711–3720.
[22]
Memcached. 2021. Memcached. https://memcached.org.
[23]
Subramanian Muralidhar, Wyatt Lloyd, Sabyasachi Roy, Cory Hill, Ernest Lin, Weiwen Liu, Satadru Pan, Shiva Shankar, Viswanath Sivakumar, Linpeng Tang, and Sanjeev Kumar. 2014. F4: Facebook’s Warm BLOB Storage System. In Proc. of USENIX OSDI.
[24]
James S Plank, Jianqiang Luo, Catherine D Schuman, Lihao Xu, Zooko Wilcox-O’Hearn, 2009. A performance evaluation and examination of open-source erasure coding libraries for storage. In Proc. of USENIX FAST.
[25]
James S Plank, Scott Simmerman, and Catherine D Schuman. 2008. Jerasure: A library in C/C++ facilitating erasure coding for storage applications-version 1.2. Technical Report. University of Tennessee, Tech. Rep. CS-08-627.
[26]
N. Prakash, Vitaly Abdrashitov, and Muriel Médard. 2018. The storage versus repair-bandwidth trade-off for clustered storage systems. IEEE Trans. on Information Theory 64, 8 (2018), 5783–5805.
[27]
KV Rashmi, Mosharaf Chowdhury, Jack Kosaian, Ion Stoica, and Kannan Ramchandran. 2016. EC-Cache: Load-Balanced, low-latency cluster caching with online erasure coding. In Proc. of USENIX OSDI.
[28]
K. V. Rashmi, Nihar B. Shah, Dikang Gu, Hairong Kuang, Dhruba Borthakur, and Kannan Ramchandran. 2013. A solution to the network challenges of data recovery in erasure-coded distributed storage systems: A study on the Facebook warehouse cluster. In Proc. of USENIX HotStorage.
[29]
Ankit Singh Rawat, Dimitris S. Papailiopoulos, Alexandros G. Dimakis, and Sriram Vishwanath. 2016. Locality and availability in distributed storage. IEEE Trans. on Information Theory 62, 8 (2016), 4481–4493.
[30]
Irving Reed and Gustave Solomon. 1960. Polynomial codes over certain finite fields. J. Soc. Indust. Appl. Math. 8, 2 (1960), 300–304.
[31]
Sharad S. Sane. 2013. Pigeonhole principle. Hindustan Book Agency, 169–192.
[32]
Maheswaran Sathiamoorthy, Megasthenis Asteris, Dimitris Papailiopoulos, Alexandros G. Dimakis, Ramkumar Vadali, Scott Chen, and Dhruba Borthakur. 2013. XORing Elephants: novel Erasure Codes for big data. In Proc. of VLDB Endowment.
[33]
Zhirong Shen, Patrick P. C. Lee, Jiwu Shu, and Wenzhong Guo. 2020. Cross-rack-aware single failure recovery for clustered file systems. IEEE Trans. on Dependable and Secure Computing 17, 2 (2020), 248–261.
[34]
Natalia Silberstein, Ankit Singh Rawat, O. Ozan Koyluoglu, and Sriram Vishwanath. 2013. Optimal Locally Repairable Codes via rank-metric codes. In Proc. of IEEE ISIT.
[35]
Itzhak Tamo and Alexander Barg. 2014. A family of Optimal Locally Recoverable Codes. IEEE Trans. on Information Theory 60, 8 (2014), 4661–4676.
[36]
Itzhak Tamo, Dimitris S. Papailiopoulos, and Alexandros G. Dimakis. 2016. Optimal Locally Repairable Codes and connections to Matroid Theory. IEEE Trans. on Information Theory 62, 12 (2016), 6661–6671.
[37]
Ashish Vulimiri, Carlo Curino, P. Brighten Godfrey, Thomas Jungblut, Jitu Padhye, and George Varghese. 2015. Global analytics in the face of bandwidth and regulatory constraints. In Proc. of USENIX NSDI.
[38]
Hakim Weatherspoon and John D Kubiatowicz. 2002. Erasure coding vs. replication: A quantitative comparison. In Proc. of IPTPS.
[39]
Shuzhan Wei, Yongkun Li, Yinlong Xu, and Si Wu. 2017. DSC: Dynamic stripe construction for synchronous encoding in clustered file system. In Proc. of IEEE INFOCOM.
[40]
Si Wu, Qingpeng Du, Patrick P. C. Lee, Yongkun Li, and Yinlong Xu. 2022. Optimal data placement for stripe merging in Locally Repairable Codes. In Proc. of IEEE INFOCOM.
[41]
Si Wu, Zhirong Shen, and Patrick PC Lee. 2020. On the optimal repair-scaling trade-off in Locally Repairable Codes. In Proc. of IEEE INFOCOM.
[42]
Si Wu, Zhirong Shen, Patrick P. C. Lee, and Yinlong Xu. 2022. Optimal repair-scaling trade-off in Locally Repairable Codes: analysis and evaluation. IEEE Trans. on Parallel and Distributed Systems 33, 1 (2022), 56–69.
[43]
Mingyuan Xia, Mohit Saxena, Mario Blaum, and David A Pease. 2015. A tale of two erasure codes in HDFS. In Proc. of USENIX FAST.
[44]
Qiaori Yao, Yuchong Hu, Liangfeng Cheng, Patrick PC Lee, Dan Feng, Weichun Wang, and Wei Chen. 2021. StripeMerge: Efficient wide-stripe generation for large-scale erasure-coded storage. In Proc. of IEEE ICDCS.

Cited By

View all
  • (2024)Designing Non-uniform Locally Repairable Codes for Wide Stripes under Skewed File AccessesProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673103(1197-1206)Online publication date: 12-Aug-2024
  • (2024)Towards Energy-Efficient and Thermal-Aware Data Placement for Storage ClustersIEEE Transactions on Sustainable Computing10.1109/TSUSC.2024.33516849:4(631-647)Online publication date: Jul-2024
  • (2023)Toward Optimal Repair and Load Balance in Locally Repairable CodesProceedings of the 52nd International Conference on Parallel Processing10.1145/3605573.3605635(725-735)Online publication date: 7-Aug-2023

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
ICPP '22: Proceedings of the 51st International Conference on Parallel Processing
August 2022
976 pages
ISBN:9781450397339
DOI:10.1145/3545008
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 January 2023

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Clustered storage
  2. Data repair
  3. Erasure coding
  4. Locally Repairable Codes

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

Conference

ICPP '22
ICPP '22: 51st International Conference on Parallel Processing
August 29 - September 1, 2022
Bordeaux, France

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)55
  • Downloads (Last 6 weeks)3
Reflects downloads up to 19 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Designing Non-uniform Locally Repairable Codes for Wide Stripes under Skewed File AccessesProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673103(1197-1206)Online publication date: 12-Aug-2024
  • (2024)Towards Energy-Efficient and Thermal-Aware Data Placement for Storage ClustersIEEE Transactions on Sustainable Computing10.1109/TSUSC.2024.33516849:4(631-647)Online publication date: Jul-2024
  • (2023)Toward Optimal Repair and Load Balance in Locally Repairable CodesProceedings of the 52nd International Conference on Parallel Processing10.1145/3605573.3605635(725-735)Online publication date: 7-Aug-2023

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format.

HTML Format

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media