
Single Disk Failure Recovery for X-Code-Based Parallel Storage Systems

Published: 01 April 2014

Abstract

In modern parallel storage systems (e.g., cloud storage and data centers), it is important to provide data availability guarantees against disk (or storage node) failures via redundancy coding schemes. One such coding scheme is X-code, which is double-fault tolerant while achieving the optimal update complexity. When a disk or node fails, recovery must be carried out to reduce the possibility of data unavailability. We propose an X-code-based optimal recovery scheme called minimum-disk-read-recovery (MDRR), which minimizes the number of disk reads for single-disk failure recovery. We make several contributions. First, we show that MDRR provides optimal single-disk failure recovery and reduces disk reads by about 25 percent compared with the conventional recovery approach. Second, we prove that, in general, no optimal recovery scheme for X-code can balance disk reads among different disks within a single stripe. Third, we propose an efficient logical encoding scheme that issues balanced disk reads across a group of stripes for any recovery algorithm (including the MDRR scheme). Finally, we implement our proposed recovery schemes and conduct extensive testbed experiments in a networked storage system prototype. The experiments indicate that MDRR reduces recovery time by around 20 percent relative to the conventional approach, showing that our theoretical findings are applicable in practice.
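To make the idea of choosing parity chains to save disk reads concrete, here is a minimal Python sketch. It is not the MDRR algorithm from the paper. It assumes the standard X-code layout: a p x p stripe (p prime, one column per disk) whose first p - 2 rows hold data and whose last two rows hold diagonal and anti-diagonal parity. The function names `chains`, `recovery_options`, and `count_reads` are hypothetical, and the "conventional" baseline (rebuild every lost data symbol from its diagonal chain) is an assumption made for illustration. The search is brute force, so it is only practical for small p.

```python
from itertools import product


def chains(p):
    """Return the diagonal and anti-diagonal parity chains of a p x p stripe.

    Each chain is a set of (row, col) symbols whose XOR is zero.
    Rows 0..p-3 are data, row p-2 is diagonal parity, row p-1 is
    anti-diagonal parity (assumed standard X-code indexing).
    """
    diag, anti = [], []
    for i in range(p):
        diag.append({(k, (i + k + 2) % p) for k in range(p - 2)} | {(p - 2, i)})
        anti.append({(k, (i - k - 2) % p) for k in range(p - 2)} | {(p - 1, i)})
    return diag, anti


def recovery_options(p, failed):
    """For each lost symbol on the failed disk, list its candidate read sets.

    A data symbol has two options (diagonal chain first, then anti-diagonal);
    a parity symbol has exactly one.
    """
    diag, anti = chains(p)
    options = []
    for row in range(p):
        lost = (row, failed)
        options.append([c - {lost} for c in diag + anti if lost in c])
    return options


def count_reads(p, failed):
    """Return (conventional, best) counts of distinct symbols read."""
    options = recovery_options(p, failed)
    # Assumed baseline: always take the first option (the diagonal chain for data).
    conventional = len(set().union(*(opts[0] for opts in options)))
    # Exhaustive search over every per-symbol chain choice (2^(p-2) combinations).
    best = min(len(set().union(*choice)) for choice in product(*options))
    return conventional, best


if __name__ == "__main__":
    for p in (5, 7):
        print(p, count_reads(p, failed=0))
```

Mixing diagonal and anti-diagonal chains lets some surviving symbols serve more than one lost symbol, which is where the read savings come from in this sketch. The exact percentages depend on how the baseline is counted, so the numbers printed here need not match the figures reported in the abstract.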

Published In

IEEE Transactions on Computers, Volume 63, Issue 4
April 2014
264 pages

Publisher

IEEE Computer Society

United States

Qualifiers

  • Research-article

Cited By

  • (2024) A Survey of the Past, Present, and Future of Erasure Coding for Storage Systems. ACM Transactions on Storage. DOI: 10.1145/3708994. Online publication date: 31-Dec-2024
  • (2020) Cross-Rack-Aware Single Failure Recovery for Clustered File Systems. IEEE Transactions on Dependable and Secure Computing 17:2 (248-261). DOI: 10.1109/TDSC.2017.2774299. Online publication date: 1-Mar-2020
  • (2020) PDL: A Data Layout towards Fast Failure Recovery for Erasure-coded Distributed Storage Systems. IEEE INFOCOM 2020 - IEEE Conference on Computer Communications (736-745). DOI: 10.1109/INFOCOM41043.2020.9155350. Online publication date: 6-Jul-2020
  • (2018) RAFI. Proceedings of the 2018 USENIX Conference on Usenix Annual Technical Conference (495-506). DOI: 10.5555/3277355.3277403. Online publication date: 11-Jul-2018
  • (2018) Comparison on Binary MDS Array Codes for Single Disk Failure Recovery. Proceedings of the 2018 2nd International Conference on Computer Science and Artificial Intelligence (536-541). DOI: 10.1145/3297156.3297266. Online publication date: 8-Dec-2018
  • (2017) MDS code constructions with small sub-packetization and near-optimal repair bandwidth. Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (2109-2122). DOI: 10.5555/3039686.3039823. Online publication date: 16-Jan-2017
  • (2017) Seek-Efficient I/O Optimization in Single Failure Recovery for XOR-Coded Storage Systems. IEEE Transactions on Parallel and Distributed Systems 28:3 (877-890). DOI: 10.1109/TPDS.2016.2591040. Online publication date: 1-Mar-2017
  • (2017) Triple-fault-tolerant binary MDS array codes with asymptotically optimal repair. 2017 IEEE International Symposium on Information Theory (ISIT) (839-843). DOI: 10.1109/ISIT.2017.8006646. Online publication date: 25-Jun-2017
  • (2017) Performance impacts of hybrid cloud storage. Computing 99:12 (1207-1229). DOI: 10.1007/s00607-017-0560-y. Online publication date: 1-Dec-2017
  • (2016) Cooperative repair of multiple node failures in distributed storage systems. International Journal of Information and Coding Theory 3:4 (299-323). DOI: 10.1504/IJICOT.2016.079495. Online publication date: 1-Jan-2016
