DOI: 10.1145/3581784.3607041
research-article
Open access

Fine-grained Policy-driven I/O Sharing for Burst Buffers

Published: 11 November 2023

Abstract

A burst buffer is a common way to bridge the gap between the I/O needs of modern supercomputing applications and the performance of the shared file system on large-scale supercomputers. However, existing I/O sharing methods require resource isolation, offline profiling, or repeated execution, which significantly limits the utilization and applicability of these systems. Here we present ThemisIO, a policy-driven I/O sharing framework for a remote-shared burst buffer: a dedicated group of I/O nodes, each with a local storage device. ThemisIO preserves high utilization by implementing opportunity fairness, so that it can reallocate unused I/O resources to other applications. ThemisIO accurately and efficiently allocates I/O cycles among applications based purely on real-time I/O behavior, without requiring user-supplied information or offline-profiled application characteristics. ThemisIO supports a variety of fair sharing policies, such as user-fair and size-fair, as well as composite policies, e.g., group-then-user-fair. All these features are enabled by its statistical token design. ThemisIO can alter the execution order of incoming I/O requests based on assigned tokens to precisely balance I/O cycles between applications via time slicing, thereby enforcing processing isolation. Experiments using I/O benchmarks show that ThemisIO sustains 13.5% to 13.7% higher I/O throughput and 19.5% to 40.4% lower performance variation than existing algorithms. For real applications, ThemisIO reduces the slowdown caused by I/O interference by 59.1% to 99.8%.
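
The abstract names the ingredients (sharing policies, tokens assigned to applications, reordering of queued requests, and reallocation of unused shares) but not the mechanics. The following Python sketch illustrates one way those pieces could fit together. It is not ThemisIO's implementation: the function and class names, the job metadata fields (`user`, `group`, `nodes`), the reading of size-fair as "share proportional to node count", and the use of weighted random selection to stand in for statistical tokens are all assumptions made for illustration.

```python
import random
from collections import defaultdict, deque


def hierarchical_shares(jobs, levels):
    """Split a total share of 1.0 evenly at each level in `levels`
    (e.g. ["group", "user"]), then evenly among the jobs in each leaf."""
    shares = {}

    def split(job_ids, share, depth):
        if depth == len(levels):
            for j in job_ids:
                shares[j] = share / len(job_ids)
            return
        buckets = defaultdict(list)
        for j in job_ids:
            buckets[jobs[j][levels[depth]]].append(j)
        for members in buckets.values():
            split(members, share / len(buckets), depth + 1)

    split(list(jobs), 1.0, 0)
    return shares


def job_shares(jobs, policy):
    """Map each job id to its fraction of I/O cycles under `policy`.
    `jobs` is {job_id: {"user": str, "group": str, "nodes": int}}
    (hypothetical metadata, not taken from the paper)."""
    if policy == "size-fair":
        # Assumption: share is proportional to the job's node count.
        total = sum(m["nodes"] for m in jobs.values())
        return {j: m["nodes"] / total for j, m in jobs.items()}
    if policy == "user-fair":
        return hierarchical_shares(jobs, ["user"])
    if policy == "group-then-user-fair":
        return hierarchical_shares(jobs, ["group", "user"])
    raise ValueError(f"unknown policy: {policy}")


class TokenScheduler:
    """Toy single-server scheduler that picks the next request at random,
    weighted by the policy share of each job that has work queued."""

    def __init__(self, policy, seed=None):
        self.policy = policy
        self.jobs = {}
        self.queues = defaultdict(deque)
        self.rng = random.Random(seed)

    def submit(self, job_id, meta, request):
        self.jobs[job_id] = meta
        self.queues[job_id].append(request)

    def next_request(self):
        # Shares are computed over jobs with pending requests only, so an
        # idle job's share is redistributed to the others.
        active = {j: self.jobs[j] for j, q in self.queues.items() if q}
        if not active:
            return None
        shares = job_shares(active, self.policy)
        ids = list(shares)
        chosen = self.rng.choices(ids, weights=[shares[j] for j in ids])[0]
        return chosen, self.queues[chosen].popleft()
```

Recomputing shares over only the jobs that currently have queued requests is how this sketch approximates opportunity fairness: a job that issues no I/O holds no share, and the remaining jobs divide the server's time according to the same policy.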

Supplemental Material

MP4 File: SC23 paper presentation recording for "Fine-grained Policy-driven I/O Sharing for Burst Buffers", by Ed Karrels, Lei Huang, Yuhong Kan, Ishank Arora, Yinzhi Wang, Daniel S. Katz, William Gropp and Zhao Zhang

Published In

SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2023
1428 pages
ISBN: 9798400701092
DOI: 10.1145/3581784
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
