DOI: 10.1145/3659914.3659926
research-article
Open access

Reducing the Impact of I/O Contention in Numerical Weather Prediction Workflows at Scale Using DAOS

Published: 03 June 2024

Abstract

Operational Numerical Weather Prediction (NWP) workflows are highly data-intensive. Data volumes have increased by many orders of magnitude over the last 40 years and are expected to keep growing, especially given the upcoming adoption of Machine Learning in forecast processes. Parallel POSIX-compliant file systems have been the dominant paradigm for data storage and exchange in HPC workflows for many years. This paper presents ECMWF's move beyond the POSIX paradigm: a new backend for its storage library that supports DAOS, a novel high-performance object store designed for massively distributed Non-Volatile Memory. Under high load and contention, as is typical of forecast workflow I/O patterns, this system is demonstrated to outperform the highly mature and optimised POSIX backend. This work constitutes a significant step forward, beyond the performance constraints imposed by POSIX semantics.

Published In

PASC '24: Proceedings of the Platform for Advanced Scientific Computing Conference
June 2024
296 pages
ISBN: 9798400706394
DOI: 10.1145/3659914
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. high-performance storage
2. I/O contention
3. scalability
4. object storage
5. DAOS
6. lustre
7. numerical weather prediction

Funding Sources

• DE3100

Conference

PASC '24

Acceptance Rates

PASC '24 Paper Acceptance Rate 26 of 36 submissions, 72%;
Overall Acceptance Rate 109 of 221 submissions, 49%