DOI: 10.1145/1654059.1654077
research-article

Scalable massively parallel I/O to task-local files

Published: 14 November 2009

Abstract

Parallel applications often store data in multiple task-local files, for example, to save checkpoints, to circumvent memory limitations, or to record performance data. At very large processor configurations, such applications often run into scalability limits: the simultaneous creation of thousands of files causes metadata-server contention, large file counts complicate file management, and operations on those files can even destabilize the file system. SIONlib is a parallel I/O library that addresses this problem by transparently mapping a large number of task-local files onto a small number of physical files, using internal metadata handling and block alignment to ensure high performance. While requiring only minimal source code changes, SIONlib significantly reduces file creation overhead and simplifies file handling without penalizing read and write performance. We evaluate SIONlib's efficiency with up to 288 K tasks and report significant performance improvements in two application scenarios.
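
The mapping of many task-local files onto one physical file can be illustrated with a minimal sketch. It is not SIONlib's API or its actual file format; the names `align_up`, `chunk_layout`, `fs_block`, and `header_bytes` are hypothetical. The sketch assumes each task announces how many bytes it wants to write, and that every task's region in the shared file is rounded up to a whole number of file-system blocks so that no two tasks touch the same block.

```python
from typing import List, Tuple


def align_up(n: int, block: int) -> int:
    """Round n up to the next multiple of block."""
    return ((n + block - 1) // block) * block


def chunk_layout(
    chunk_sizes: List[int], fs_block: int, header_bytes: int = 0
) -> List[Tuple[int, int]]:
    """Return (offset, aligned_size) for each task's region in one shared file.

    chunk_sizes[i] is the number of bytes task i asked for (an assumption of
    this sketch). A metadata header of header_bytes is assumed to sit at the
    start of the file; it is also padded to a block boundary.
    """
    layout = []
    pos = align_up(header_bytes, fs_block)  # first region starts after the header
    for size in chunk_sizes:
        aligned = align_up(size, fs_block)  # pad so regions never share a block
        layout.append((pos, aligned))
        pos += aligned
    return layout


if __name__ == "__main__":
    # Three tasks, a 4 KiB file-system block, and a 64-byte header.
    for rank, (offset, size) in enumerate(chunk_layout([100, 5000, 4096], 4096, 64)):
        print(f"task {rank}: offset={offset} size={size}")
```

With this layout each task can seek to its own offset and write independently, so only one physical file has to be created no matter how many tasks participate. The price is padding: a task that writes 100 bytes still occupies a full block.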




Published In

SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
November 2009
778 pages
ISBN:9781605587448
DOI:10.1145/1654059
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SC '09
Acceptance Rates

SC '09 Paper Acceptance Rate 59 of 261 submissions, 23%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

