research-article

Comparative evaluation of overlap strategies with study of I/O overlap in MPI-IO

Published: 01 October 2008

Abstract

Many scientific applications use parallel I/O to meet their low-latency, high-bandwidth I/O requirements. Among the many available parallel I/O operations, collective I/O is one of the most popular when the storage layout and the access pattern of the data do not match. An implementation of collective I/O typically involves disk I/O operations followed by interprocessor communication. In addition, in many I/O-intensive applications, parallel I/O operations are followed by parallel computation. This paper presents a comparative study of different overlap strategies in parallel applications. We experimented with four overlap strategies: 1) overlapping I/O and communication; 2) overlapping I/O and computation; 3) overlapping computation and communication; and 4) overlapping I/O, communication, and computation. All experiments were conducted on a Linux cluster, and the performance results are very encouraging. On average, we improved the performance of a generic collective read call by 38%, the MxM benchmark by 26%, and the FFT benchmark by 34%.
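To make the second strategy (overlapping I/O and computation) concrete, the following is a minimal single-process sketch in Python. It is not the paper's MPI-IO implementation: a background thread stands in for a nonblocking read, and the names `read_chunk`, `compute`, and `overlapped_sum` are hypothetical helpers introduced only for illustration. The idea shown is double buffering, where the read of the next chunk is issued before the computation on the previous chunk begins.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_chunk(path, offset, size):
    """Blocking read of one chunk; stands in for the disk I/O phase."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)


def compute(chunk):
    """Placeholder computation phase (hypothetical): sum the bytes of a chunk."""
    return sum(chunk)


def overlapped_sum(path, chunk_size):
    """Sum all bytes of a file, computing on one chunk while the next is read."""
    file_size = os.path.getsize(path)
    total = 0
    # A single worker thread plays the role of the asynchronous I/O engine.
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = None  # future for the chunk that was requested last
        for offset in range(0, file_size, chunk_size):
            # Issue the read for the current chunk first ...
            current = pool.submit(read_chunk, path, offset, chunk_size)
            # ... then compute on the previous chunk while that read proceeds.
            if pending is not None:
                total += compute(pending.result())
            pending = current
        # Drain the final outstanding read.
        if pending is not None:
            total += compute(pending.result())
    return total


if __name__ == "__main__":
    # Small self-contained demo on a temporary file.
    data = bytes(range(256)) * 64
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        print(overlapped_sum(path, chunk_size=4096) == sum(data))
    finally:
        os.remove(path)
```

In an MPI-IO setting the same structure would presumably be expressed with nonblocking file reads and a later wait on the request, but that mapping is an assumption here and not something the abstract spells out. Whether overlap pays off depends on how the I/O time compares with the computation time per chunk.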


Published In

ACM SIGOPS Operating Systems Review, Volume 42, Issue 6
October 2008
111 pages
ISSN: 0163-5980
DOI: 10.1145/1453775

Publisher

Association for Computing Machinery

New York, NY, United States

