research-article

Comparative evaluation of overlap strategies with study of I/O overlap in MPI-IO

Authors: Christina M. Patrick, Mahmut Kandemir

ACM SIGOPS Operating Systems Review, Volume 42, Issue 6

Pages 43-49

https://doi.org/10.1145/1453775.1453784

Published: 01 October 2008

Abstract

Many scientific applications use parallel I/O to meet their low-latency, high-bandwidth I/O requirements. Among the many available parallel I/O operations, collective I/O is one of the most popular methods when the storage layout of the data does not match its access pattern. The implementation of collective I/O typically involves disk I/O operations followed by interprocessor communication. Moreover, in many I/O-intensive applications, parallel I/O operations are usually followed by parallel computations. This paper presents a comparative study of different overlap strategies in parallel applications. We have experimented with four overlap strategies: 1) overlapping I/O and communication; 2) overlapping I/O and computation; 3) overlapping computation and communication; and 4) overlapping I/O, communication, and computation. All experiments were conducted on a Linux cluster, and the performance results obtained are very encouraging. On average, we improved the performance of a generic collective read call by 38%, the MxM benchmark by 26%, and the FFT benchmark by 34%.
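The fourth strategy, overlapping I/O, communication, and computation, can be pictured as pipelining the three phases over chunks of data. The Python sketch below illustrates only that general idea and is not the authors' implementation. It does not use MPI-IO. The functions `read_chunk`, `exchange`, and `compute` are hypothetical stand-ins for a disk read, an interprocessor data exchange, and a per-chunk computation. Each phase runs in its own thread and passes chunks to the next phase through bounded queues. This lets chunk i+1 be read while chunk i is being exchanged and chunk i-1 is being computed on.

```python
"""Sketch: overlapping I/O, communication and computation with a
three-stage thread pipeline. All phases are hypothetical stand-ins,
not MPI-IO calls."""
import queue
import threading

_DONE = object()  # sentinel marking the end of the chunk stream


def read_chunk(data, i, chunk):
    # Hypothetical stand-in for a disk read of chunk i.
    return data[i * chunk:(i + 1) * chunk]


def exchange(block):
    # Hypothetical stand-in for interprocessor communication;
    # here it only copies the block it receives.
    return list(block)


def compute(block):
    # Hypothetical per-chunk computation: sum of squares.
    return sum(x * x for x in block)


def _stage(fn, inbox, outbox):
    # Apply fn to each incoming chunk until the sentinel arrives,
    # then forward the sentinel so the next stage can stop too.
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(fn(item))


def pipelined(data, chunk=4, depth=2):
    """Run read -> exchange -> compute with each phase in its own thread.
    Bounded queues (maxsize=depth) limit how many chunks are buffered
    between consecutive phases."""
    n_chunks = (len(data) + chunk - 1) // chunk
    q_read = queue.Queue(maxsize=depth)
    q_comm = queue.Queue(maxsize=depth)
    q_out = queue.Queue()

    def reader():
        for i in range(n_chunks):
            q_read.put(read_chunk(data, i, chunk))
        q_read.put(_DONE)

    threads = [
        threading.Thread(target=reader),
        threading.Thread(target=_stage, args=(exchange, q_read, q_comm)),
        threading.Thread(target=_stage, args=(compute, q_comm, q_out)),
    ]
    for t in threads:
        t.start()

    results = []
    while True:
        item = q_out.get()
        if item is _DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


def sequential(data, chunk=4):
    # Baseline with no overlap: each chunk finishes all three phases
    # before the next chunk is read.
    n_chunks = (len(data) + chunk - 1) // chunk
    return [compute(exchange(read_chunk(data, i, chunk)))
            for i in range(n_chunks)]


if __name__ == "__main__":
    data = list(range(10))
    print(pipelined(data))
    print(sequential(data))
```

For `list(range(10))` and the default chunk size of 4, both functions return `[14, 126, 145]`. These are the sums of squares of `[0, 1, 2, 3]`, `[4, 5, 6, 7]`, and `[8, 9]`, which shows that the pipeline preserves chunk order.

Consider an idealized model with n chunks and per-chunk phase times a, b, and c. The sequential version takes about n(a + b + c). A full pipeline takes about (a + b + c) + (n - 1) * max(a, b, c).

In CPython, threads overlap in practice only where a phase blocks on I/O or otherwise releases the GIL. The sketch therefore shows the structure of the overlap rather than serving as a performance model.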

Recommendations

Effective communication and computation overlap with hybrid MPI/SMPSs
PPoPP '10: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Communication overhead is one of the dominant factors affecting performance in high-performance computing systems. To reduce the negative impact of communication, programmers overlap communication and computation by using asynchronous communication ...
Performance Evaluation of MPI Implementations and MPI-Based Parallel ELLPACK Solvers
MPIDC '96: Proceedings of the Second MPI Developers Conference

Abstract: We are concerned with the parallelization of finite element mesh generation and its decomposition, and the parallel solution of sparse algebraic equations which are obtained from the parallel discretization of second order elliptic partial ...

Published In

ACM SIGOPS Operating Systems Review, Volume 42, Issue 6

October 2008

111 pages

ISSN: 0163-5980

DOI: 10.1145/1453775

Copyright © 2008 Authors.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 October 2008

Published in SIGOPS Volume 42, Issue 6

Qualifiers

Research-article

Article Metrics

Total citations: 17

Total downloads: 237

Downloads (last 12 months): 4

Downloads (last 6 weeks): 0

Download counts reflect downloads up to 01 Jan 2025.

Cited By

Tarraf A, Muñoz J, Singh D, Özden T, Carretero J, Wolf F (2024). I/O Behind the Scenes: Bandwidth Requirements of HPC Applications with Asynchronous I/O. 2024 IEEE International Conference on Cluster Computing (CLUSTER), 426-439. Online publication date: 24-Sep-2024.
https://doi.org/10.1109/CLUSTER59578.2024.00044
Ravi J, Byna S, Koziol Q, Tang H, Becchi M (2023). Evaluating Asynchronous Parallel I/O on HPC Systems. 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 211-221. Online publication date: May-2023.
https://doi.org/10.1109/IPDPS54959.2023.00030
Tang H, Koziol Q, Ravi J, Byna S (2022). Transparent Asynchronous Parallel I/O Using Background Threads. IEEE Transactions on Parallel and Distributed Systems 33:4, 891-902. Online publication date: 1-Apr-2022.
https://doi.org/10.1109/TPDS.2021.3090322
Guo J, Agrawal G (2020). Smart Streaming: A High-Throughput Fault-tolerant Online Processing System. 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 396-405. Online publication date: May-2020.
https://doi.org/10.1109/IPDPSW50202.2020.00075
Tang H, Koziol Q, Byna S, Mainzer J, Li T (2019). Enabling Transparent Asynchronous I/O using Background Threads. 2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW), 11-19. Online publication date: Nov-2019.
https://doi.org/10.1109/PDSW49588.2019.00006
Liu J, Agrawal G (2017). Supporting Fault-Tolerance in Presence of In-Situ Analytics. Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 304-313. Online publication date: 14-May-2017.
https://dl.acm.org/doi/10.5555/3101112.3101155
Dorier M, Antoniu G, Cappello F, Snir M, Sisneros R, Yildiz O, Ibrahim S, Peterka T, Orf L (2016). Damaris. ACM Transactions on Parallel Computing 3:3, 1-43. Online publication date: 25-Oct-2016.
https://dl.acm.org/doi/10.1145/2987371
Huang X, Wang W, Fu H, Yang G, Wang B, Zhang C (2014). A fast input/output library for high-resolution climate models. Geoscientific Model Development 7:1, 93-103. Online publication date: 14-Jan-2014.
https://doi.org/10.5194/gmd-7-93-2014
Huang X, Wang W, Fu H, Yang G, Wang B, Zhang C (2013). A fast input/output library for high resolution climate models. Geoscientific Model Development Discussions 6:3, 4775-4807. Online publication date: 13-Sep-2013.
https://doi.org/10.5194/gmdd-6-4775-2013
Wang W, Huang X, Fu H, Hu Y, Xu S, Yang G (2013). CFIO. Proceedings of the 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, 911-918. Online publication date: 16-Jul-2013.
https://dl.acm.org/doi/10.1109/TrustCom.2013.111
