Article

Profile-guided I/O partitioning

Authors:

David Kaeli

ICS '03: Proceedings of the 17th annual international conference on Supercomputing

Pages 252 - 260

https://doi.org/10.1145/782814.782850

Published: 23 June 2003

Abstract

In the field of high performance computing there is a growing need to process large, complex datasets. Many of these applications are file-intensive workloads, performing a large number of reads from and writes to a small number of files. When executing these workloads on cluster-based systems, performance cannot scale by simply increasing the number of compute nodes. To effectively exploit parallel resources we need to parallelize file I/O. The potential impact of exploiting parallel I/O grows as the gap between CPU and disk speeds continues to increase.

While parallel I/O middleware systems (e.g., MPI I/O) provide users with environments where large datasets can be shared among multiple distributed processes, the performance of file-intensive applications depends heavily on how the data is accessed and where the data is physically located on disk. I/O operations need to be parallelized both at the application level (using middleware) and at the disk level (using partitioning).

In this paper, we present a new profile-guided greedy partitioning algorithm to parallelize I/O access for file-intensive applications run on cluster-based systems. We use MPI and MPI I/O to provide parallelization at the application level. We utilize I/O profiling to capture relevant information about the I/O stream. We then use these profiles to guide file partitioning across multiple disks to significantly improve I/O throughput.
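
The abstract does not spell out the partitioning algorithm, so the following is only an illustrative sketch of the general idea of letting an I/O profile guide a greedy placement of file regions across disks. It is not the authors' method. The trace format of `(offset, length)` records, the fixed `chunk_size`, the function names `profile_accesses` and `greedy_partition`, and the "heaviest chunk onto the least-loaded disk" rule are all assumptions made for this example.

```python
import heapq
from collections import Counter


def profile_accesses(trace, chunk_size):
    """Build a hypothetical I/O profile: bytes accessed per fixed-size file chunk.

    `trace` is an assumed format: an iterable of (offset, length) records.
    """
    load = Counter()
    for offset, length in trace:
        end = offset + length
        pos = offset
        while pos < end:
            chunk = pos // chunk_size
            chunk_end = (chunk + 1) * chunk_size
            n = min(end, chunk_end) - pos  # bytes of this access inside the chunk
            load[chunk] += n
            pos += n
    return load


def greedy_partition(load, num_disks):
    """Greedy placement: visit chunks from most to least accessed and put
    each one on the disk with the smallest accumulated load so far."""
    heap = [(0, disk) for disk in range(num_disks)]  # (accumulated bytes, disk id)
    heapq.heapify(heap)
    placement = {}
    for chunk, nbytes in sorted(load.items(), key=lambda kv: (-kv[1], kv[0])):
        total, disk = heapq.heappop(heap)
        placement[chunk] = disk
        heapq.heappush(heap, (total + nbytes, disk))
    return placement


# Made-up trace: chunk 0 is read three times, the others once.
trace = [(0, 4096), (0, 4096), (4096, 4096), (8192, 2048), (12288, 4096), (0, 4096)]
load = profile_accesses(trace, chunk_size=4096)
placement = greedy_partition(load, num_disks=2)
print(placement)
```

In this toy trace the hot chunk ends up alone on one disk while the three colder chunks share the other, which is the kind of balance a profile-driven placement aims for. A real system would also have to account for access ordering, concurrency between processes, and the cost of physically moving data.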

Index Terms

Profile-guided I/O partitioning
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Interconnection architectures
2. Hardware
  1. Integrated circuits
    1. Interconnect

Published In

ICS '03: Proceedings of the 17th annual international conference on Supercomputing

June 2003

380 pages

ISBN:1581137338

DOI:10.1145/782814

General Chair: Utpal Banerjee (Intel Corporation)

Program Chairs: Kyle A. Gallivan (Florida State University), Antonio Gonzalez (Intel Labs & Univ. Politècnica de Catalunya)

Copyright © 2003 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

ICS03

ICS03: International Conference on Supercomputing 2003

June 23 - 26, 2003

San Francisco, CA, USA

Acceptance Rates

ICS '03 Paper Acceptance Rate 36 of 171 submissions, 21%;

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Bibliometrics

Total Citations: 37
Total Downloads: 299
Downloads (Last 12 months): 2
Downloads (Last 6 weeks): 1

Reflects downloads up to 19 Dec 2024

Citations

Cited By

Navasca C, Maas M, Maniatis P, Lim H, Xu G, Blackburn S, Petrank E (2023). Predicting Dynamic Properties of Heap Allocations using Neural Networks Trained on Static Code: An Intellectual Abstract. Proceedings of the 2023 ACM SIGPLAN International Symposium on Memory Management, pages 43-57. Online publication date: 6-Jun-2023.
https://dl.acm.org/doi/10.1145/3591195.3595275
Park S, Bhowmik M, Uta A, Weissman J, Chandra A, Gavrilovska A, Tiwari D (2022). DAOS. Proceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing, pages 4-15. Online publication date: 27-Jun-2022.
https://dl.acm.org/doi/10.1145/3502181.3531466
Welton B, Miller B, Ayguadé E, Hwu W, Badia R, Hofstee H (2020). Identifying and (automatically) remedying performance problems in CPU/GPU applications. Proceedings of the 34th ACM International Conference on Supercomputing, pages 1-13. Online publication date: 29-Jun-2020.
https://dl.acm.org/doi/10.1145/3392717.3392759
He S, Li Z, Zhou J, Yin Y, Xu X, Chen Y, Sun X (2020). A Holistic Heterogeneity-Aware Data Placement Scheme for Hybrid Parallel I/O Systems. IEEE Transactions on Parallel and Distributed Systems, pages 1-1. Online publication date: 2020.
https://doi.org/10.1109/TPDS.2019.2948901
Wei B, Xiao L, Zhou B, Qin G, Yan B, Huo Z (2020). Fine-grained management of I/O optimizations based on workload characteristics. Frontiers of Computer Science: Selected Publications from Chinese Universities, 15:3. Online publication date: 31-Dec-2020.
https://dl.acm.org/doi/10.1007/s11704-020-9344-1
He S, Sun X (2018). A Cost-Effective Distribution-Aware Data Replication Scheme for Parallel I/O Systems. IEEE Transactions on Computers, 67:10, pages 1374-1387. Online publication date: 1-Oct-2018.
https://doi.org/10.1109/TC.2018.2831689
He S, Sun X, Wang Y, Xu C (2018). A Migratory Heterogeneity-Aware Data Layout Scheme for Parallel File Systems. 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 1133-1142. Online publication date: May-2018.
https://doi.org/10.1109/IPDPS.2018.00122
Ji X, Wang C, El-Sayed N, Ma X, Kim Y, Vazhkudai S, Xue W, Sanchez D, Mohr B, Raghavan P (2017). Understanding object-level memory access patterns across the spectrum. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-12. Online publication date: 12-Nov-2017.
https://dl.acm.org/doi/10.1145/3126908.3126917
Shi X, Li M, Liu W, Jin H, Yu C, Chen Y, Gropp W, Beckman P, Li Z, Cazorla F (2017). SSDUP. Proceedings of the International Conference on Supercomputing, pages 1-10. Online publication date: 14-Jun-2017.
https://dl.acm.org/doi/10.1145/3079079.3079087
He S, Wang Y, Li Z, Sun X, Xu C (2017). Cost-Aware Region-Level Data Placement in Multi-Tiered Parallel I/O Systems. IEEE Transactions on Parallel and Distributed Systems, 28:7, pages 1853-1865. Online publication date: 10-Jun-2017.
https://dl.acm.org/doi/10.1109/TPDS.2016.2636837