DOI: 10.1145/782814.782850

Profile-guided I/O partitioning

Published: 23 June 2003

Abstract

In the field of high performance computing there is a growing need to process large, complex datasets. Many of these applications are file-intensive workloads, performing a large number of reads from and writes to a small number of files. When executing these workloads on cluster-based systems, performance does not scale simply by increasing the number of compute nodes. To effectively exploit parallel resources we need to parallelize file I/O. The potential impact of exploiting parallel I/O grows as the gap between CPU and disk speeds continues to increase.

While parallel I/O middleware systems (e.g., MPI I/O) provide users with environments where large datasets can be shared among multiple distributed processes, the performance of file-intensive applications depends heavily on how the data is accessed and where the data is physically located on disk. I/O operations need to be parallelized both at the application level (using middleware) and at the disk level (using partitioning).

In this paper, we present a new profile-guided greedy partitioning algorithm to parallelize I/O access for file-intensive applications run on cluster-based systems. We use MPI and MPI I/O to provide parallelization at the application level. We use I/O profiling to capture relevant information about the I/O stream, and then use these profiles to guide file partitioning across multiple disks to significantly improve I/O throughput.
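
The abstract does not spell out the partitioning algorithm itself, so the following is only an illustrative sketch of what a generic profile-driven greedy placement could look like, not the authors' method. It assumes a profile is a list of `(offset, length)` byte accesses to one file, that the file is split into fixed-size chunks, and that the heaviest chunks are placed first on the currently least-loaded disk. The names `profile_chunk_bytes`, `greedy_partition`, and `chunk_size` are hypothetical.

```python
from collections import defaultdict


def profile_chunk_bytes(trace, chunk_size):
    """Aggregate an I/O trace into bytes accessed per fixed-size chunk.

    `trace` is a list of (offset, length) pairs; this trace format is an
    assumption made for the sketch, not the profile format of the paper.
    """
    load = defaultdict(int)
    for offset, length in trace:
        end = offset + length
        pos = offset
        while pos < end:
            chunk = pos // chunk_size
            chunk_end = (chunk + 1) * chunk_size
            step = min(end, chunk_end) - pos  # bytes of this access inside the chunk
            load[chunk] += step
            pos += step
    return dict(load)


def greedy_partition(chunk_load, num_disks):
    """Place the heaviest chunks first, each on the least-loaded disk.

    This is a generic greedy load-balancing heuristic, used here only as
    a stand-in for the profile-guided algorithm the abstract mentions.
    """
    disk_load = [0] * num_disks
    placement = {}
    # Sort by descending load; break ties by chunk index for determinism.
    for chunk, load in sorted(chunk_load.items(), key=lambda kv: (-kv[1], kv[0])):
        disk = min(range(num_disks), key=lambda d: (disk_load[d], d))
        placement[chunk] = disk
        disk_load[disk] += load
    return placement, disk_load


if __name__ == "__main__":
    # Hypothetical trace: chunk 0 is accessed far more often than the others.
    trace = [(0, 100), (50, 100), (200, 50), (0, 100)]
    chunk_load = profile_chunk_bytes(trace, chunk_size=100)
    placement, disk_load = greedy_partition(chunk_load, num_disks=2)
    print(chunk_load, placement, disk_load)
```

In this toy trace the hot chunk ends up alone on one disk while the two lightly used chunks share the other, which is the kind of imbalance-aware placement a profile makes possible and a uniform striping policy would not.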

Published In

ICS '03: Proceedings of the 17th annual international conference on Supercomputing
June 2003
380 pages
ISBN: 1581137338
DOI: 10.1145/782814

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. clusters
  2. parallel I/O
  3. profile-guided I/O

Qualifiers

  • Article

Conference

ICS03: International Conference on Supercomputing 2003
June 23 - 26, 2003
San Francisco, CA, USA

Acceptance Rates

ICS '03 Paper Acceptance Rate 36 of 171 submissions, 21%;
Overall Acceptance Rate 629 of 2,180 submissions, 29%
