
Understanding and Improving Computational Science Storage Access through Continuous Characterization

Published: 01 October 2011

Abstract

Computational science applications are driving a demand for increasingly powerful storage systems. While many techniques are available for capturing the I/O behavior of individual application trial runs and specific components of the storage system, continuous characterization of a production system remains a daunting challenge for systems with hundreds of thousands of compute cores and multiple petabytes of storage. As a result, these storage systems are often designed without a clear understanding of the diverse computational science workloads they will support.
In this study, we outline a methodology for scalable, continuous, systemwide I/O characterization that combines storage device instrumentation, static file system analysis, and a new mechanism for capturing detailed application-level behavior. This methodology allows us to identify both systemwide trends and application-specific I/O strategies. We demonstrate the effectiveness of our methodology by performing a multilevel, two-month study of Intrepid, a 557-teraflop IBM Blue Gene/P system. During that time, we captured application-level I/O characterizations from 6,481 unique jobs spanning 38 science and engineering projects. We used the results of our study to tune example applications, highlight trends that impact the design of future storage systems, and identify opportunities for improvement in I/O characterization methodology.

Information

Published In

ACM Transactions on Storage, Volume 7, Issue 3
October 2011
120 pages
ISSN: 1553-3077
EISSN: 1553-3093
DOI: 10.1145/2027066
Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 01 October 2011
Accepted: 01 August 2011
Received: 01 August 2011

Author Tags

1. I/O characterization
2. parallel file systems

Qualifiers

• Research-article
• Research
• Refereed
