Abstract
The divergence between processor and memory performance has been a well-discussed aspect of the computer architecture literature for some years. The recent adoption of multi-core processor designs has, however, brought new problems to the design of memory architectures: as more cores are added to each successive generation of processor, equivalent improvements in memory capacity and memory sub-systems must be made if the compute components of the processor are to remain sufficiently supplied with data. These issues, combined with the traditional problem of designing cache-efficient code, help to ensure that memory remains an ongoing challenge for application and machine designers.
In this paper we present a comprehensive discussion of WMTools, a trace-based toolkit designed to support the analysis of memory allocation in parallel applications. This paper features an extended discussion of the WMTrace tracing tool presented in previous work, including a revised treatment of trace compression and several refinements to the tracing methodology that reduce overheads and improve tool scalability.
The second half of this paper features a case study in which we apply WMTools to five parallel scientific applications and benchmarks, demonstrating its effectiveness at recording high-water mark memory consumption as well as memory use per-function over time. An in-depth analysis is provided for an unstructured mesh benchmark which reveals significant memory allocation imbalance across its participating processes. This study demonstrates the use of WMTools in elucidating memory allocation issues in high-performance scientific codes.
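The sketch below shows what the two headline measurements mean. It is not WMTools' implementation. It assumes a hypothetical per-process trace of allocation and free events; the `Event` record and the `high_water_mark` and `imbalance` functions are illustrative names only. The high-water mark is taken as the peak number of live heap bytes reached while replaying a trace. Imbalance across processes is expressed as the maximum per-process peak divided by the mean, which is one common choice of metric but is an assumption here.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class Event:
    """One record from a hypothetical allocation trace (not the WMTrace format)."""
    kind: str        # "alloc" or "free"
    address: int     # address returned by the allocator or passed to free
    size: int = 0    # bytes requested; only used for "alloc" events


def high_water_mark(events: Iterable[Event]) -> int:
    """Return the peak number of live heap bytes seen while replaying the trace."""
    live: dict[int, int] = {}  # address -> size of each currently live block
    current = 0
    peak = 0
    for ev in events:
        if ev.kind == "alloc":
            live[ev.address] = ev.size
            current += ev.size
            peak = max(peak, current)
        elif ev.kind == "free":
            # Freeing an address that is not live (e.g. free(NULL)) releases nothing.
            current -= live.pop(ev.address, 0)
        else:
            raise ValueError(f"unknown event kind: {ev.kind!r}")
    return peak


def imbalance(peaks: Sequence[int]) -> float:
    """Max-over-mean ratio of per-process high-water marks (1.0 means balanced)."""
    mean = sum(peaks) / len(peaks)
    return max(peaks) / mean


# Two hypothetical processes: rank 0 peaks at 150 bytes, rank 1 at 50 bytes.
rank0 = [Event("alloc", 0x10, 100), Event("alloc", 0x20, 50),
         Event("free", 0x10), Event("alloc", 0x30, 30)]
rank1 = [Event("alloc", 0x10, 50), Event("free", 0x10)]
peaks = [high_water_mark(rank0), high_water_mark(rank1)]
print(peaks, imbalance(peaks))  # [150, 50] 1.5
```

Replaying the trace keeps the peak exact even when it occurs between sampling points. A real tracer would also have to account for `realloc`, `calloc`, and memory obtained outside the allocator, which this sketch ignores.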
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Perks, O., Hammond, S.D., Pennycook, S.J., Jarvis, S.A. (2011). WMTools - Assessing Parallel Application Memory Utilisation at Scale. In: Thomas, N. (eds) Computer Performance Engineering. EPEW 2011. Lecture Notes in Computer Science, vol 6977. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24749-1_12
DOI: https://doi.org/10.1007/978-3-642-24749-1_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24748-4
Online ISBN: 978-3-642-24749-1
eBook Packages: Computer Science, Computer Science (R0)