
WMTools - Assessing Parallel Application Memory Utilisation at Scale

  • Conference paper
Computer Performance Engineering (EPEW 2011)

Part of the book series: Lecture Notes in Computer Science ((LNPSE,volume 6977))


Abstract

The divergence between processor and memory performance has been a well-discussed topic in the computer architecture literature for some years. The recent adoption of multi-core processor designs has, however, brought new problems to the design of memory architectures: as more cores are added to each successive generation of processor, equivalent improvements in memory capacity and memory sub-systems must be made if the compute components of the processor are to remain sufficiently supplied with data. These issues, combined with the traditional problem of designing cache-efficient code, ensure that memory remains an ongoing challenge for application and machine designers.

In this paper we present a comprehensive discussion of WMTools, a trace-based toolkit designed to support the analysis of memory allocation in parallel applications. The paper extends the discussion of the WMTrace tracing tool presented in previous work, including a revised treatment of trace compression and several refinements to the tracing methodology that reduce overheads and improve tool scalability.

The second half of this paper features a case study in which we apply WMTools to five parallel scientific applications and benchmarks, demonstrating its effectiveness at recording high-water mark memory consumption as well as per-function memory use over time. An in-depth analysis is provided for an unstructured mesh benchmark, which reveals significant memory allocation imbalance across its participating processes. This study demonstrates the use of WMTools in elucidating memory allocation issues in high-performance scientific codes.
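To make the terms used above concrete, the following is a minimal sketch of how a high-water mark, a per-function breakdown at that peak, and a cross-process imbalance figure could be derived from an allocation trace. It is not WMTools' implementation or trace format: the event tuple `(op, address, size, function)`, the function names `high_water_mark` and `imbalance`, and the max-over-mean imbalance metric are all assumptions made for illustration.

```python
from collections import defaultdict


def high_water_mark(events):
    """Replay a hypothetical allocation trace for one process.

    Each event is (op, address, size, function), where op is "malloc"
    or "free". Returns (peak_bytes, bytes_per_function_at_peak).
    """
    live = {}                    # address -> (size, allocating function)
    per_func = defaultdict(int)  # function -> bytes currently live
    current = 0
    peak = 0
    peak_by_func = {}

    for op, addr, size, func in events:
        if op == "malloc":
            live[addr] = (size, func)
            current += size
            per_func[func] += size
        elif op == "free" and addr in live:
            # Credit the free to the function that made the allocation.
            freed_size, owner = live.pop(addr)
            current -= freed_size
            per_func[owner] -= freed_size

        if current > peak:
            peak = current
            # Snapshot the per-function breakdown at the new peak.
            peak_by_func = {f: b for f, b in per_func.items() if b > 0}

    return peak, peak_by_func


def imbalance(peaks):
    """Ratio of the largest per-process peak to the mean peak (assumed metric)."""
    mean = sum(peaks) / len(peaks)
    return max(peaks) / mean


# Two illustrative per-process traces (invented data, not measurements).
rank0 = [
    ("malloc", 0x10, 100, "init"),
    ("malloc", 0x20, 50, "solve"),
    ("free", 0x10, 0, "init"),
    ("malloc", 0x30, 30, "solve"),
]
rank1 = [("malloc", 0x10, 50, "init")]

peak0, by_func0 = high_water_mark(rank0)
peak1, _ = high_water_mark(rank1)
ratio = imbalance([peak0, peak1])
```

A ratio of 1.0 under this metric would indicate perfectly balanced peaks; larger values indicate that at least one process consumes noticeably more memory than the average.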

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Perks, O., Hammond, S.D., Pennycook, S.J., Jarvis, S.A. (2011). WMTools - Assessing Parallel Application Memory Utilisation at Scale. In: Thomas, N. (eds) Computer Performance Engineering. EPEW 2011. Lecture Notes in Computer Science, vol 6977. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24749-1_12

  • DOI: https://doi.org/10.1007/978-3-642-24749-1_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24748-4

  • Online ISBN: 978-3-642-24749-1

  • eBook Packages: Computer Science, Computer Science (R0)
