research-article

The Myrmics memory allocator: hierarchical, message-passing allocation for global address spaces

Published: 15 June 2012

Abstract

Constantly increasing hardware parallelism poses more and more challenges to programmers and language designers. One approach to harness the massive parallelism is to move to task-based programming models that rely on runtime systems for dependency analysis and scheduling. Such models generally benefit from the existence of a global address space. This paper presents the parallel memory allocator of the Myrmics runtime system, in which multiple allocator instances organized in a tree hierarchy cooperate to implement a global address space with dynamic region support on distributed memory machines. The Myrmics hierarchical memory allocator is a step towards improved productivity and performance in parallel programming. Productivity is improved through the use of dynamic regions in a global address space, which provide a convenient shared memory abstraction for dynamic and irregular data structures. Performance is improved through scaling on manycore systems without system-wide cache coherency. We evaluate the stand-alone allocator on an MPI-based x86 cluster and find that it scales well for up to 512 worker cores, while it can outperform Unified Parallel C by a factor of 3.7-10.7x.
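
To make the tree-of-allocators idea concrete, the following is a minimal single-process sketch, not the Myrmics implementation. It assumes a simple bump-pointer pool per allocator and models the request message to a parent as a direct method call. The class name `Allocator`, the `refill` parameter, and the 1 MiB address space are hypothetical choices for illustration; regions, freeing, and real message passing are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Allocator:
    """One allocator instance in the tree (hypothetical, not the Myrmics code)."""
    parent: Optional["Allocator"] = None
    refill: int = 0   # bytes requested from the parent when the pool runs dry
    base: int = 0     # next free global address in the local pool
    limit: int = 0    # end of the local pool (exclusive)

    def alloc(self, size: int) -> int:
        """Return the global address of a fresh `size`-byte block."""
        if self.limit - self.base < size:
            if self.parent is None:
                raise MemoryError("global address space exhausted")
            # Stand-in for a request message to the parent allocator.
            # Any leftover local space is abandoned to keep the sketch short.
            grant = max(size, self.refill)
            self.base = self.parent.alloc(grant)
            self.limit = self.base + grant
        addr = self.base
        self.base += size
        return addr


# The root owns the whole (assumed 1 MiB) address space; two mid-level
# allocators each serve a pair of worker-level leaf allocators.
root = Allocator(limit=1 << 20)
mids = [Allocator(parent=root, refill=64 * 1024) for _ in range(2)]
leaves = [Allocator(parent=m, refill=4 * 1024) for m in mids for _ in range(2)]

# Three 256-byte allocations per leaf; every address is globally unique.
addrs = [leaf.alloc(256) for leaf in leaves for _ in range(3)]
```

In this sketch most requests are served from a leaf's local pool, and only a refill travels up the tree, which is one plausible reason a hierarchy of this kind could scale without a single shared allocator.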

Published In

ACM SIGPLAN Notices, Volume 47, Issue 11
ISMM '12
November 2012
136 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2426642
  • ISMM '12: Proceedings of the 2012 international symposium on Memory Management
    June 2012
    152 pages
    ISBN:9781450313506
    DOI:10.1145/2258996
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. gas
  2. parallel memory allocator

Qualifiers

  • Research-article

Cited By

  • (2016) A bounded memory allocator for software-defined global address spaces. ACM SIGPLAN Notices 51:11 (78-88). DOI:10.1145/3241624.2926709. Online publication date: 14-Jun-2016
  • (2016) An interval constrained memory allocator for the Givy GAS runtime. ACM SIGPLAN Notices 51:8 (1-2). DOI:10.1145/3016078.2851195. Online publication date: 27-Feb-2016
  • (2016) A bounded memory allocator for software-defined global address spaces. Proceedings of the 2016 ACM SIGPLAN International Symposium on Memory Management (78-88). DOI:10.1145/2926697.2926709. Online publication date: 14-Jun-2016
  • (2016) An interval constrained memory allocator for the Givy GAS runtime. Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (1-2). DOI:10.1145/2851141.2851195. Online publication date: 27-Feb-2016
  • (2016) Proceedings of the 2016 ACM SIGPLAN International Symposium on Memory Management. Online publication date: 14-Jun-2016
  • (2023) Performance of the Vipera Framework for DSLs on Micro-Core Architectures. Euro-Par 2022: Parallel Processing Workshops (66-79). DOI:10.1007/978-3-031-31209-0_5. Online publication date: 2-May-2023
  • (2016) On the detection of custom memory allocators in C binaries. Empirical Software Engineering 21:3 (753-777). DOI:10.1007/s10664-015-9362-z. Online publication date: 1-Jun-2016
  • (2015) Acceleration of MPI Mechanisms for Sustainable HPC Applications. Supercomputing Frontiers and Innovations: an International Journal 2:2 (28-45). DOI:10.14529/jsfi150202. Online publication date: 6-Apr-2015
  • (2014) Distributed region-based memory allocation and synchronization. The International Journal of High Performance Computing Applications 28:4 (406-414). DOI:10.1177/1094342014552863. Online publication date: 7-Nov-2014
  • (2014) Extending the OpenSHMEM Memory Model to Support User-Defined Spaces. Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models (1-10). DOI:10.1145/2676870.2676884. Online publication date: 6-Oct-2014
