More Web Proxy on the site http://driver.im/

article

Free access

Hoard: a scalable memory allocator for multithreaded applications

Authors:

Emery D. Berger,

Kathryn S. McKinley,

Robert D. Blumofe,

Paul R. WilsonAuthors Info & Claims

ACM SIGOPS Operating Systems Review, Volume 34, Issue 5

Pages 117 - 128

https://doi.org/10.1145/384264.379232

Published: 12 November 2000 Publication History

Abstract

Parallel, multithreaded C and C++ programs such as web servers, database managers, news servers, and scientific applications are becoming increasingly prevalent. For these applications, the memory allocator is often a bottleneck that severely limits program performance and scalability on multiprocessor systems. Previous allocators suffer from problems that include poor performance and scalability, and heap organizations that introduce false sharing. Worse, many allocators exhibit a dramatic increase in memory consumption when confronted with a producer-consumer pattern of object allocation and freeing. This increase in memory consumption can range from a factor of P (the number of processors) to unbounded memory consumption.This paper introduces Hoard, a fast, highly scalable allocator that largely avoids false sharing and is memory efficient. Hoard is the first allocator to simultaneously solve the above problems. Hoard combines one global heap and per-processor heaps with a novel discipline that provably bounds memory consumption and has very low synchronization costs in the common case. Our results on eleven programs demonstrate that Hoard yields low average fragmentation and improves overall program performance over the standard Solaris allocator by up to a factor of 60 on 14 processors, and up to a factor of 18 over the next best allocator we tested.

References

[1]

U. Acar, E. Berger, R. Blumofe, and D. Papadopoulos. Hood: A threads library for multiprogrammed multiprocessors. http://www.cs.utexas.edu/users/hood, Sept. 1999.

[2]

J. Barnes and P. Hut. A hierarchical O(N log N) force-calculation algorithm. Nature, 324:446-449, 1986.

[3]

bCandid.com, Inc. http://www.bcandid.com.

[4]

E. D. Berger and R. D. Blumofe. Hoard: A fast, scalable, and memory-efficient allocator for shared-memory multiprocessors. Technical Report UTCS-TR99-22, The University of Texas at Austin, 1999.

Digital Library

[5]

B. Bigler, S. Allan, and R. Oldehoeft. Parallel dynamic storage allocation. International Conference on Parallel Processing, pages 272-275, 1985.

[6]

R. D. Blumofe and C. E. Leiserson. Scheduling multithreaded computations by work stealing. In Proceedings of the 35th Annual Symposium on Foundations of Computer Science (FOCS), pages 356-368, Santa Fe, New Mexico, Nov. 1994.

Digital Library

[7]

Coyote Systems, Inc. http://www.coyotesystems.com.

[8]

C. S. Ellis and T. J. Olson. Algorithms for parallel memory allocation. International Journal of Parallel Programming, 17(4):303-345, 1988.

Digital Library

[9]

W. Gloger. Dynamic memory allocator implementations in linux system libraries. http://www.dent.med.uni-muenchen.de/ wmglo/malloc-slides.html.

[10]

A. Gottlieb and J. Wilson. Using the buddy system for concurrent memory allocation. Technical Report System Software Note 6, Courant Institute, 1981.

[11]

A. Gottlieb and J. Wilson. Parallelizing the usual buddy algorithm. Technical Report System Software Note 37, Courant Institute, 1982.

[12]

D. Grunwald, B. Zorn, and R. Henderson. Improving the cache locality of memory allocation. In R. Cartwright, editor, Proceedings of the Conference on Programming Language Design and Implementation, pages 177-186, New York, NY, USA, June 1993. ACM Press.

Digital Library

[13]

A. K. Iyengar. Dynamic Storage Allocation on a Multiprocessor. PhD thesis, MIT, 1992. MIT Laboratory for Computer Science Technical Report MIT/LCS/TR-560.

Digital Library

[14]

A. K. Iyengar. Parallel dynamic storage allocation algorithms. In Fifth IEEE Symposium on Parallel and Distributed Processing. IEEE Press, 1993.

Digital Library

[15]

T. Jeremiassen and S. Eggers. Reducing false sharing on shared memory multiprocessors through compile time data transformations. In ACM Symposium on Principles and Practice of Parallel Programming, pages 179-188, July 1995.

Digital Library

[16]

T. Johnson. A concurrent fast-fits memory manager. Technical Report TR91-009, University of Florida, Department of CIS, 1991.

[17]

T. Johnson and T. Davis. Space efficient parallel buddy memory management. Technical Report TR92-008, University of Florida, Department of CIS, 1992.

[18]

M. S. Johnstone. Non-Compacting Memory Allocation and Real-Time Garbage Collection. PhD thesis, University of Texas at Austin, Dec. 1997.

Digital Library

[19]

M. S. Johnstone and P. R. Wilson. The memory fragmentation problem: Solved? In ISMM, Vancouver, B.C., Canada, 1998.

Digital Library

[20]

K. Kennedy and K. S. McKinley. Optimizing for parallelism and data locality. In Proceedings of the Sixth International Conference on Supercomputing, pages 323-334, Distributed Computing, July 1992.

Digital Library

[21]

M. R. Krishnan. Heap: Pleasures and pains. Microsoft Developer Newsletter, Feb. 1999.

[22]

P. Larson and M. Krishnan. Memory allocation for long-running server applications. In ISMM, Vancouver, B.C., Canada, 1998.

Digital Library

[23]

D. Lea. A memory allocator. http://g.oswego.edu/dl/html/malloc.html.

[24]

B. Lewis. comp.programming.threads FAQ. http://www.lambdacs.com/newsgroup/FAQ.html.

[25]

P. E. McKenney and J. Slingwine. Efficient kernel memory allocation on shared-memory multiprocessor. In USENIX Association, editor, Proceedings of the Winter 1993 USENIX Conference: January 25-29, 1993, San Diego, California, USA, pages 295-305, Berkeley, CA, USA, Winter 1993. USENIX.

[26]

MicroQuill, Inc. http://www.microquill.com.

[27]

MySQL, Inc. The mysql database manager. http://www.mysql.org.

[28]

G. J. Narlikar and G. E. Blelloch. Space-efficient scheduling of nested parallelism. ACM Transactions on Programming Languages and Systems, 21(1):138-173, January 1999.

Digital Library

[29]

J. M. Robson. Worst case fragmentation of first fit and best fit storage allocation strategies. ACM Computer Journal, 20(3):242-244, Aug. 1977.

[30]

SGI. The standard template library for c++: Allocators. http://www.sgi.com/Technology/STL/Allocators.html.

[31]

Standard Performance Evaluation Corporation. SPECweb99. http://www.spec.org/osg/web99/.

[32]

D. Stefanovi' c. Properties of Age-Based Automatic Memory Reclamation Algorithms. PhD thesis, Department of Computer Science, University of Massachusetts, Amherst, Massachusetts, Dec. 1998.

Digital Library

[33]

D. Stein and D. Shah. Implementing lightweight threads. In Proceedings of the 1992 USENIX Summer Conference, pages 1-9, 1992.

[34]

H. Stone. Parallel memory allocation using the FETCH-AND-ADD instruction. Technical Report RC 9674, IBM T. J. Watson Research Center, Nov. 1982.

[35]

Time-Warner/AOL, Inc. AOLserver 3.0. http://www.aolserver.com.

[36]

J. Torrellas, M. S. Lam, and J. L. Hennessy. False sharing and spatial locality in multiprocessor caches. IEEE Transactions on Computers, 43(6):651-663, 1994.

Digital Library

[37]

V.-Y. Vee and W.-J. Hsu. A scalable and efficient storage allocator on shared-memory multiprocessors. In International Symposium on Parallel Architectures, Algorithms, and Networks (I-SPAN'99), pages 230-235, Fremantle, Western Australia, June 1999.

Digital Library

Cited By

Langr DKocicka M(2020)Reducing the Impact of Intensive Dynamic Memory Allocations in Parallel Multi-Threaded ProgramsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2019.296051431:5(1152-1164)Online publication date: 1-May-2020
https://doi.org/10.1109/TPDS.2019.2960514
Torquati MMencagli GDrocco MAldinucci MDe Matteis TDanelutto M(2019)On dynamic memory allocation in sliding-window parallel patterns for streaming analyticsThe Journal of Supercomputing10.1007/s11227-017-2152-175:8(4114-4131)Online publication date: 1-Aug-2019
https://dl.acm.org/doi/10.1007/s11227-017-2152-1
Versino DBronkhorst C(2018)A computationally efficient ductile damage model accounting for nucleation and micro-inertia at high triaxialitiesComputer Methods in Applied Mechanics and Engineering10.1016/j.cma.2018.01.028333(395-420)Online publication date: May-2018
https://doi.org/10.1016/j.cma.2018.01.028
Show More Cited By

Index Terms

Hoard: a scalable memory allocator for multithreaded applications

Recommendations

Hoard: a scalable memory allocator for multithreaded applications
Special Issue: Proceedings of the ninth international conference on Architectural support for programming languages and operating systems (ASPLOS '00)

Parallel, multithreaded C and C++ programs such as web servers, database managers, news servers, and scientific applications are becoming increasingly prevalent. For these applications, the memory allocator is often a bottleneck that severely limits ...
Hoard: a scalable memory allocator for multithreaded applications

Parallel, multithreaded C and C++ programs such as web servers, database managers, news servers, and scientific applications are becoming increasingly prevalent. For these applications, the memory allocator is often a bottleneck that severely limits ...
Hoard: a scalable memory allocator for multithreaded applications
ASPLOS IX: Proceedings of the ninth international conference on Architectural support for programming languages and operating systems

Parallel, multithreaded C and C++ programs such as web servers, database managers, news servers, and scientific applications are becoming increasingly prevalent. For these applications, the memory allocator is often a bottleneck that severely limits ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGOPS Operating Systems Review

ACM SIGOPS Operating Systems Review Volume 34, Issue 5

Dec. 2000

269 pages

ISSN:0163-5980

DOI:10.1145/384264

Chairman:
William E. Weihl
Akami technologies, Inc., San Mateo, CA

Issue’s Table of Contents

ASPLOS IX: Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
November 2000
271 pages
ISBN:1581133170
DOI:10.1145/378993
Chairmen:
Larry Rudolph
MIT, Cambridge, MA
,
Anoop Gupta
Microsoft

Copyright © 2000 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 November 2000

Published in SIGOPS Volume 34, Issue 5

Check for updates

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

311
Total Citations
View Citations
2,795
Total Downloads

Downloads (Last 12 months)453
Downloads (Last 6 weeks)80

Reflects downloads up to 01 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

Langr DKocicka M(2020)Reducing the Impact of Intensive Dynamic Memory Allocations in Parallel Multi-Threaded ProgramsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2019.296051431:5(1152-1164)Online publication date: 1-May-2020
https://doi.org/10.1109/TPDS.2019.2960514
Torquati MMencagli GDrocco MAldinucci MDe Matteis TDanelutto M(2019)On dynamic memory allocation in sliding-window parallel patterns for streaming analyticsThe Journal of Supercomputing10.1007/s11227-017-2152-175:8(4114-4131)Online publication date: 1-Aug-2019
https://dl.acm.org/doi/10.1007/s11227-017-2152-1
Versino DBronkhorst C(2018)A computationally efficient ductile damage model accounting for nucleation and micro-inertia at high triaxialitiesComputer Methods in Applied Mechanics and Engineering10.1016/j.cma.2018.01.028333(395-420)Online publication date: May-2018
https://doi.org/10.1016/j.cma.2018.01.028
Leidel J(2017)Bit Contiguous Memory Allocation for Processing In MemoryProceedings of the Workshop on Memory Centric Programming for HPC10.1145/3145617.3145618(11-19)Online publication date: 12-Nov-2017
https://dl.acm.org/doi/10.1145/3145617.3145618
Chu DJaffar JMaghareh RKuo TWhalley D(2016)Symbolic execution for memory consumption analysisProceedings of the 17th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, Tools, and Theory for Embedded Systems10.1145/2907950.2907955(62-71)Online publication date: 13-Jun-2016
https://dl.acm.org/doi/10.1145/2907950.2907955
Danelutto MMencagli GTorquati M(2016)Efficient Dynamic Memory Allocation in Data Stream Processing Programs2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld)10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.0181(1181-1188)Online publication date: Jul-2016
https://doi.org/10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.0181
Aigner MKirsch CLippautz MSokolova A(2015)Fast, multicore-scalable, low-fragmentation memory allocation through large virtual memory and global data structuresACM SIGPLAN Notices10.1145/2858965.281429450:10(451-469)Online publication date: 23-Oct-2015
https://dl.acm.org/doi/10.1145/2858965.2814294
Aigner MKirsch CLippautz MSokolova AAldrich JEugster P(2015)Fast, multicore-scalable, low-fragmentation memory allocation through large virtual memory and global data structuresProceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications10.1145/2814270.2814294(451-469)Online publication date: 23-Oct-2015
https://dl.acm.org/doi/10.1145/2814270.2814294
Klein FBeineke KSchottner M(2014)Memory management for billions of small objects in a distributed in-memory storage2014 IEEE International Conference on Cluster Computing (CLUSTER)10.1109/CLUSTER.2014.6968771(113-122)Online publication date: Sep-2014
https://doi.org/10.1109/CLUSTER.2014.6968771
Moraru IAndersen DKaminsky MTolia NRanganathan PBinkert N(2013)Consistent, durable, and safe memory management for byte-addressable non volatile main memoryProceedings of the First ACM SIGOPS Conference on Timely Results in Operating Systems10.1145/2524211.2524216(1-17)Online publication date: 3-Nov-2013
https://dl.acm.org/doi/10.1145/2524211.2524216
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Issue’s Table of Contents