More Web Proxy on the site http://driver.im/

Article

Scalable locality-conscious multithreaded memory allocation

Authors:

Scott Schneider,

Christos D. Antonopoulos,

Dimitrios S. NikolopoulosAuthors Info & Claims

ISMM '06: Proceedings of the 5th international symposium on Memory management

Pages 84 - 94

https://doi.org/10.1145/1133956.1133968

Published: 10 June 2006 Publication History

Abstract

We present Streamflow, a new multithreaded memory manager designed for low overhead, high-performance memory allocation while transparently favoring locality. Streamflow enables low over-head simultaneous allocation by multiple threads and adapts to sequential allocation at speeds comparable to that of custom sequential allocators. It favors the transparent exploitation of temporal and spatial object access locality, and reduces allocator-induced cache conflicts and false sharing, all using a unified design based on segregated heaps. Streamflow introduces an innovative design which uses only synchronization-free operations in the most common case of local allocations and deallocations, while requiring minimal, non-blocking synchronization in the less common case of remote deallocations. Spatial locality at the cache and page level is favoredby eliminating small objects headers, reducing allocator-induced conflicts via contiguous allocation of page blocks in physical memory, reducing allocator-induced false sharing by using segregated heaps and achieving better TLB performance and fewer page faults via the use of superpages. Combining these locality optimizations with the drastic reduction of synchronization and latency overhead allows Streamflow to perform comparably with optimized sequential allocators and outperform--on a shared-memory systemwith four two-way SMT processors--four state-of-the-art multi-processor allocators by sizeable margins in our experiments. The allocation-intensive sequential and parallel benchmarks used in our experiments represent a variety of behaviors, including mostly local object allocation-deallocation patterns and producer-consumer allocation-deallocation patterns.

References

[1]

N. Arora, R. Blumofe, and C. Greg-Plaxton. Thread Scheduling for Multiprogrammed Multiprocessors. In Proc. of the 10th ACM Symposium on Parallel Algorithms and Architectures, pages 119--129, Puerto Vallarta, Mexico, June 1998.

Digital Library

[2]

D. Barrett and B. Zorn. Using Lifetime Predictors to Improve Memory Allocation Performance. In Proc. of the 1993 ACM SIGPLAN Conference on Programming Languages Design and Implementation, pages 187--196, June 1993.

Digital Library

[3]

E. Berger, K. Mckinley, R. Blumofe, and P. Wilson. Hoard: A Scalable Memory Allocator for Multithreaded Applications. In Proc. of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 117--128, Cambridge, MA, November 2000.

Digital Library

[4]

E. Berger, B. Zorn, and K. McKinley. Reconsidering Custom Memory Allocation. In Proc. of the 17th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applpications, pages 1--12, Seattle, WA, November 2002.

Digital Library

[5]

Filip Blagojevic. Optimizing Irregular Adaptive Application on Multi-Threaded Processors: The Case of Medium-Grain Parallel Delaunay Mesh Generation. Master's thesis, The College of William and Mary, Williamsburg, VA, U.S.A., December 2005.

[6]

C. Cascaval, E. Duesterwald, P. Sweeney, and R. Wisniewski. Multiple Page Size Modeling and Optimization. In Proc. of the 14th International Conference on Parallel Architectures and Compilation Techniques, pages 339--349, Saint Louis, MO, September 2005.

Digital Library

[7]

Y. Feng and E. Berger. A Locality-Improving Dynamic Memory Allocator. In Proceedings of the Third Annual ACM SIGPLAN Workshop on Memory Systems Performance, Chicago, IL, June 2005.

Digital Library

[8]

D. Gay and A. Aiken. Memory Management with Explicit Regions. In Proc. of the 1998 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 313--323, Montreal, Canada, June 1998.

Digital Library

[9]

Wolfram Gloger. Dynamic Memory Allocator Implementations in Linux System Libraries. http://www.dent.med.uni-muenchen.de/ wmglo/malloc-slides.html.

[10]

Google. Google Performance Tools. http://goog-perftools.sourceforge.net/.

[11]

D. Grunwald, B. Zorn, and R. Henderson. Improving the Cache Locality of Memory Allocation. In Proc. of the ACM SIGPLAN 1993 Conference on Programming Language Design and Implementation, pages 177--186, Albuquerque, NM, June 1993.

Digital Library

[12]

P. Kamp. Malloc(3) Revisted. http://phk.freebsd.dk/pubs/malloc.pdf.

Digital Library

[13]

K. C. Knowlton. A Fast Storage Allocator. Communications of the ACM, 8(10):623--625, 1965.

Digital Library

[14]

D. E. Knuth. Dynamic Storage Allocation. In The Art of Computer Programming, volume 1. Addison-Wesley, 1968.

Digital Library

[15]

P. Larson and M. Krishnan. Memory Allocation for Long-Running Server Applications. In Proceedings of the First International Symposium on Memory Management, pages 176--185, Vancouver, BC, October 1998.

Digital Library

[16]

D. Lea. A Memory Allocator. http://gee.cs.oswego.edu/dl/html/malloc.html.

[17]

L. McDowell, S. Eggers, and S. Gribble. Improving Server Software Support for Simultaneous Multithreaded Processors. In Proc. of the 2003 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 37--48, San Diego, CA, June 2003.

Digital Library

[18]

M. Michael. Scalable Lock-free Dynamic Memory Allocation. In Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation, pages 35--46, Washington, DC, June 2004.

Digital Library

[19]

J. Navarro, S. Iyer, and A. Cox. Practical, Transparent Operating System Support for Superpages. In Proc. of the Fifth Symposiumon Operating Systems Design and Implementation, pages 89--104, Boston, MA, December 2002.

Digital Library

[20]

T. Romer, W. Ohlrich, A. Karlin, and B. Bershad. Reducing TLB and Memory Overhead using Online Superpage Promotion. In Proc. of the 22nd International Symposium on Computer Architecture, pages 176--187, Santa Margherita Ligure, Italy, June 1995.

Digital Library

[21]

D. Ross. The AED Free Storage Package. Communications of the ACM, 10(8):481--492, 1967.

Digital Library

[22]

M. Seidl and B. Zorn. Segregating Heap Objects by Reference Behavior and Lifetime. In Proc. of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 12--23, San Jose, CA, October 1998.

Digital Library

[23]

Y. Shuf, M. Gupta, R. Bordawekar, and J. Pal Singh. Exploiting Prolific Types for Memory Management and Optimizations. In Proc.of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Lanugages, pages 295--306, Portland, OR, January 2002.

Digital Library

[24]

G. Steele. Data Representation in PDP-10 MACLISP. Technical Report AI Lab Memo 421, MIT, 1977.

[25]

V. Vee and W. Hsu. A Scalable and Efficient Storage Allocator on Shared Memory Multiprocessors. In Proceedings of the 1999 International Symposium on Parallel Architectures, Algorithms and Networks, pages 230--235, Perth, Australia, June 1999.

Digital Library

[26]

K. Vo. Vmalloc: A General and Efficient Memory Allocator. Software Practice and Experience, 26(3):357--374, 1996.

[27]

P. Wilson, M. Johnstone, M. Neely, and D. Boles. Dynamic Storage Allocation: A Survey and Critical Review. In Proc. of the International Workshop on Memory Management, LNCS Vol. 986, pages 1--116, Kinross, UK, September 1995.

Digital Library

Cited By

Dang ZHe SZhang XHong PLi ZChen XSong HSun XChen G(2024)PMAlloc: A Holistic Approach to Improving Persistent Memory AllocationACM Transactions on Computer Systems10.1145/364388642:3-4(1-52)Online publication date: 20-Sep-2024
https://dl.acm.org/doi/10.1145/3643886
Yan WZhang X(2024) Improving the Relationship Between B + -Tree and Memory Allocator for Persistent Memory 2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00299(3906-3919)Online publication date: 13-May-2024
https://doi.org/10.1109/ICDE60146.2024.00299
Yang HZhao XZhou JWang WKundu SWu BGuan HLiu TBlackburn SPetrank E(2023)NUMAlloc: A Faster NUMA Memory AllocatorProceedings of the 2023 ACM SIGPLAN International Symposium on Memory Management10.1145/3591195.3595276(97-110)Online publication date: 6-Jun-2023
https://dl.acm.org/doi/10.1145/3591195.3595276
Show More Cited By

Recommendations

Automatic memory reclamation for lock-free data structures
OOPSLA '15

Lock-free data-structures are widely employed in practice, yet designing lock-free memory reclamation for them is notoriously difficult. In particular, all known lock-free reclamation schemes are ``manual'' in the sense that the developer has to ...
Cache-Conscious Thread Scheduling for Massively Multithreaded Processors

Highly multithreaded architectures introduce another dimension to fine-grained hardware cache management. The order in which the system's threads issue instructions can significantly impact the access stream seen by the caching system. This article ...
Making LRU Friendly to Weak Locality Workloads: A Novel Replacement Algorithm to Improve Buffer Cache Performance

Although the LRU replacement algorithm has been widely used in buffer cache management, it is well-known for its inability to cope with access patterns with weak locality. Previously proposed algorithms to improve LRU greatly increase complexity and/or ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

ISMM '06: Proceedings of the 5th international symposium on Memory management

June 2006

202 pages

ISBN:1595932216

DOI:10.1145/1133956

General Chair:
Erez Petrank
Technion - Israel Institute of Technology
,
Program Chair:
Eliot Moss
University of Massachusetts, Amherst

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 June 2006

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Article

Conference

ISMM06

Sponsor:

ISMM06: The 2006 International Symposium on Memory Management

June 10 - 11, 2006

Ontario, Ottawa, Canada

Acceptance Rates

Overall Acceptance Rate 72 of 156 submissions, 46%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

97
Total Citations
View Citations
965
Total Downloads

Downloads (Last 12 months)48
Downloads (Last 6 weeks)18

Reflects downloads up to 20 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

Dang ZHe SZhang XHong PLi ZChen XSong HSun XChen G(2024)PMAlloc: A Holistic Approach to Improving Persistent Memory AllocationACM Transactions on Computer Systems10.1145/364388642:3-4(1-52)Online publication date: 20-Sep-2024
https://dl.acm.org/doi/10.1145/3643886
Yan WZhang X(2024) Improving the Relationship Between B + -Tree and Memory Allocator for Persistent Memory 2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00299(3906-3919)Online publication date: 13-May-2024
https://doi.org/10.1109/ICDE60146.2024.00299
Yang HZhao XZhou JWang WKundu SWu BGuan HLiu TBlackburn SPetrank E(2023)NUMAlloc: A Faster NUMA Memory AllocatorProceedings of the 2023 ACM SIGPLAN International Symposium on Memory Management10.1145/3591195.3595276(97-110)Online publication date: 6-Jun-2023
https://dl.acm.org/doi/10.1145/3591195.3595276
Hadjadj YZouaoui CTaleb NMazari SEl Bahri MEl Mezouar M(2023)VCMalloc: A Virtually Contiguous Memory AllocatorIEEE Transactions on Computers10.1109/TC.2023.330273172:12(3431-3442)Online publication date: 1-Dec-2023
https://dl.acm.org/doi/10.1109/TC.2023.3302731
Dang ZHe SHong PLi ZZhang XSun XChen GFalsafi BFerdman MLu SWenisch T(2022)NVAlloc: rethinking heap metadata management in persistent memory allocatorsProceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3503222.3507743(115-127)Online publication date: 28-Feb-2022
https://dl.acm.org/doi/10.1145/3503222.3507743
Chen XDathathri RGill GPingali K(2020)PangolinProceedings of the VLDB Endowment10.14778/3389133.338913713:8(1190-1205)Online publication date: 3-May-2020
https://dl.acm.org/doi/10.14778/3389133.3389137
Sivaramakrishnan KDolan SWhite LJaffer SKelly TSahoo AParimala SDhiman AMadhavapeddy A(2020)Retrofitting parallelism onto OCamlProceedings of the ACM on Programming Languages10.1145/34089954:ICFP(1-30)Online publication date: 3-Aug-2020
https://dl.acm.org/doi/10.1145/3408995
Langr DKocicka M(2020)Reducing the Impact of Intensive Dynamic Memory Allocations in Parallel Multi-Threaded ProgramsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2019.296051431:5(1152-1164)Online publication date: 1-May-2020
https://doi.org/10.1109/TPDS.2019.2960514
Schildermans SAerts KShan JDing X(2020)Ptlbmalloc2: Reducing TLB Shootdowns with High Memory Efficiency2020 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom)10.1109/ISPA-BDCloud-SocialCom-SustainCom51426.2020.00036(76-83)Online publication date: Dec-2020
https://doi.org/10.1109/ISPA-BDCloud-SocialCom-SustainCom51426.2020.00036
Das RTsai SDuppala SLynch JArkin EChowdhury RMitchell JSkiena SScheideler CBerenbrink P(2019)Data Races and the Discrete Resource-time Tradeoff Problem with Resource Reuse over PathsThe 31st ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3323165.3323209(359-368)Online publication date: 17-Jun-2019
https://dl.acm.org/doi/10.1145/3323165.3323209
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents