Why nothing matters: the impact of zeroing

Published: 22 October 2011

Abstract

Memory safety defends against inadvertent and malicious misuse of memory that may compromise program correctness and security. A critical element of memory safety is zero initialization. The direct cost of zero initialization is surprisingly high: up to 12.7%, with average costs ranging from 2.7 to 4.5% on a high performance virtual machine on IA32 architectures. Zero initialization also incurs indirect costs due to its memory bandwidth demands and cache displacement effects. Existing virtual machines either: a) minimize direct costs by zeroing in large blocks, or b) minimize indirect costs by zeroing in the allocation sequence, which reduces cache displacement and bandwidth. This paper evaluates the two widely used zero initialization designs, showing that they make different tradeoffs to achieve very similar performance. Our analysis inspires three better designs: (1) bulk zeroing with cache-bypassing (non-temporal) instructions to reduce the direct and indirect zeroing costs simultaneously, (2) concurrent non-temporal bulk zeroing that exploits parallel hardware to move work off the application's critical path, and (3) adaptive zeroing, which dynamically chooses between (1) and (2) based on available hardware parallelism. The new software strategies offer speedups sometimes greater than the direct overhead, improving total performance by 3% on average. Our findings invite additional optimizations and microarchitectural support.
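
To make the two existing designs concrete, the following is a minimal sketch in Python, not code from the paper. It models a bump-pointer allocator over a bytearray under stated assumptions: stale memory is represented by the byte 0xAA, and the names BumpAllocator, DIRTY, zero_calls, and choose_strategy are hypothetical. With bulk_zero=True the whole block is zeroed once when it is acquired (design a); with bulk_zero=False each object is zeroed in the allocation sequence (design b). The choose_strategy function only illustrates the idea of adaptive zeroing; its test on idle cores is an assumed policy, not the criterion the paper uses.

```python
"""Toy model of two zeroing designs for a bump-pointer allocator."""

DIRTY = 0xAA  # assumed stand-in for stale data left by a previous use


class BumpAllocator:
    """Bump-pointer allocator over a bytearray that starts out dirty."""

    def __init__(self, size, bulk_zero):
        self.memory = bytearray([DIRTY]) * size
        self.cursor = 0
        self.bulk_zero = bulk_zero
        self.zero_calls = 0  # number of separate zeroing operations
        if bulk_zero:
            # Design (a): zero the whole block once, up front.
            self._zero(0, size)

    def _zero(self, start, length):
        # Overwrite the region with zero bytes and count the operation.
        self.memory[start:start + length] = bytes(length)
        self.zero_calls += 1

    def alloc(self, length):
        """Return the offset of a zeroed region of `length` bytes."""
        if self.cursor + length > len(self.memory):
            raise MemoryError("block exhausted")
        start = self.cursor
        if not self.bulk_zero:
            # Design (b): zero each object in the allocation sequence.
            self._zero(start, length)
        self.cursor += length
        return start


def choose_strategy(idle_cores):
    """Hypothetical adaptive policy: zero concurrently only if a core is spare."""
    return "concurrent-bulk" if idle_cores > 0 else "synchronous-bulk"


if __name__ == "__main__":
    for bulk in (True, False):
        allocator = BumpAllocator(64, bulk_zero=bulk)
        offsets = [allocator.alloc(16) for _ in range(3)]
        print(bulk, offsets, allocator.zero_calls)
```

Both allocators hand out zeroed memory; they differ in how many zeroing operations run and when they run. Non-temporal stores cannot be expressed in Python, so the cache-bypassing aspect of designs (1) and (2) is not modeled here.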

Published In

OOPSLA '11: Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
October 2011
1104 pages
ISBN: 9781450309400
DOI: 10.1145/2048066
  • ACM SIGPLAN Notices, Volume 46, Issue 10
    OOPSLA '11
    October 2011
    1063 pages
    ISSN: 0362-1340
    EISSN: 1558-1160
    DOI: 10.1145/2076021

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. memory safety
  2. zero initialization

Qualifiers

  • Research-article

Conference

SPLASH '11

Acceptance Rates

Overall Acceptance Rate 268 of 1,244 submissions, 22%

Article Metrics

  • Downloads (last 12 months): 41
  • Downloads (last 6 weeks): 6

Reflects downloads up to 14 Dec 2024.

Cited By

  • (2022) Low-latency, high-throughput garbage collection. Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, pages 76-91. DOI: 10.1145/3519939.3523440. Online publication date: 9-Jun-2022.
  • (2020) Efficient nursery sizing for managed languages on multi-core processors with shared caches. Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization, pages 1-15. DOI: 10.1145/3368826.3377908. Online publication date: 22-Feb-2020.
  • (2020) A Novel Approach of Data Content Zeroization Under Memory Attacks. Journal of Electronic Testing. DOI: 10.1007/s10836-020-05867-4. Online publication date: 12-May-2020.
  • (2019) Crystal Gazer. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 3:1, pages 1-27. DOI: 10.1145/3322205.3311080. Online publication date: 26-Mar-2019.
  • (2019) Emulating and Evaluating Hybrid Memory for Managed Languages on NUMA Hardware. 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pages 93-105. DOI: 10.1109/ISPASS.2019.00017. Online publication date: Mar-2019.
  • (2018) Hardware-software co-optimization of memory management in dynamic languages. ACM SIGPLAN Notices, 53:5, pages 45-58. DOI: 10.1145/3299706.3210566. Online publication date: 18-Jun-2018.
  • (2018) Write-rationing garbage collection for hybrid memories. ACM SIGPLAN Notices, 53:4, pages 62-77. DOI: 10.1145/3296979.3192392. Online publication date: 11-Jun-2018.
  • (2018) Hardware-software co-optimization of memory management in dynamic languages. Proceedings of the 2018 ACM SIGPLAN International Symposium on Memory Management, pages 45-58. DOI: 10.1145/3210563.3210566. Online publication date: 18-Jun-2018.
  • (2018) Write-rationing garbage collection for hybrid memories. Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 62-77. DOI: 10.1145/3192366.3192392. Online publication date: 11-Jun-2018.
  • (2017) Instrumentation bias for dynamic data race detection. Proceedings of the ACM on Programming Languages, 1:OOPSLA, pages 1-31. DOI: 10.1145/3133893. Online publication date: 12-Oct-2017.
