DOI: 10.5555/2132325.2132356
research-article

Improving shared cache behavior of multithreaded object-oriented applications in multicores

Published: 07 November 2011

Abstract

Understanding the shared cache performance of multithreaded object-oriented applications, and optimizing these applications for multicores, have not received much attention. In this paper, we first quantify the intra-thread and inter-thread cache line (block) reuse characteristics of a set of multithreaded C++ programs executed on shared-cache-based multicores. Our results show that, as far as shared on-chip caches are concerned, inter-thread cache line reuse distances are much larger than intra-thread cache line reuse distances. We study the impact of these characteristics on the hit/miss behavior of the shared last-level cache of a commercial multicore machine. We then show that, by rearranging accesses to objects shared across different threads and to objects stored in nearby memory locations, inter-thread (temporal and spatial) object reuse distances can be reduced, which in turn reduces inter-thread cache line reuse distances. Results collected using eight multithreaded applications show that our proposed shared cache-aware code restructuring strategy reduces misses in the last-level on-chip cache of a commercial multicore machine by 25.4% on average. These savings in cache misses translate, in turn, into an average execution time improvement of 11.9%.
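The distinction between intra-thread and inter-thread reuse distance can be illustrated with a small sketch. It is not the paper's measurement tool: it assumes the common stack-distance definition (the number of distinct other cache lines touched between two consecutive accesses to the same line), a 64-byte line, and a hypothetical trace format of `(thread_id, byte_address)` pairs in the order the shared cache sees them. The function name `reuse_distances` is likewise an illustrative choice.

```python
LINE_SIZE = 64  # bytes per cache line; an assumed, typical value


def reuse_distances(trace, line_size=LINE_SIZE):
    """Split the reuse distances of a shared-cache trace into two groups.

    trace: sequence of (thread_id, byte_address) pairs, in the order the
    shared cache observes the accesses (hypothetical format).
    A reuse is intra-thread if the previous access to the same line came
    from the same thread, and inter-thread otherwise.
    """
    last_seen = {}  # cache line -> (trace index, thread id) of last access
    lines = []      # cache line touched at each trace position
    intra, inter = [], []
    for i, (tid, addr) in enumerate(trace):
        line = addr // line_size
        if line in last_seen:
            prev_i, prev_tid = last_seen[line]
            # Distinct lines touched strictly between the two accesses.
            distance = len(set(lines[prev_i + 1:i]))
            (intra if tid == prev_tid else inter).append(distance)
        lines.append(line)
        last_seen[line] = (i, tid)
    return intra, inter


# Thread 0 touches a shared line first, then three other lines,
# before thread 1 reuses the shared line.
original = [(0, 0), (0, 8), (0, 64), (0, 128), (0, 192), (1, 0)]
# Same accesses, reordered so thread 0 touches the shared line last.
restructured = [(0, 64), (0, 128), (0, 192), (0, 0), (0, 8), (1, 0)]

print(reuse_distances(original))      # ([0], [3])
print(reuse_distances(restructured))  # ([0], [0])
```

In this toy trace, moving thread 0's access to the shared line next to thread 1's access shrinks the inter-thread reuse distance from 3 to 0 while leaving the intra-thread distance unchanged, which is the kind of effect the restructuring described in the abstract aims for. The quadratic slice-and-set computation is chosen for readability, not for long traces.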

Published In

ICCAD '11: Proceedings of the International Conference on Computer-Aided Design
November 2011
844 pages
ISBN: 9781457713989
  • General Chair: Joel Phillips
  • Program Chairs: Alan J. Hu, Helmut Graeb

Publisher

IEEE Press

Conference

ICCAD '11
Acceptance Rates

Overall Acceptance Rate 457 of 1,762 submissions, 26%
