DOI: 10.1145/2485922.2485967
research-article

The locality-aware adaptive cache coherence protocol

Published: 23 June 2013

Abstract

Next-generation multicore applications will process massive amounts of data with significant sharing. Data movement and management impact memory access latency and consume power. Therefore, harnessing data locality is of fundamental importance in future processors. We propose a scalable, efficient shared memory cache coherence protocol that enables seamless adaptation between private and logically shared caching of on-chip data at the fine granularity of cache lines. Our data-centric approach relies on in-hardware yet low-overhead runtime profiling of the locality of each cache line and only allows private caching for data blocks with high spatio-temporal locality. This allows us to better exploit the private caches and enable low-latency, low-energy memory access, while retaining the convenience of shared memory. On a set of parallel benchmarks, our low-overhead locality-aware mechanisms reduce the overall energy by 25% and completion time by 15% in an NoC-based multicore with the Reactive-NUCA on-chip cache organization and the ACKwise limited directory-based coherence protocol.
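
The idea of profiling each cache line and allowing private caching only for lines with high locality can be illustrated with a small software model. The sketch below is not the hardware mechanism from the paper: the class `LocalityClassifier`, the constant `PRIVATE_CACHING_THRESHOLD`, the helper `service_trace`, and the returned labels are all hypothetical names chosen for illustration, and the threshold value is an arbitrary assumption. It only shows the general shape of a reuse-counter policy in which a core accesses a line remotely until it has shown enough reuse to earn a private copy.

```python
from collections import defaultdict

# Hypothetical threshold: number of accesses a core must make to a line
# before the line is cached privately. The value is illustrative only.
PRIVATE_CACHING_THRESHOLD = 4


class LocalityClassifier:
    """Toy per-line locality profiler (an assumption-laden sketch)."""

    def __init__(self, threshold=PRIVATE_CACHING_THRESHOLD):
        self.threshold = threshold
        self.reuse = defaultdict(int)  # (core, line) -> accesses seen so far
        self.private = set()           # (core, line) pairs cached privately

    def access(self, core, line):
        """Record one access and return how it would be serviced."""
        key = (core, line)
        if key in self.private:
            return "private-hit"
        self.reuse[key] += 1
        if self.reuse[key] >= self.threshold:
            # Enough observed reuse: allow a private copy from now on.
            self.private.add(key)
            return "promote-to-private"
        # Low observed locality so far: service at the shared location.
        return "remote-access"

    def evict(self, core, line):
        """Drop the private copy and restart profiling for this pair."""
        key = (core, line)
        self.private.discard(key)
        self.reuse[key] = 0


def service_trace(trace, threshold=PRIVATE_CACHING_THRESHOLD):
    """Classify every (core, line) access in a trace, in order."""
    clf = LocalityClassifier(threshold)
    return [clf.access(core, line) for core, line in trace]
```

For example, `service_trace([(0, 0x40)] * 5, threshold=3)` classifies the first two accesses as remote, promotes the line on the third, and services the remaining two from the private copy. A real design would keep such counters in hardware alongside the directory and would also have to handle invalidations and writes, which this sketch leaves out.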

Published In

ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture
June 2013
686 pages
ISBN: 9781450320795
DOI: 10.1145/2485922
  • ACM SIGARCH Computer Architecture News, Volume 41, Issue 3
    ISCA '13
    June 2013
    666 pages
    ISSN: 0163-5964
    DOI: 10.1145/2508148

Sponsors

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cache coherence
  2. multicore

Qualifiers

  • Research-article

Conference

ISCA'13

Acceptance Rates

ISCA '13 paper acceptance rate: 56 of 288 submissions, 19%
Overall acceptance rate: 543 of 3,203 submissions, 17%

Cited By

  • (2024) PipeGen: Automated Transformation of a Single-Core Pipeline into a Multicore Pipeline for a Given Memory Consistency Model. Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, pp. 1-13. DOI: 10.1145/3656019.3676889. Online publication date: 14-Oct-2024
  • (2022) A Case for Fine-grain Coherence Specialization in Heterogeneous Systems. ACM Transactions on Architecture and Code Optimization 19:3, pp. 1-26. DOI: 10.1145/3530819. Online publication date: 22-Aug-2022
  • (2022) Accelerating Cache Coherence in Manycore Processor through Silicon Photonic Chiplet. Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, pp. 1-9. DOI: 10.1145/3508352.3549338. Online publication date: 30-Oct-2022
  • (2020) Analyzing and Leveraging Shared L1 Caches in GPUs. Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, pp. 161-173. DOI: 10.1145/3410463.3414623. Online publication date: 30-Sep-2020
  • (2019) Hybrid Remote Access Protocol. IEEE Computer Architecture Letters 18:1, pp. 30-33. DOI: 10.1109/LCA.2019.2896116. Online publication date: 1-Jan-2019
  • (2018) CaL: Extending Data Locality to Consider Concurrency for Performance Optimization. IEEE Transactions on Big Data 4:2, pp. 273-288. DOI: 10.1109/TBDATA.2017.2753825. Online publication date: 1-Jun-2018
  • (2017) Sharing-Aware Efficient Private Caching in Many-Core Server Processors. 2017 IEEE International Conference on Computer Design (ICCD), pp. 485-492. DOI: 10.1109/ICCD.2017.85. Online publication date: Nov-2017
  • (2017) Dynamic and discrete cache insertion policies for managing shared last level caches in large multicores. Journal of Parallel and Distributed Computing 106:C, pp. 215-226. DOI: 10.1016/j.jpdc.2017.02.004. Online publication date: 1-Aug-2017
  • (2016) LDAC. ACM Transactions on Architecture and Code Optimization 13:4, pp. 1-28. DOI: 10.1145/2983632. Online publication date: 15-Nov-2016
  • (2016) Bit-Upset Vulnerability Factor for eDRAM Last Level Cache immunity analysis. 2016 17th International Symposium on Quality Electronic Design (ISQED), pp. 6-11. DOI: 10.1109/ISQED.2016.7479148. Online publication date: Mar-2016
