The locality-aware adaptive cache coherence protocol

Authors:

Srinivas Devadas

ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture

Pages 523 - 534

https://doi.org/10.1145/2485922.2485967

Published: 23 June 2013

Abstract

Next generation multicore applications will process massive amounts of data with significant sharing. Data movement and management impact memory access latency and consume power. Therefore, harnessing data locality is of fundamental importance in future processors. We propose a scalable, efficient shared memory cache coherence protocol that enables seamless adaptation between private and logically shared caching of on-chip data at the fine granularity of cache lines. Our data-centric approach relies on in-hardware yet low-overhead runtime profiling of the locality of each cache line and only allows private caching for data blocks with high spatio-temporal locality. This allows us to better exploit the private caches and enable low-latency, low-energy memory access, while retaining the convenience of shared memory. On a set of parallel benchmarks, our low-overhead locality-aware mechanisms reduce the overall energy by 25% and completion time by 15% in an NoC-based multicore with the Reactive-NUCA on-chip cache organization and the ACKwise limited directory-based coherence protocol.
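
The abstract's central idea, profiling each cache line's locality at runtime and allowing private caching only for lines with high reuse, can be illustrated with a small software model. The sketch below is not the paper's hardware design: the class names, the per-core counter layout, the promotion and demotion rules, and the threshold value are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

PRIVATE, REMOTE = "private", "remote"


@dataclass
class LineProfile:
    """Locality state kept per (core, cache line). Hypothetical layout."""
    mode: str = PRIVATE  # assumption: a line starts out privately cacheable
    uses: int = 0        # accesses counted since the last reclassification


@dataclass
class LocalityClassifier:
    threshold: int = 4   # hypothetical reuse threshold, not taken from the paper
    profiles: dict = field(default_factory=dict)

    def _profile(self, core, line):
        return self.profiles.setdefault((core, line), LineProfile())

    def access(self, core, line):
        """Record one access and return where it is served from."""
        p = self._profile(core, line)
        p.uses += 1
        if p.mode == REMOTE and p.uses >= self.threshold:
            # Enough reuse was seen at the shared location: allow private caching.
            p.mode, p.uses = PRIVATE, 0
        return p.mode

    def evict(self, core, line):
        """Private copy removed (eviction or invalidation): reclassify by reuse."""
        p = self._profile(core, line)
        if p.mode == PRIVATE and p.uses < self.threshold:
            # Low reuse while cached privately: serve from the shared cache instead.
            p.mode = REMOTE
        p.uses = 0


if __name__ == "__main__":
    c = LocalityClassifier(threshold=3)
    c.access(0, 0x40)            # first touch is cached privately
    c.evict(0, 0x40)             # only one use before removal, so demote
    print(c.access(0, 0x40))     # served at the shared location
```

In this model a line that is evicted after little reuse is subsequently accessed at the shared location, and it regains private status only after it accumulates enough accesses there. The classification is tracked per core, so one core's low reuse does not affect how another core caches the same line.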

Index Terms

The locality-aware adaptive cache coherence protocol
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multiple instruction, multiple data
2. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory

Published In

ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture

June 2013

686 pages

ISBN:9781450320795

DOI:10.1145/2485922

General Chair:
Avi Mendelson
Technion

ACM SIGARCH Computer Architecture News Volume 41, Issue 3
ISCA '13
June 2013
666 pages
ISSN:0163-5964
DOI:10.1145/2508148

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

IEEE CS

In-Cooperation

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ISCA'13

ISCA'13: The 40th Annual International Symposium on Computer Architecture

June 23 - 27, 2013

Tel-Aviv, Israel

Acceptance Rates

ISCA '13 Paper Acceptance Rate 56 of 288 submissions, 19%;

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 31
Total Downloads: 940
Downloads (Last 12 months): 34
Downloads (Last 6 weeks): 3

Reflects downloads up to 08 Mar 2025

Citations

Cited By

Zhang A, Goens A, Oswald N, Grosser T, Sorin D, Nagarajan V (2024). PipeGen: Automated Transformation of a Single-Core Pipeline into a Multicore Pipeline for a Given Memory Consistency Model. Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, pp. 1-13. DOI: 10.1145/3656019.3676889. Online publication date: 14-Oct-2024.
https://dl.acm.org/doi/10.1145/3656019.3676889
Alsop J, Na W, Sinclair M, Grayson S, Adve S (2022). A Case for Fine-grain Coherence Specialization in Heterogeneous Systems. ACM Transactions on Architecture and Code Optimization 19:3, pp. 1-26. DOI: 10.1145/3530819. Online publication date: 22-Aug-2022.
https://dl.acm.org/doi/10.1145/3530819
Li C, Jiang F, Chen S, Zhang J, Liu Y, Fu Y, Xu J, Mitra T, Young E, Xiong J (2022). Accelerating Cache Coherence in Manycore Processor through Silicon Photonic Chiplet. Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, pp. 1-9. DOI: 10.1145/3508352.3549338. Online publication date: 30-Oct-2022.
https://dl.acm.org/doi/10.1145/3508352.3549338
Ibrahim M, Kayiran O, Eckert Y, Loh G, Jog A, Sarkar V, Kim H (2020). Analyzing and Leveraging Shared L1 Caches in GPUs. Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, pp. 161-173. DOI: 10.1145/3410463.3414623. Online publication date: 30-Sep-2020.
https://dl.acm.org/doi/10.1145/3410463.3414623
Kumar C, Singh S, Byrd G (2019). Hybrid Remote Access Protocol. IEEE Computer Architecture Letters 18:1, pp. 30-33. DOI: 10.1109/LCA.2019.2896116. Online publication date: 1-Jan-2019.
https://doi.org/10.1109/LCA.2019.2896116
Liu Y, Sun X (2018). CaL: Extending Data Locality to Consider Concurrency for Performance Optimization. IEEE Transactions on Big Data 4:2, pp. 273-288. DOI: 10.1109/TBDATA.2017.2753825. Online publication date: 1-Jun-2018.
https://doi.org/10.1109/TBDATA.2017.2753825
Shukla S, Chaudhuri M (2017). Sharing-Aware Efficient Private Caching in Many-Core Server Processors. 2017 IEEE International Conference on Computer Design (ICCD), pp. 485-492. DOI: 10.1109/ICCD.2017.85. Online publication date: Nov-2017.
https://doi.org/10.1109/ICCD.2017.85
Sridharan A, Seznec A (2017). Dynamic and discrete cache insertion policies for managing shared last level caches in large multicores. Journal of Parallel and Distributed Computing 106:C, pp. 215-226. DOI: 10.1016/j.jpdc.2017.02.004. Online publication date: 1-Aug-2017.
https://dl.acm.org/doi/10.1016/j.jpdc.2017.02.004
Shi Q, Kurian G, Hijaz F, Devadas S, Khan O (2016). LDAC. ACM Transactions on Architecture and Code Optimization 13:4, pp. 1-28. DOI: 10.1145/2983632. Online publication date: 15-Nov-2016.
https://dl.acm.org/doi/10.1145/2983632
Khoshavi N, Xunchao Chen, Jun Wang, DeMara R (2016). Bit-Upset Vulnerability Factor for eDRAM Last Level Cache immunity analysis. 2016 17th International Symposium on Quality Electronic Design (ISQED), pp. 6-11. DOI: 10.1109/ISQED.2016.7479148. Online publication date: Mar-2016.
https://doi.org/10.1109/ISQED.2016.7479148
