Coherence decoupling: making use of incoherence

Authors:

Gurindar S. Sohi

ACM SIGOPS Operating Systems Review, Volume 38, Issue 5

Pages 97 - 106

https://doi.org/10.1145/1037949.1024406

Published: 07 October 2004

Abstract

This paper explores a new technique called coherence decoupling, which breaks a traditional cache coherence protocol into two protocols: a Speculative Cache Lookup (SCL) protocol and a safe, backing coherence protocol. The SCL protocol produces a speculative load value, typically from an invalid cache line, permitting the processor to compute with incoherent data. In parallel, the coherence protocol obtains the necessary coherence permissions and the correct value. Eventually, the speculative use of the incoherent data can be verified against the coherent data. Thus, coherence decoupling can greatly reduce, if not eliminate, the effects of false sharing. Furthermore, coherence decoupling can also reduce latencies incurred by true sharing. SCL protocols reduce those latencies by speculatively writing updates into invalid lines, thereby increasing the accuracy of speculation, without complicating the simple, underlying coherence protocol that guarantees correctness.

The performance benefits of coherence decoupling are evaluated using a full-system simulator and a mix of commercial and scientific benchmarks. Our results show that 40% to 90% of all coherence misses can be speculated correctly, and therefore their latencies partially or fully hidden. This capability results in performance improvements ranging from 3% to over 16%, in most cases where the latencies of coherence misses have an effect on performance.
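The split described in the abstract can be illustrated with a toy software model. The sketch below is not the paper's hardware design: the names `CacheLine`, `speculative_load`, `coherent_load`, and `decoupled_load` are hypothetical, the line holds four words by assumption, and the backing coherence protocol is reduced to a direct read of the coherent copy. It shows only the idea that an invalidated line keeps its stale data, that the stale word serves as the speculative value, and that the value is later checked against the coherent one.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    valid: bool
    words: list  # data is retained (stale) after an invalidation


def speculative_load(line, word_idx):
    """SCL step (assumed model): return the local word even if the line is invalid."""
    return line.words[word_idx]


def coherent_load(coherent_line, word_idx):
    """Backing protocol step, modeled here as a direct read of the coherent copy."""
    return coherent_line[word_idx]


def decoupled_load(line, coherent_line, word_idx):
    """Return (coherent_value, speculation_correct) for a load to an invalid line."""
    guess = speculative_load(line, word_idx)
    actual = coherent_load(coherent_line, word_idx)
    correct = guess == actual
    # Refill the line with coherent data whatever the outcome.
    line.words = list(coherent_line)
    line.valid = True
    # On a mismatch a real processor would squash dependent work; here we only report it.
    return actual, correct


coherent = [10, 20, 30, 40]
local = CacheLine(valid=True, words=list(coherent))

# False sharing: a remote writer changes word 3, the local reader wants word 0.
coherent[3] = 99
local.valid = False
false_sharing = decoupled_load(local, coherent, 0)

# True sharing: the remote writer changes the very word the reader wants.
coherent[0] = 11
local.valid = False
true_sharing = decoupled_load(local, coherent, 0)
```

In this model the false-sharing load speculates correctly because the word it reads was never modified, while the true-sharing load is detected as a misspeculation at verification time. The abstract's point about speculatively writing updates into invalid lines would correspond to refreshing `local.words` before the load, which this sketch leaves out.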

Published In

ACM SIGOPS Operating Systems Review Volume 38, Issue 5

ASPLOS '04

December 2004

283 pages

ISSN:0163-5980

DOI:10.1145/1037949

ASPLOS XI: Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
October 2004
296 pages
ISBN:1581138040
DOI:10.1145/1024393
General Chair: Shubu Mukherjee, Intel Corporation
Program Chair: Kathryn S. McKinley, University of Texas at Austin

Copyright © 2004 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Bibliometrics

Total citations: 52
Total downloads: 1,181
Downloads (last 12 months): 45
Downloads (last 6 weeks): 2

Reflects downloads up to 15 Jan 2025

Cited By

Titos-Gil J, Acacio M (2015). Hardware Approaches to Transactional Memory in Chip Multiprocessors. Handbook on Data Centers, pages 805-835. Online publication date: 17-Mar-2015.
https://doi.org/10.1007/978-1-4939-2092-1_27
Nagarajan V, Sorin D, Hill M, Wood D (2022). Advanced Topics in Coherence. A Primer on Memory Consistency and Cache Coherence, pages 191-209. Online publication date: 28-Mar-2022.
https://doi.org/10.1007/978-3-031-01764-3_9
Sorin D, Hill M, Wood D (2022). Advanced Topics in Coherence. A Primer on Memory Consistency and Cache Coherence, pages 177-195. Online publication date: 18-Oct-2022.
https://doi.org/10.1007/978-3-031-01733-9_9
Kao H, San Miguel J, Enright Jerger N (2021). Ghostwriter: A Cache Coherence Protocol for Error-Tolerant Applications. 50th International Conference on Parallel Processing Workshop, pages 1-10. Online publication date: 9-Aug-2021.
https://dl.acm.org/doi/10.1145/3458744.3474045
Nagarajan V, Sorin D, Hill M, Wood D (2020). A Primer on Memory Consistency and Cache Coherence, Second Edition. Synthesis Lectures on Computer Architecture, 15:1, pages 1-294. Online publication date: 4-Feb-2020.
https://doi.org/10.2200/S00962ED2V01Y201910CAC049
AlBarakat L, Gratz P, Jiménez D, Ayguadé E, Hwu W, Badia R, Hofstee H (2020). SB-Fetch. Proceedings of the 34th ACM International Conference on Supercomputing, pages 1-12. Online publication date: 29-Jun-2020.
https://dl.acm.org/doi/10.1145/3392717.3392735
Yao Y, Chen W, Mitra T, Xiang Y (2017). TC-Release++: An Efficient Timestamp-Based Coherence Protocol for Many-Core Architectures. IEEE Transactions on Parallel and Distributed Systems, 28:11, pages 3313-3327. Online publication date: 1-Nov-2017.
https://doi.org/10.1109/TPDS.2017.2719679
Agarwal N, Nellans D, Ebrahimi E, Wenisch T, Danskin J, Keckler S (2016). Selective GPU caches to eliminate CPU-GPU HW cache coherence. 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), pages 494-506. Online publication date: Mar-2016.
https://doi.org/10.1109/HPCA.2016.7446089
Park S, Prvulovic M, Hughes C (2016). PleaseTM: Enabling transaction conflict management in requester-wins hardware transactional memory. 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), pages 285-296. Online publication date: Mar-2016.
https://doi.org/10.1109/HPCA.2016.7446072
Huh J, Kim C, Shafi H, Zhang L, Burger D, Keckler S (2014). A NUCA substrate for flexible CMP cache sharing. ACM International Conference on Supercomputing 25th Anniversary Volume, pages 380-389. Online publication date: 10-Jun-2014.
https://dl.acm.org/doi/10.1145/2591635.2667186
