DOI: 10.1145/2485922.2485969
Research article

Protozoa: adaptive granularity cache coherence

Published: 23 June 2013

Abstract

State-of-the-art multiprocessor cache hierarchies propagate the use of a fixed granularity in the cache organization to the design of the coherence protocol. Unfortunately, the fixed granularity, generally chosen to match average spatial locality across a range of applications, not only results in wasted bandwidth to serve an individual thread's access needs, but also results in unnecessary coherence traffic for shared data. The additional bandwidth has a direct impact on both the scalability of parallel applications and overall energy consumption.
In this paper, we present the design of Protozoa, a family of coherence protocols that eliminate unnecessary coherence traffic and match data movement to an application's spatial locality. Protozoa continues to maintain metadata at a conventional fixed cache line granularity while 1) supporting variable read and write caching granularity so that data transfer matches application spatial granularity, 2) invalidating at the granularity of the write miss request so that readers of disjoint data can co-exist with writers, and 3) potentially supporting multiple non-overlapping writers within the cache line, thereby avoiding the traditional ping-pong effect of both read-write and write-write false sharing. Our evaluation demonstrates that Protozoa consistently reduces miss rate and improves the fraction of transmitted data that is actually utilized.
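
To make point 2) concrete, the following is a minimal sketch of the difference between invalidating every sharer of a line and invalidating only the sharers whose cached words overlap the written range. It is an illustration under stated assumptions, not the protocol from the paper: the 8-word line size, the `Holding` record, and the `overlaps` and `cores_to_invalidate` helpers are all hypothetical names, and directory states, transient states, and message exchange are ignored.

```python
from dataclasses import dataclass

WORDS_PER_LINE = 8  # assumed line size in words (illustrative only)


@dataclass(frozen=True)
class Holding:
    """Hypothetical record: `core` caches words first..last (inclusive) of one line."""
    core: int
    first: int
    last: int


def overlaps(a_first, a_last, b_first, b_last):
    """True if the inclusive word ranges [a_first, a_last] and [b_first, b_last] intersect."""
    return a_first <= b_last and b_first <= a_last


def cores_to_invalidate(holdings, writer, first, last, adaptive):
    """Return the sorted cores that lose data when `writer` writes words first..last.

    adaptive=False models a fixed-granularity protocol: every other holder
    of the line is invalidated, even if it touches disjoint words.
    adaptive=True models invalidation at the granularity of the write
    request: only holders whose word range overlaps the write are invalidated.
    """
    victims = set()
    for h in holdings:
        if h.core == writer:
            continue  # the writer never invalidates itself
        if not adaptive or overlaps(h.first, h.last, first, last):
            victims.add(h.core)
    return sorted(victims)


# Core 0 writes words 2..3 of a line that three other cores partly cache.
holdings = [Holding(1, 0, 1), Holding(2, 2, 5), Holding(3, 6, 7)]
print(cores_to_invalidate(holdings, 0, 2, 3, adaptive=False))  # [1, 2, 3]
print(cores_to_invalidate(holdings, 0, 2, 3, adaptive=True))   # [2]
```

In this toy scenario cores 1 and 3 hold words disjoint from the write, so under range-based invalidation they keep their data. Those are the invalidations the abstract attributes to false sharing.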



    Published In

    ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture
    June 2013
    686 pages
    ISBN: 9781450320795
    DOI: 10.1145/2485922

    ACM SIGARCH Computer Architecture News, Volume 41, Issue 3
    ISCA '13
    June 2013
    666 pages
    ISSN: 0163-5964
    DOI: 10.1145/2508148
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Sponsors

    • IEEE CS

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • Research-article

    Conference

    ISCA'13

    Acceptance Rates

    ISCA '13 Paper Acceptance Rate 56 of 288 submissions, 19%;
    Overall Acceptance Rate 543 of 3,203 submissions, 17%

    Article Metrics

    • Downloads (last 12 months): 43
    • Downloads (last 6 weeks): 9
    Reflects downloads up to 10 Dec 2024

    Cited By

    • (2022) A Case for Fine-grain Coherence Specialization in Heterogeneous Systems. ACM Transactions on Architecture and Code Optimization 19:3 (1-26). DOI: 10.1145/3530819. Online publication date: 22-Aug-2022
    • (2020) RSMCC: Enabling Ring-based Software Managed Cache-Coherent Embedded SoCs. 2020 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP) (131-135). DOI: 10.1109/PDP50117.2020.00026. Online publication date: Mar-2020
    • (2019) Designing Far Memory Data Structures. Proceedings of the Workshop on Hot Topics in Operating Systems (120-126). DOI: 10.1145/3317550.3321433. Online publication date: 13-May-2019
    • (2018) Dynamic fine-grained sparse memory accesses. Proceedings of the International Symposium on Memory Systems (85-97). DOI: 10.1145/3240302.3240416. Online publication date: 1-Oct-2018
    • (2018) Spandex. Proceedings of the 45th Annual International Symposium on Computer Architecture (261-274). DOI: 10.1109/ISCA.2018.00031. Online publication date: 2-Jun-2018
    • (2017) Detecting Software Cache Coherence Violations in MPSoC Using Traces Captured on Virtual Platforms. ACM Transactions on Embedded Computing Systems 16:2 (1-21). DOI: 10.1145/2990193. Online publication date: 2-Jan-2017
    • (2017) Exploring grouped coherence for clustered hierarchical cache. The Journal of Supercomputing 73:9 (4137-4157). DOI: 10.1007/s11227-017-2024-8. Online publication date: 1-Sep-2017
    • (2016) Data placement across the cache hierarchy: Minimizing data movement with reuse-aware placement. 2016 IEEE 34th International Conference on Computer Design (ICCD) (117-124). DOI: 10.1109/ICCD.2016.7753269. Online publication date: Oct-2016
    • (2015) Exploiting commutativity to reduce the cost of updates to shared data in cache-coherent systems. Proceedings of the 48th International Symposium on Microarchitecture (13-25). DOI: 10.1145/2830772.2830774. Online publication date: 5-Dec-2015
    • (2015) The Effects of Granularity and Adaptivity on Private/Shared Classification for Coherence. ACM Transactions on Architecture and Code Optimization 12:3 (1-21). DOI: 10.1145/2790301. Online publication date: 31-Aug-2015
