Research article, DOI: 10.1145/2851613.2851674

Static energy efficient cache reconfiguration for dynamic NUCA in tiled CMPs

Published: 04 April 2016

Abstract

Rapid advances in semiconductor technology make it possible to integrate multiple processor cores with multi-level on-chip caches. Integrating more on-chip components increases the on-chip power density. According to recent studies, on-chip caches are the principal contributors to the total power consumed by the chip. Cache power consumption can be divided into two major parts: dynamic power and static power. Dynamic power is consumed during cache accesses, whereas static power is the leakage power of the cache. Higher power consumption raises the effective chip temperature, which in turn increases the leakage power.
In this paper we attempt to reduce static power consumption by powering off cache ways in the cache banks of a tiled DNUCA cache. We use a bank-utilisation-based criterion for the way-shutdown decision: the number of ways to be turned off in a bank is chosen based on the bank's usage statistics. The contents of the powered-off cache ways are written back to main memory. Thus, depending on the application's working-set size and data distribution, a controlled number of ways from a set of banks can be dynamically shut down to save leakage power. For a 4 MB, 8-way L2 tiled DNUCA cache, experimental analysis shows a 17% reduction in the energy-delay product (EDP) and a 33% reduction in static power. The powered-off ways are also aligned, simplifying the gating circuitry.
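
The abstract does not give the exact shutdown rule, so the following is only a minimal sketch of what a utilisation-based way-shutdown decision could look like. Everything specific here is an assumption made for illustration: utilisation is measured relative to the busiest bank, the number of ways kept scales linearly with that ratio, at least one way always stays on, and the highest-numbered ways are gated first so that the same way positions go dark in every bank (one possible reading of "aligned"). The names `ways_to_power_off`, `gated_way_indices`, and `plan_shutdown` are hypothetical, and the write-back of the gated ways' contents is not modelled.

```python
import math

TOTAL_WAYS = 8  # associativity per L2 bank, matching the 8-way cache in the abstract
MIN_WAYS = 1    # assumption: every bank keeps at least one way powered on


def ways_to_power_off(bank_accesses: int, busiest_accesses: int) -> int:
    """Return how many ways to gate in one bank for the next interval.

    Assumed rule: keep a share of the ways proportional to this bank's
    access count relative to the busiest bank.
    """
    if busiest_accesses <= 0:
        # No bank was accessed at all: keep only the minimum number of ways.
        return TOTAL_WAYS - MIN_WAYS
    utilisation = bank_accesses / busiest_accesses
    ways_needed = math.ceil(utilisation * TOTAL_WAYS)
    ways_kept = max(MIN_WAYS, min(TOTAL_WAYS, ways_needed))
    return TOTAL_WAYS - ways_kept


def gated_way_indices(num_off: int) -> list[int]:
    """Gate the highest-numbered ways first (assumed alignment scheme)."""
    return list(range(TOTAL_WAYS - num_off, TOTAL_WAYS))


def plan_shutdown(accesses_per_bank: list[int]) -> list[list[int]]:
    """For each bank, list the way indices to power off.

    In a real design, the contents of these ways would have to be written
    back to main memory before gating, as the abstract states; that step
    is omitted here.
    """
    busiest = max(accesses_per_bank, default=0)
    return [
        gated_way_indices(ways_to_power_off(accesses, busiest))
        for accesses in accesses_per_bank
    ]


if __name__ == "__main__":
    # Four banks with made-up access counts for one monitoring interval.
    for bank, ways in enumerate(plan_shutdown([800, 400, 100, 0])):
        print(f"bank {bank}: power off ways {ways}")
```

Under these assumed rules, a heavily used bank keeps all of its ways while lightly used banks shed most of theirs. A real controller would also have to weigh the extra misses caused by the reduced capacity against the leakage saved, which is what an EDP comparison captures.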


Cited By

  • (2022) ACCURATE: Accuracy Maximization for Real-Time Multicore Systems With Energy-Efficient Way-Sharing Caches. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 41:12 (5246-5260). DOI: 10.1109/TCAD.2022.3161407. Online publication date: Dec-2022
  • (2022) Process variation aware DRAM-Cache resizing. Journal of Systems Architecture: the EUROMICRO Journal, 123:C. DOI: 10.1016/j.sysarc.2021.102364. Online publication date: 1-Feb-2022


      Published In

      SAC '16: Proceedings of the 31st Annual ACM Symposium on Applied Computing
      April 2016
      2360 pages
      ISBN:9781450337397
      DOI:10.1145/2851613
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. DNUCA
      2. EDP
      3. cache power consumption
      4. leakage power
      5. power gating
      6. tiled CMP
      7. way shutdown

      Qualifiers

      • Research-article

      Conference

      SAC 2016: Symposium on Applied Computing
      April 4 - 8, 2016
      Pisa, Italy

      Acceptance Rates

      SAC '16 Paper Acceptance Rate 252 of 1,047 submissions, 24%;
      Overall Acceptance Rate 1,650 of 6,669 submissions, 25%



