DOI: 10.5555/563998.564007
Article

Reducing set-associative cache energy via way-prediction and selective direct-mapping

Published: 01 December 2001

Abstract

Set-associative caches achieve low miss rates for typical applications but result in significant energy dissipation. Set-associative caches minimize access time by probing all the data ways in parallel with the tag lookup, although the output of only the matching way is used. The energy spent accessing the other ways is wasted. Eliminating the wasted energy by performing the data lookup sequentially following the tag lookup substantially increases cache access time, and is unacceptable for high-performance L1 caches. In this paper, we apply two previously-proposed techniques, way-prediction and selective direct-mapping, to reducing L1 cache dynamic energy while maintaining high performance. The techniques predict the matching way and probe only the predicted way and not all the ways, achieving energy savings. While these techniques were originally proposed to improve set-associative cache access times, this is the first paper to apply them to reducing cache energy.

We evaluate the effectiveness of these techniques in reducing L1 d-cache, L1 i-cache, and overall processor energy. Using these techniques, our caches achieve the energy-delay of sequential access while maintaining the performance of parallel access. Relative to parallel access L1 i- and d-caches, the techniques achieve overall processor energy-delay reduction of 8%, while perfect way-prediction with no performance degradation achieves 10% reduction. The performance degradation of the techniques is less than 3%, compared to an aggressive, 1-cycle, 4-way, parallel access cache.
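As a concrete illustration of the mechanism the abstract describes, the sketch below models a way-predicted lookup in C: only the predicted way is probed on the first access cycle, and the remaining ways are probed in a second cycle only when the way prediction misses, so a correct prediction costs roughly the probe energy of a direct-mapped access. This is a minimal sketch under assumed parameters (4 ways, 64-byte lines, 1024 sets, an MRU-style per-set predictor); the names cache_t, way_predicted_access, and the probes counter are hypothetical and are not taken from the paper.

```c
/*
 * Minimal sketch of a way-predicted, 4-way set-associative cache lookup.
 * Hypothetical model (not the paper's implementation): "probes" counts
 * how many ways are read as a rough dynamic-energy proxy, and the return
 * value is the hit latency in cycles.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_SETS  1024
#define NUM_WAYS  4
#define LINE_SIZE 64

typedef struct {
    uint32_t tag[NUM_SETS][NUM_WAYS];
    bool     valid[NUM_SETS][NUM_WAYS];
    int      way_pred[NUM_SETS];   /* predicted (e.g. MRU) way per set */
    long     probes;               /* energy proxy: total ways probed  */
} cache_t;

/* Returns 1 on a first-cycle hit (correct way prediction), 2 on a hit
 * after a way mispredict, and 0 on a cache miss (fill not modeled). */
static int way_predicted_access(cache_t *c, uint32_t addr)
{
    uint32_t set = (addr / LINE_SIZE) % NUM_SETS;
    uint32_t tag = addr / (LINE_SIZE * NUM_SETS);

    int pred = c->way_pred[set];
    c->probes += 1;                        /* probe only the predicted way */
    if (c->valid[set][pred] && c->tag[set][pred] == tag)
        return 1;

    c->probes += NUM_WAYS - 1;             /* mispredict: probe the rest   */
    for (int w = 0; w < NUM_WAYS; w++) {
        if (w != pred && c->valid[set][w] && c->tag[set][w] == tag) {
            c->way_pred[set] = w;          /* retrain predictor to MRU way */
            return 2;
        }
    }
    return 0;
}

int main(void)
{
    static cache_t c;                        /* zero-initialized            */
    c.tag[0][2] = 5; c.valid[0][2] = true;   /* line resident: set 0, way 2 */

    uint32_t addr = 5u * LINE_SIZE * NUM_SETS;   /* maps to set 0, tag 5    */
    /* First access mispredicts (predictor starts at way 0) and takes 2
     * cycles; the second hits the predicted way and probes just 1 way.    */
    int lat1 = way_predicted_access(&c, addr);
    printf("latency=%d probes=%ld\n", lat1, c.probes);
    int lat2 = way_predicted_access(&c, addr);
    printf("latency=%d probes=%ld\n", lat2, c.probes);
    return 0;
}
```

Selective direct-mapping, the paper's second technique, is not modeled in this sketch; roughly, it lets non-conflicting lines bypass the predictor and be accessed in a fixed, direct-mapped way chosen by their address bits.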

Published In

MICRO 34: Proceedings of the 34th Annual ACM/IEEE International Symposium on Microarchitecture
December 2001
355 pages
ISBN: 0769513697

Publisher

IEEE Computer Society, United States

Conference

MICRO-34

Acceptance Rates

Overall Acceptance Rate: 484 of 2,242 submissions, 22%

Cited By

• (2021) Fat Loads: Exploiting Locality Amongst Contemporaneous Load Operations to Optimize Cache Accesses. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, pages 366-379. DOI: 10.1145/3466752.3480104. Online publication date: 18-Oct-2021.
• (2019) Filter caching for free. Proceedings of the 46th International Symposium on Computer Architecture, pages 436-448. DOI: 10.1145/3307650.3322269. Online publication date: 22-Jun-2019.
• (2018) Decoupling address generation from loads and stores to improve data access energy efficiency. ACM SIGPLAN Notices, 53(6), pages 65-75. DOI: 10.1145/3299710.3211340. Online publication date: 19-Jun-2018.
• (2018) Decoupling address generation from loads and stores to improve data access energy efficiency. Proceedings of the 19th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, pages 65-75. DOI: 10.1145/3211332.3211340. Online publication date: 19-Jun-2018.
• (2018) ACCORD. Proceedings of the 45th Annual International Symposium on Computer Architecture, pages 328-339. DOI: 10.1109/ISCA.2018.00036. Online publication date: 2-Jun-2018.
• (2016) enDebug. Journal of Parallel and Distributed Computing, 96(C), pages 121-133. DOI: 10.1016/j.jpdc.2016.05.005. Online publication date: 1-Oct-2016.
• (2015) Improving Data Access Efficiency by Using Context-Aware Loads and Stores. ACM SIGPLAN Notices, 50(5), pages 1-10. DOI: 10.1145/2808704.2754960. Online publication date: 4-Jun-2015.
• (2015) Improving Data Access Efficiency by Using Context-Aware Loads and Stores. Proceedings of the 16th ACM SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems 2015 CD-ROM, pages 1-10. DOI: 10.1145/2670529.2754960. Online publication date: 4-Jun-2015.
• (2014) The Direct-to-Data (D2D) cache. Proceedings of the 41st Annual International Symposium on Computer Architecture, pages 133-144. DOI: 10.5555/2665671.2665694. Online publication date: 14-Jun-2014.
• (2014) Reducing set-associative L1 data cache energy by early load data dependence detection (ELD3). Proceedings of the Conference on Design, Automation & Test in Europe, pages 1-4. DOI: 10.5555/2616606.2616707. Online publication date: 24-Mar-2014.
