DOI: 10.5555/563998.564007
Article

Reducing set-associative cache energy via way-prediction and selective direct-mapping

Published: 01 December 2001

Abstract

Set-associative caches achieve low miss rates for typical applications but result in significant energy dissipation. Set-associative caches minimize access time by probing all the data ways in parallel with the tag lookup, although the output of only the matching way is used. The energy spent accessing the other ways is wasted. Eliminating the wasted energy by performing the data lookup sequentially following the tag lookup substantially increases cache access time, and is unacceptable for high-performance L1 caches. In this paper, we apply two previously-proposed techniques, way-prediction and selective direct-mapping, to reducing L1 cache dynamic energy while maintaining high performance. The techniques predict the matching way and probe only the predicted way and not all the ways, achieving energy savings. While these techniques were originally proposed to improve set-associative cache access times, this is the first paper to apply them to reducing cache energy.

We evaluate the effectiveness of these techniques in reducing L1 d-cache, L1 i-cache, and overall processor energy. Using these techniques, our caches achieve the energy-delay of sequential access while maintaining the performance of parallel access. Relative to parallel access L1 i- and d-caches, the techniques achieve overall processor energy-delay reduction of 8%, while perfect way-prediction with no performance degradation achieves 10% reduction. The performance degradation of the techniques is less than 3%, compared to an aggressive, 1-cycle, 4-way, parallel access cache.
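As a concrete illustration of the mechanism the abstract describes, the sketch below models a way-predicted lookup in C: only the predicted way is probed on the first access cycle, and the remaining ways are probed in a second cycle only when the way prediction misses, so a correct prediction costs roughly the probe energy of a direct-mapped access. This is a minimal sketch under assumed parameters (4 ways, 64-byte lines, 1024 sets, an MRU-style per-set predictor); the names cache_t, way_predicted_access, and the probes counter are hypothetical and are not taken from the paper.

```c
/*
 * Minimal sketch of a way-predicted, 4-way set-associative cache lookup.
 * Hypothetical model (not the paper's implementation): "probes" counts
 * how many ways are read as a rough dynamic-energy proxy, and the return
 * value is the hit latency in cycles.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_SETS  1024
#define NUM_WAYS  4
#define LINE_SIZE 64

typedef struct {
    uint32_t tag[NUM_SETS][NUM_WAYS];
    bool     valid[NUM_SETS][NUM_WAYS];
    int      way_pred[NUM_SETS];   /* predicted (e.g. MRU) way per set */
    long     probes;               /* energy proxy: total ways probed  */
} cache_t;

/* Returns 1 on a first-cycle hit (correct way prediction), 2 on a hit
 * after a way mispredict, and 0 on a cache miss (fill not modeled). */
static int way_predicted_access(cache_t *c, uint32_t addr)
{
    uint32_t set = (addr / LINE_SIZE) % NUM_SETS;
    uint32_t tag = addr / (LINE_SIZE * NUM_SETS);

    int pred = c->way_pred[set];
    c->probes += 1;                        /* probe only the predicted way */
    if (c->valid[set][pred] && c->tag[set][pred] == tag)
        return 1;

    c->probes += NUM_WAYS - 1;             /* mispredict: probe the rest   */
    for (int w = 0; w < NUM_WAYS; w++) {
        if (w != pred && c->valid[set][w] && c->tag[set][w] == tag) {
            c->way_pred[set] = w;          /* retrain predictor to MRU way */
            return 2;
        }
    }
    return 0;
}

int main(void)
{
    static cache_t c;                        /* zero-initialized            */
    c.tag[0][2] = 5; c.valid[0][2] = true;   /* line resident: set 0, way 2 */

    uint32_t addr = 5u * LINE_SIZE * NUM_SETS;   /* maps to set 0, tag 5    */
    /* First access mispredicts (predictor starts at way 0) and takes 2
     * cycles; the second hits the predicted way and probes just 1 way.    */
    int lat1 = way_predicted_access(&c, addr);
    printf("latency=%d probes=%ld\n", lat1, c.probes);
    int lat2 = way_predicted_access(&c, addr);
    printf("latency=%d probes=%ld\n", lat2, c.probes);
    return 0;
}
```

Selective direct-mapping, the paper's second technique, is not modeled in this sketch; roughly, it lets non-conflicting lines bypass the predictor and be accessed in a fixed, direct-mapped way chosen by their address bits.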

Published In

MICRO 34: Proceedings of the 34th Annual ACM/IEEE International Symposium on Microarchitecture
December 2001
355 pages
ISBN: 0769513697

Publisher

IEEE Computer Society, United States

Conference

MICRO-34

Acceptance Rates

Overall Acceptance Rate: 484 of 2,242 submissions, 22%

Cited By

• (2021) Fat Loads: Exploiting Locality Amongst Contemporaneous Load Operations to Optimize Cache Accesses. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, pages 366-379. DOI: 10.1145/3466752.3480104. Online publication date: 18-Oct-2021.
• (2019) Filter caching for free. Proceedings of the 46th International Symposium on Computer Architecture, pages 436-448. DOI: 10.1145/3307650.3322269. Online publication date: 22-Jun-2019.
• (2018) Decoupling address generation from loads and stores to improve data access energy efficiency. ACM SIGPLAN Notices, 53(6), pages 65-75. DOI: 10.1145/3299710.3211340. Online publication date: 19-Jun-2018.
• (2018) Decoupling address generation from loads and stores to improve data access energy efficiency. Proceedings of the 19th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, pages 65-75. DOI: 10.1145/3211332.3211340. Online publication date: 19-Jun-2018.
• (2018) ACCORD. Proceedings of the 45th Annual International Symposium on Computer Architecture, pages 328-339. DOI: 10.1109/ISCA.2018.00036. Online publication date: 2-Jun-2018.
• (2016) enDebug. Journal of Parallel and Distributed Computing, 96(C), pages 121-133. DOI: 10.1016/j.jpdc.2016.05.005. Online publication date: 1-Oct-2016.
• (2015) Improving Data Access Efficiency by Using Context-Aware Loads and Stores. ACM SIGPLAN Notices, 50(5), pages 1-10. DOI: 10.1145/2808704.2754960. Online publication date: 4-Jun-2015.
• (2015) Improving Data Access Efficiency by Using Context-Aware Loads and Stores. Proceedings of the 16th ACM SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems 2015 CD-ROM, pages 1-10. DOI: 10.1145/2670529.2754960. Online publication date: 4-Jun-2015.
• (2014) The Direct-to-Data (D2D) cache. Proceedings of the 41st Annual International Symposium on Computer Architecture, pages 133-144. DOI: 10.5555/2665671.2665694. Online publication date: 14-Jun-2014.
• (2014) Reducing set-associative L1 data cache energy by early load data dependence detection (ELD3). Proceedings of the Conference on Design, Automation & Test in Europe, pages 1-4. DOI: 10.5555/2616606.2616707. Online publication date: 24-Mar-2014.
