
A way-halting cache for low-energy high-performance systems

Published: 01 March 2005

Abstract

Caches contribute to much of a microprocessor system's power and energy consumption. Numerous new cache architectures, such as phased, pseudo-set-associative, way-predicting, reactive-associative, way-shutdown, way-concatenating, and highly associative caches, are intended to reduce power and/or energy, but they all impose some performance overhead. We have developed a new cache architecture, called a way-halting cache, that reduces energy further than the previously mentioned architectures while imposing no performance overhead. Our way-halting cache is a four-way set-associative cache that stores the four lowest-order bits of all ways' tags in a fully associative memory, which we call the halt tag array. The lookup in the halt tag array is done in parallel with, and is no slower than, the set-index decoding. The halt tag array predetermines which tags cannot match because their low-order 4 bits mismatch. Further accesses to ways with known mismatching tags are then halted, thus saving power. Our halt tag array has the additional feature of using static logic only, rather than the dynamic logic used in highly associative caches, making our cache simpler to design with existing tools. We provide data from experiments on 29 benchmarks drawn from Powerstone, MediaBench, and SPEC 2000, based on our layouts in 0.18-micron CMOS technology. On average, we obtained 55% savings of memory-access-related energy over a conventional four-way set-associative cache. We show that the savings are greater than those of previous methods, and nearly twice those of highly associative caches, while imposing no performance overhead and only 2% cache area overhead.
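The halting step described in the abstract can be illustrated with a small functional model. The following Python sketch is an assumption-laden illustration, not the authors' hardware design: the class name `WayHaltingCache`, the helper `make_addr`, the 64-set and 32-byte-line geometry, and the choice to treat invalid ways as halted are all hypothetical. Only the four-way associativity and the 4-bit halt tag come from the abstract. The model reports how many ways would stay enabled per access; it does not model timing, the parallel set-index decode, or energy.

```python
HALT_BITS = 4                      # low-order tag bits held in the halt tag array (from the abstract)
NUM_WAYS = 4                       # four-way set-associative (from the abstract)
HALT_MASK = (1 << HALT_BITS) - 1


class WayHaltingCache:
    """Functional sketch of a way-halting lookup (geometry is assumed)."""

    def __init__(self, num_sets=64, line_bytes=32):
        self.num_sets = num_sets
        self.line_bytes = line_bytes
        # Full tags, one per way per set; None marks an invalid way.
        self.tags = [[None] * NUM_WAYS for _ in range(num_sets)]
        # Halt tag array: only the low-order 4 bits of each stored tag.
        self.halt_tags = [[None] * NUM_WAYS for _ in range(num_sets)]

    def split(self, addr):
        """Return (tag, set_index) for a byte address."""
        block = addr // self.line_bytes
        return block // self.num_sets, block % self.num_sets

    def fill(self, addr, way):
        """Install the line containing addr into the given way."""
        tag, index = self.split(addr)
        self.tags[index][way] = tag
        self.halt_tags[index][way] = tag & HALT_MASK

    def lookup(self, addr):
        """Return (hit_way or None, number_of_ways_left_enabled)."""
        tag, index = self.split(addr)
        halt = tag & HALT_MASK
        # Ways whose halt tag differs cannot match, so they are halted.
        enabled = [w for w in range(NUM_WAYS)
                   if self.halt_tags[index][w] == halt]
        # The full tag comparison runs only on the ways left enabled.
        for w in enabled:
            if self.tags[index][w] == tag:
                return w, len(enabled)
        return None, len(enabled)


def make_addr(cache, tag, index):
    """Build a byte address from a tag and set index (hypothetical helper)."""
    return (tag * cache.num_sets + index) * cache.line_bytes


if __name__ == "__main__":
    cache = WayHaltingCache()
    for way, tag in enumerate([0x10, 0x21, 0x32, 0x43]):
        cache.fill(make_addr(cache, tag, 3), way)
    print(cache.lookup(make_addr(cache, 0x21, 3)))  # hit: one way enabled
    print(cache.lookup(make_addr(cache, 0x51, 3)))  # miss: low bits collide
```

In this model a conventional four-way cache would correspond to enabling all four ways on every access. A miss whose low-order tag bits collide with a stored tag still enables that way, which is why halting reduces the number of way accesses rather than eliminating them.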


    Published In

    ACM Transactions on Architecture and Code Optimization, Volume 2, Issue 1
    March 2005
    108 pages
    ISSN: 1544-3566
    EISSN: 1544-3973
    DOI: 10.1145/1061267

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Cache
    2. dynamic optimization
    3. embedded systems
    4. low energy
    5. low power
