research-article
Open access

Exploring the limits of early register release: Exploiting compiler analysis

Published: 02 October 2009

Abstract

Register pressure in modern superscalar processors can be reduced by releasing registers early and by copying their contents to cheap back-up storage. This article quantifies the potential benefits of register occupancy reduction and shows that existing hardware-based schemes typically achieve only a small fraction of this potential. This is because they are unable to accurately determine the last use of a register and must wait until the redefining instruction enters the pipeline. On the other hand, compilers have a global view of the program and, using simple dataflow analysis, can determine the last use. This article evaluates the extent to which compiler analysis can aid early releasing, explores the design space, and introduces commit and issue-based early releasing schemes, quantifying their benefits. Using simple compiler analysis and microarchitecture changes, we achieve 70% of the potential register file occupancy reduction. By adding more hardware support, we can increase this to 94%. Our schemes are compared to state-of-the-art approaches for varying register file sizes and are shown to outperform these existing techniques.
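The abstract's claim that simple dataflow analysis can determine the last use of a register can be illustrated with a standard backward liveness pass. The following is a minimal sketch, not the authors' implementation: the `Instr` type and the `liveness` and `last_uses` functions are hypothetical names introduced here, and the sketch assumes a control-flow graph of basic blocks over symbolic register names. A source register is marked as a last use when it is not live immediately after the instruction that reads it.

```python
from typing import Dict, List, NamedTuple, Optional, Set, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction: optional destination register plus source registers."""
    dest: Optional[str]
    srcs: Tuple[str, ...]


def liveness(blocks: Dict[str, List[Instr]],
             succs: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Solve the backward liveness equations to a fixed point; return live-out sets."""
    live_in: Dict[str, Set[str]] = {b: set() for b in blocks}
    live_out: Dict[str, Set[str]] = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for name, instrs in blocks.items():
            # live-out of a block is the union of live-in over its successors
            out: Set[str] = set()
            for s in succs.get(name, []):
                out |= live_in[s]
            # walk the block backward: a definition kills, a use generates
            live = set(out)
            for ins in reversed(instrs):
                if ins.dest is not None:
                    live.discard(ins.dest)
                live.update(ins.srcs)
            if out != live_out[name] or live != live_in[name]:
                live_out[name], live_in[name] = out, live
                changed = True
    return live_out


def last_uses(instrs: List[Instr], live_out: Set[str]) -> List[Set[str]]:
    """For each instruction, the source registers whose value is dead right after it."""
    live = set(live_out)
    result: List[Set[str]] = [set() for _ in instrs]
    for i in range(len(instrs) - 1, -1, -1):
        ins = instrs[i]
        if ins.dest is not None:
            live.discard(ins.dest)  # the old value of dest does not survive
        result[i] = {r for r in ins.srcs if r not in live}
        live.update(ins.srcs)
    return result


# Illustrative three-block program (assumed, not taken from the article).
blocks = {
    "entry": [Instr("r1", ()), Instr("r2", ("r1",)), Instr("r3", ("r1", "r2"))],
    "loop": [Instr("r3", ("r3", "r2")), Instr(None, ("r3",))],
    "exit": [Instr(None, ("r3",))],
}
succs = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}

live_out = liveness(blocks, succs)
release_points = {b: last_uses(blocks[b], live_out[b]) for b in blocks}
```

In this example `r1` is last used by the third instruction of `entry`, while `r2` stays live around the loop and is never marked inside it. How such last-use information is conveyed to the hardware is a design question the abstract leaves to the article body, so the sketch stops at computing the release points.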

Published In

ACM Transactions on Architecture and Code Optimization, Volume 6, Issue 3
September 2009
114 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/1582710
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 October 2009
Accepted: 01 April 2009
Revised: 01 October 2008
Received: 01 June 2008
Published in TACO Volume 6, Issue 3

Author Tags

  1. Low-power design
  2. compiler
  3. energy efficiency
  4. microarchitecture
  5. register file

Qualifiers

  • Research-article
  • Research
  • Refereed

Article Metrics

  • Downloads (Last 12 months): 52
  • Downloads (Last 6 weeks): 7
Reflects downloads up to 13 Dec 2024

Cited By

  • (2021) Highly Concurrent Latency-tolerant Register Files for GPUs. ACM Transactions on Computer Systems 37:1-4 (1-36). DOI: 10.1145/3419973. Online publication date: 4-Jan-2021
  • (2016) cNV SRAM: CMOS Technology Compatible Non-Volatile SRAM Based Ultra-Low Leakage Energy Hybrid Memory System. IEEE Transactions on Computers 65:4 (1055-1067). DOI: 10.1109/TC.2014.2375187. Online publication date: 1-Apr-2016
  • (2016) URFA-Update based register file architecture with partial register write for energy efficiency. Microprocessors & Microsystems 47:PB (445-453). DOI: 10.1016/j.micpro.2016.07.019. Online publication date: 1-Nov-2016
  • (2016) A survey of techniques for designing and managing CPU register file. Concurrency and Computation: Practice and Experience 29:4. DOI: 10.1002/cpe.3906. Online publication date: 13-Jul-2016
  • (2015) TM-RF: Aging-Aware Power-Efficient Register File Design for Modern Microprocessors. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 23:7 (1196-1209). DOI: 10.1109/TVLSI.2014.2334136. Online publication date: Jul-2015
  • (2014) Application-Guided Power Gating Reducing Register File Static Power. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 22:12 (2513-2526). DOI: 10.1109/TVLSI.2013.2293702. Online publication date: Dec-2014
  • (2014) Variation Aware Sleep Vector Selection in Dual $V_t$ Dynamic OR Circuits for Low Leakage Register File Design. IEEE Transactions on Circuits and Systems I: Regular Papers 61:7 (1970-1983). DOI: 10.1109/TCSI.2014.2298280. Online publication date: Jul-2014
  • (2012) AFReP. Proceedings of the International Conference on Computer-Aided Design (302-308). DOI: 10.1145/2429384.2429447. Online publication date: 5-Nov-2012
  • (2012) A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors. ACM Transactions on Computer Systems 30:2 (1-38). DOI: 10.1145/2166879.2166882. Online publication date: 1-Apr-2012
  • (2011) Energy-efficient mechanisms for managing thread context in throughput processors. ACM SIGARCH Computer Architecture News 39:3 (235-246). DOI: 10.1145/2024723.2000093. Online publication date: 4-Jun-2011
