Exploring the limits of early register release: Exploiting compiler analysis

Authors:

Timothy M. Jones,

Michael F. P. O'Boyle,

Antonio González,

Oğuz Ergin

ACM Transactions on Architecture and Code Optimization (TACO), Volume 6, Issue 3

Article No.: 12, Pages 1 - 30

https://doi.org/10.1145/1582710.1582714

Published: 02 October 2009

Abstract

Register pressure in modern superscalar processors can be reduced by releasing registers early and by copying their contents to cheap back-up storage. This article quantifies the potential benefits of register occupancy reduction and shows that existing hardware-based schemes typically achieve only a small fraction of this potential. This is because they are unable to accurately determine the last use of a register and must wait until the redefining instruction enters the pipeline. On the other hand, compilers have a global view of the program and, using simple dataflow analysis, can determine the last use. This article evaluates the extent to which compiler analysis can aid early releasing, explores the design space, and introduces commit and issue-based early releasing schemes, quantifying their benefits. Using simple compiler analysis and microarchitecture changes, we achieve 70% of the potential register file occupancy reduction. By adding more hardware support, we can increase this to 94%. Our schemes are compared to state-of-the-art approaches for varying register file sizes and are shown to outperform these existing techniques.
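The abstract's claim that simple dataflow analysis can find a register's last use can be illustrated with a backward liveness scan. The following is a minimal sketch, not the article's implementation: it assumes a single straight-line basic block, and the names `Instr` and `last_uses` are hypothetical, introduced here only for illustration.

```python
from typing import List, NamedTuple, Optional, Set, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record: one optional destination, any sources."""
    dest: Optional[str]
    srcs: Tuple[str, ...]


def last_uses(block: List[Instr], live_out: Set[str]) -> List[Set[str]]:
    """For each instruction, return the source registers read for the last time.

    Walks the block backwards, maintaining the set of registers whose
    current value is still needed later (the classic liveness set).
    """
    live = set(live_out)
    result: List[Set[str]] = [set() for _ in block]
    for i in range(len(block) - 1, -1, -1):
        ins = block[i]
        # A write kills the previous value: above this point, the old
        # value of dest is live only if this instruction itself reads it.
        if ins.dest is not None:
            live.discard(ins.dest)
        for reg in ins.srcs:
            if reg not in live:
                # No later instruction reads this value: last use is here.
                result[i].add(reg)
                live.add(reg)
    return result


# Illustrative block (assumed, not taken from the article):
block = [
    Instr("r1", ("r2", "r3")),  # 0: r1 = r2 + r3
    Instr("r4", ("r1", "r2")),  # 1: r4 = r1 * r2   (last read of this r1)
    Instr("r1", ("r4",)),       # 2: r1 = r4        (redefines r1)
    Instr("r5", ("r1", "r4")),  # 3: r5 = r1 - r4
]

for i, regs in enumerate(last_uses(block, live_out={"r5"})):
    print(i, sorted(regs))
# 0 ['r3']
# 1 ['r1', 'r2']
# 2 []
# 3 ['r1', 'r4']
```

In this example the value written to `r1` by instruction 0 is last read at instruction 1, yet a scheme that waits for the redefining instruction could not release it before instruction 2 enters the pipeline. That gap is what the abstract describes compiler analysis as closing.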

Published In

ACM Transactions on Architecture and Code Optimization Volume 6, Issue 3

September 2009

114 pages

ISSN:1544-3566

EISSN:1544-3973

DOI:10.1145/1582710

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 October 2009

Accepted: 01 April 2009

Revised: 01 October 2008

Received: 01 June 2008

Published in TACO Volume 6, Issue 3

Qualifiers

Research-article
Research
Refereed
