DOI: 10.1145/871506.871564
Article

Reducing reorder buffer complexity through selective operand caching

Published: 25 August 2003

Abstract

Modern superscalar processors implement precise interrupts by using the Reorder Buffer (ROB). In some microarchitectures, such as the Intel P6, the ROB also serves as a repository for uncommitted results. In these designs, the ROB is a complex multi-ported structure that dissipates a significant percentage of the overall chip power. Recently, a mechanism was introduced for reducing the ROB complexity and its power dissipation by completely eliminating the read ports used for reading out source operands. The resulting performance degradation is countered by caching the most recently produced results in a small set of associatively addressed latches ("retention latches"). We propose an enhancement to this technique that leverages the notion of short-lived operands (values targeting registers that are renamed by the time the instruction producing the value reaches the writeback stage). As many as 87% of all generated values are short-lived for the SPEC 2000 benchmarks. Not caching short-lived values in the retention latches yields significant improvements in the utilization of the retention latches and in overall performance, complexity, and power. As few as two retention latches allow all source operand read ports on the ROB to be completely eliminated with very little impact on performance.
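The selective caching idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function `short_lived_flags`, the class `RetentionLatches`, the fixed `writeback_delay`, the one-instruction-per-step dispatch model, and the FIFO replacement policy are all assumptions made for illustration. The sketch marks a value as short-lived when its destination register is renamed again before the producer reaches writeback, and skips such values when filling the latches.

```python
from collections import OrderedDict


def short_lived_flags(dests, writeback_delay):
    """Flag each instruction whose result is short-lived.

    Assumed model: instruction i is dispatched at step i and reaches
    writeback at step i + writeback_delay. dests[i] is its destination
    architectural register (or None). The value is short-lived if another
    instruction dispatched in that window renames the same register.
    """
    flags = []
    for i, dest in enumerate(dests):
        window = dests[i + 1 : i + 1 + writeback_delay]
        flags.append(dest is not None and dest in window)
    return flags


class RetentionLatches:
    """Small associatively addressed cache of recent results (FIFO assumed)."""

    def __init__(self, size):
        self.size = size
        self.entries = OrderedDict()  # result tag -> value, oldest first

    def write(self, tag, value, short_lived):
        """Cache a result at writeback unless it is short-lived."""
        if short_lived:
            return False  # selective caching: do not occupy a latch
        self.entries.pop(tag, None)  # avoid a stale duplicate of this tag
        if len(self.entries) >= self.size:
            self.entries.popitem(last=False)  # evict the oldest entry
        self.entries[tag] = value
        return True

    def read(self, tag):
        """Return the cached value for tag, or None on a miss."""
        return self.entries.get(tag)


# Hypothetical four-instruction trace: r1 is renamed again two steps later.
flags = short_lived_flags(["r1", "r2", "r1", "r3"], writeback_delay=2)

latches = RetentionLatches(size=2)
for tag, (value, is_short) in enumerate(zip([10, 20, 30, 40], flags)):
    latches.write(tag, value, is_short)
```

In this toy trace only the first instruction's result is skipped, so the two latches are spent on values that a later consumer may still need. A real design would also have to decide how consumers of a skipped value obtain it (for example through forwarding), which this sketch does not model.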

Cited By

  • (2004) Isolating Short-Lived Operands for Energy Reduction. IEEE Transactions on Computers 53:6 (697-709). DOI: 10.1109/TC.2004.11. Online publication date: 1-Jun-2004
  • (2003) Reducing datapath energy through the isolation of short-lived operands. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT 2003) (258-268). DOI: 10.1109/PACT.2003.1238021. Online publication date: 2003
  • (2003) Distributed reorder buffer schemes for low power. Proceedings 21st International Conference on Computer Design (364-370). DOI: 10.1109/ICCD.2003.1240920. Online publication date: 2003

Published In

ISLPED '03: Proceedings of the 2003 international symposium on Low power electronics and design
August 2003
502 pages
ISBN:158113682X
DOI:10.1145/871506
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. low-complexity datapath
  2. low-power design
  3. reorder buffer
  4. short-lived values

Qualifiers

  • Article

Conference

ISLPED03
Acceptance Rates

ISLPED '03 Paper Acceptance Rate 90 of 221 submissions, 41%;
Overall Acceptance Rate 398 of 1,159 submissions, 34%
