DOI: 10.5555/998680.1006732
Article

A Complexity-Effective Approach to ALU Bandwidth Enhancement for Instruction-Level Temporal Redundancy

Published: 02 March 2004

Abstract

Previous proposals for implementing instruction-level temporal redundancy in out-of-order cores have reported a performance degradation of up to 45% in certain applications compared to an execution which does not have any temporal redundancy. An important contributor to this problem is the insufficient number of ALUs for handling the amplified load injected into the core. At the same time, increasing the number of ALUs can increase the complexity of the issue logic, which has been pointed out to be one of the most timing-critical components of the processor. This paper proposes a novel extension of a prior idea on instruction reuse to ease ALU bandwidth requirements in a complexity-effective way by exploiting certain interesting properties of a dual (temporally redundant) instruction stream. We present microarchitectural extensions necessary for implementing an instruction reuse buffer (IRB) and integrating this with the issue logic of a dual instruction stream superscalar core, and conduct extensive evaluations to demonstrate how well it can alleviate the ALU bandwidth problem. We show that on the average we can gain back nearly 50% of the IPC loss that occurred due to ALU bandwidth limitations for an instruction-level temporally redundant superscalar execution, and 23% of the overall IPC loss.
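
As a rough illustration of the general idea behind an instruction reuse buffer, the sketch below counts how many operations still need an ALU when results are remembered by program counter and operand values. It is not the paper's microarchitecture: the direct-mapped layout, the `ReuseBuffer` and `count_alu_ops` names, and the add-only trace are all assumptions made for this example, and it models neither the issue logic nor the redundancy checking.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Entry:
    pc: int                     # address of the instruction that produced the result
    operands: Tuple[int, int]   # operand values seen when the result was computed
    result: int


class ReuseBuffer:
    """Direct-mapped, value-based reuse buffer (hypothetical layout)."""

    def __init__(self, num_entries: int) -> None:
        self.entries: List[Optional[Entry]] = [None] * num_entries

    def _index(self, pc: int) -> int:
        return pc % len(self.entries)

    def lookup(self, pc: int, operands: Tuple[int, int]) -> Optional[int]:
        """Return the stored result only if both the PC and the operand values match."""
        entry = self.entries[self._index(pc)]
        if entry is not None and entry.pc == pc and entry.operands == operands:
            return entry.result
        return None

    def update(self, pc: int, operands: Tuple[int, int], result: int) -> None:
        """Overwrite whatever currently occupies this PC's slot."""
        self.entries[self._index(pc)] = Entry(pc, operands, result)


def count_alu_ops(trace: List[Tuple[int, int, int]],
                  buffer: Optional[ReuseBuffer] = None) -> int:
    """Count the operations in an add-only trace of (pc, a, b) that need an ALU."""
    alu_ops = 0
    for pc, a, b in trace:
        if buffer is not None and buffer.lookup(pc, (a, b)) is not None:
            continue            # result reused, no ALU slot consumed
        alu_ops += 1            # execute a + b on an ALU
        if buffer is not None:
            buffer.update(pc, (a, b), a + b)
    return alu_ops


# Hypothetical trace: two static instructions executed repeatedly.
trace = [(0x40, 1, 2), (0x44, 3, 4), (0x40, 1, 2), (0x44, 3, 5), (0x40, 1, 2)]
without_reuse = count_alu_ops(trace)
with_reuse = count_alu_ops(trace, ReuseBuffer(8))
```

In this toy trace the repeated executions at `0x40` hit in the buffer, while the execution at `0x44` with a changed operand does not. With a single-entry buffer the two instructions evict each other and nothing is reused.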


    Published In

    ISCA '04: Proceedings of the 31st annual international symposium on Computer architecture
    June 2004
    373 pages
    ISBN: 0769521436
    • ACM SIGARCH Computer Architecture News, Volume 32, Issue 2
      ISCA 2004
      March 2004
      373 pages
      ISSN: 0163-5964
      DOI: 10.1145/1028176

    Publisher

    IEEE Computer Society

    United States

    Author Tags

    1. Complexity-effective design
    2. Instruction Reuse
    3. Temporal Redundancy

    Qualifiers

    • Article

    Conference

    ISCA04
    Acceptance Rates

    ISCA '04 Paper Acceptance Rate 31 of 217 submissions, 14%;
    Overall Acceptance Rate 543 of 3,203 submissions, 17%

    Cited By

    • (2024) Survey on Redundancy Based-Fault tolerance methods for Processors and Hardware accelerators - Trends in Quantum Computing, Heterogeneous Systems and Reliability. ACM Computing Surveys 56:11 (1-76). DOI: 10.1145/3663672. Online publication date: 28-Jun-2024
    • (2010) Instruction precomputation with memoization for fault detection. Proceedings of the Conference on Design, Automation and Test in Europe (1665-1668). DOI: 10.5555/1870926.1871328. Online publication date: 8-Mar-2010
    • (2010) Minimal Multi-threading. Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture (337-348). DOI: 10.1109/MICRO.2010.41. Online publication date: 4-Dec-2010
    • (2009) Compiler-assisted soft error detection under performance and energy constraints in embedded systems. ACM Transactions on Embedded Computing Systems 8:4 (1-30). DOI: 10.1145/1550987.1550990. Online publication date: 24-Jul-2009
    • (2007) A hardware redundancy and recovery mechanism for reliable scientific computation on graphics processors. Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware (55-64). DOI: 10.5555/1280094.1280104. Online publication date: 4-Aug-2007
    • (2007) Transient fault prediction based on anomalies in processor events. Proceedings of the conference on Design, automation and test in Europe (1140-1145). DOI: 10.5555/1266366.1266613. Online publication date: 16-Apr-2007
    • (2007) Dynamic prediction of architectural vulnerability from microarchitectural state. ACM SIGARCH Computer Architecture News 35:2 (516-527). DOI: 10.1145/1273440.1250726. Online publication date: 9-Jun-2007
    • (2007) Mechanisms for bounding vulnerabilities of processor structures. ACM SIGARCH Computer Architecture News 35:2 (506-515). DOI: 10.1145/1273440.1250725. Online publication date: 9-Jun-2007
    • (2007) Dynamic prediction of architectural vulnerability from microarchitectural state. Proceedings of the 34th annual international symposium on Computer architecture (516-527). DOI: 10.1145/1250662.1250726. Online publication date: 9-Jun-2007
    • (2007) Mechanisms for bounding vulnerabilities of processor structures. Proceedings of the 34th annual international symposium on Computer architecture (506-515). DOI: 10.1145/1250662.1250725. Online publication date: 9-Jun-2007
