Article

A Complexity-Effective Approach to ALU Bandwidth Enhancement for Instruction-Level Temporal Redundancy

Authors: Angshuman Parashar, Sudhanva Gurumurthi, Anand Sivasubramaniam

ISCA '04: Proceedings of the 31st annual international symposium on Computer architecture

Page 376

Published: 02 March 2004

Abstract

Previous proposals for implementing instruction-level temporal redundancy in out-of-order cores have reported a performance degradation of up to 45% in certain applications compared to an execution which does not have any temporal redundancy. An important contributor to this problem is the insufficient number of ALUs for handling the amplified load injected into the core. At the same time, increasing the number of ALUs can increase the complexity of the issue logic, which has been pointed out to be one of the most timing-critical components of the processor. This paper proposes a novel extension of a prior idea on instruction reuse to ease ALU bandwidth requirements in a complexity-effective way by exploiting certain interesting properties of a dual (temporally redundant) instruction stream. We present microarchitectural extensions necessary for implementing an instruction reuse buffer (IRB) and integrating this with the issue logic of a dual instruction stream superscalar core, and conduct extensive evaluations to demonstrate how well it can alleviate the ALU bandwidth problem. We show that on the average we can gain back nearly 50% of the IPC loss that occurred due to ALU bandwidth limitations for an instruction-level temporally redundant superscalar execution, and 23% of the overall IPC loss.
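
The general idea of instruction reuse in a dual instruction stream can be illustrated with a minimal sketch. The abstract does not describe the IRB organization or the issue-logic integration, so everything below is an assumption made for illustration: a direct-mapped buffer indexed by PC, add-only instructions, and hypothetical names (`InstructionReuseBuffer`, `run_dual_stream`). It is not the design proposed in the paper. The sketch only shows how a reuse hit lets the redundant copy of an instruction skip an ALU slot.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ReuseEntry:
    pc: int                    # tag: address of the instruction that filled the entry
    operands: Tuple[int, int]  # operand values seen when the result was produced
    result: int                # result recorded for those operands


class InstructionReuseBuffer:
    """Hypothetical direct-mapped reuse buffer (organization is an assumption)."""

    def __init__(self, num_entries: int = 64) -> None:
        self.num_entries = num_entries
        self.entries: Dict[int, ReuseEntry] = {}

    def lookup(self, pc: int, operands: Tuple[int, int]) -> Optional[int]:
        # A hit requires both the PC tag and the operand values to match.
        entry = self.entries.get(pc % self.num_entries)
        if entry is not None and entry.pc == pc and entry.operands == operands:
            return entry.result
        return None

    def update(self, pc: int, operands: Tuple[int, int], result: int) -> None:
        self.entries[pc % self.num_entries] = ReuseEntry(pc, operands, result)


def run_dual_stream(trace: List[Tuple[int, int, int]],
                    irb: InstructionReuseBuffer) -> Tuple[int, int]:
    """Run a trace of (pc, a, b) add instructions twice (primary + redundant copy).

    Returns (alu_ops, reuse_hits). Which copy consults or fills the buffer is an
    assumption here, not something the abstract specifies.
    """
    alu_ops = 0
    reuse_hits = 0
    for pc, a, b in trace:
        primary = a + b            # primary copy always occupies an ALU
        alu_ops += 1
        reused = irb.lookup(pc, (a, b))
        if reused is not None:
            duplicate = reused     # redundant copy served by the buffer
            reuse_hits += 1
        else:
            duplicate = a + b      # redundant copy also needs an ALU
            alu_ops += 1
        if primary != duplicate:   # redundancy check between the two copies
            raise RuntimeError(f"mismatch detected at pc={pc:#x}")
        irb.update(pc, (a, b), primary)
    return alu_ops, reuse_hits


if __name__ == "__main__":
    trace = [(0x40, 1, 2), (0x44, 3, 4), (0x40, 1, 2), (0x44, 3, 5), (0x40, 1, 2)]
    print(run_dual_stream(trace, InstructionReuseBuffer()))
```

In this toy trace, full duplication would need two ALU operations per instruction; each reuse hit removes one of them, which is the sense in which a reuse buffer can ease ALU bandwidth pressure.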


Information & Contributors

Information

Published In

ISCA '04: Proceedings of the 31st annual international symposium on Computer architecture

June 2004

373 pages

ISBN:0769521436

ACM SIGARCH Computer Architecture News Volume 32, Issue 2
ISCA 2004
March 2004
373 pages
ISSN:0163-5964
DOI:10.1145/1028176

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

IEEE Computer Society

United States

Qualifiers

Article

Conference

ISCA04: The 31st Annual International Symposium on Computer Architecture 2004

Sponsor: SIGARCH

June 19 - 23, 2004

München, Germany

Acceptance Rates

ISCA '04 Paper Acceptance Rate 31 of 217 submissions, 14%;

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 29
Total Downloads: 492
Downloads (Last 12 months): 4
Downloads (Last 6 weeks): 0

Reflects downloads up to 31 Dec 2024

Citations

Cited By

Venkatesha S, Parthasarathi R (2024). Survey on Redundancy Based-Fault tolerance methods for Processors and Hardware accelerators - Trends in Quantum Computing, Heterogeneous Systems and Reliability. ACM Computing Surveys 56:11 (1-76). Online publication date: 28-Jun-2024.
https://dl.acm.org/doi/10.1145/3663672
Borodin D, Juurlink B, De Micheli G, Al-Hashimi B, Mueller W, Macii E (2010). Instruction precomputation with memoization for fault detection. Proceedings of the Conference on Design, Automation and Test in Europe (1665-1668). Online publication date: 8-Mar-2010.
https://dl.acm.org/doi/10.5555/1870926.1871328
Long G, Franklin D, Biswas S, Ortiz P, Oberg J, Fan D, Chong F (2010). Minimal Multi-threading. Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture (337-348). Online publication date: 4-Dec-2010.
https://dl.acm.org/doi/10.1109/MICRO.2010.41
Hu J, Li F, Degalahal V, Kandemir M, Vijaykrishnan N, Irwin M (2009). Compiler-assisted soft error detection under performance and energy constraints in embedded systems. ACM Transactions on Embedded Computing Systems 8:4 (1-30). Online publication date: 24-Jul-2009.
https://dl.acm.org/doi/10.1145/1550987.1550990
Sheaffer J, Luebke D, Skadron K (2007). A hardware redundancy and recovery mechanism for reliable scientific computation on graphics processors. Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware (55-64). Online publication date: 4-Aug-2007.
https://dl.acm.org/doi/10.5555/1280094.1280104
Narayanasamy S, Coskun A, Calder B, Lauwereins R, Madsen J (2007). Transient fault prediction based on anomalies in processor events. Proceedings of the conference on Design, automation and test in Europe (1140-1145). Online publication date: 16-Apr-2007.
https://dl.acm.org/doi/10.5555/1266366.1266613
Walcott K, Humphreys G, Gurumurthi S (2007). Dynamic prediction of architectural vulnerability from microarchitectural state. ACM SIGARCH Computer Architecture News 35:2 (516-527). Online publication date: 9-Jun-2007.
https://dl.acm.org/doi/10.1145/1273440.1250726
Soundararajan N, Parashar A, Sivasubramaniam A (2007). Mechanisms for bounding vulnerabilities of processor structures. ACM SIGARCH Computer Architecture News 35:2 (506-515). Online publication date: 9-Jun-2007.
https://dl.acm.org/doi/10.1145/1273440.1250725
Walcott K, Humphreys G, Gurumurthi S, Tullsen D, Calder B (2007). Dynamic prediction of architectural vulnerability from microarchitectural state. Proceedings of the 34th annual international symposium on Computer architecture (516-527). Online publication date: 9-Jun-2007.
https://dl.acm.org/doi/10.1145/1250662.1250726
Soundararajan N, Parashar A, Sivasubramaniam A, Tullsen D, Calder B (2007). Mechanisms for bounding vulnerabilities of processor structures. Proceedings of the 34th annual international symposium on Computer architecture (506-515). Online publication date: 9-Jun-2007.
https://dl.acm.org/doi/10.1145/1250662.1250725
