
Exploiting Structural Duplication for Lifetime Reliability Enhancement

Published: 01 May 2005

Abstract

Increased power densities (and resultant temperatures) and other effects of device scaling are predicted to cause significant lifetime reliability problems in the near future. In this paper, we study two techniques that leverage microarchitectural structural redundancy for lifetime reliability enhancement. First, in structural duplication (SD), redundant microarchitectural structures are added to the processor and designated as spares. Spare structures can be turned on when the original structure fails, increasing the processor's lifetime. Second, graceful performance degradation (GPD) is a technique which exploits existing microarchitectural redundancy for reliability. Redundant structures that fail are shut down while still maintaining functionality, thereby increasing the processor's lifetime, but at a lower performance. Our analysis shows that exploiting structural redundancy can provide significant reliability benefits, and we present guidelines for efficient usage of these techniques by identifying situations where each is more beneficial. We show that GPD is the superior technique when only limited performance or cost resources can be sacrificed for reliability. Specifically, on average for our systems and applications, GPD increased processor reliability to 1.42 times the base value for less than a 5% loss in performance. On the other hand, for systems where reliability is more important than performance or cost, SD is more beneficial. SD increases reliability to 3.17 times the base value for 2.25 times the base cost, for our applications. Finally, a combination of the two techniques (SD+GPD) provides the highest reliability benefit.
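
The difference between the two techniques can be illustrated with a toy lifetime model. The sketch below is not the paper's methodology: the function names, the grouping of structures, the failure times, and the assumption that a powered-off spare does not age are all hypothetical and chosen only to show how SD and GPD change the point at which the processor stops working.

```python
from typing import Sequence

# Hypothetical data model: each inner sequence is one group of identical,
# interchangeable structures; values are failure times in years.
Groups = Sequence[Sequence[float]]


def base_lifetime(groups: Groups) -> float:
    """Base processor: the first failure of any structure kills the chip."""
    return min(min(group) for group in groups)


def gpd_lifetime(groups: Groups) -> float:
    """GPD: a group keeps working (more slowly) until its last unit fails."""
    return min(max(group) for group in groups)


def sd_lifetime(groups: Groups, spares: Groups) -> float:
    """SD: each structure has one cold spare that takes over when it fails.

    Assumes a spare does not age while powered off, so the pair lasts
    original + spare; the chip still dies at the first unrecovered failure.
    """
    return min(
        unit + spare
        for group, spare_group in zip(groups, spares)
        for unit, spare in zip(group, spare_group)
    )


# Made-up failure times: one group with two interchangeable units,
# and one group with a single, non-redundant unit.
groups = [[8.0, 12.0], [10.0]]
spares = [[5.0, 5.0], [6.0]]

print(base_lifetime(groups))        # 8.0
print(gpd_lifetime(groups))         # 10.0
print(sd_lifetime(groups, spares))  # 13.0
```

In this toy case GPD helps only the group that already has redundancy, so the non-redundant unit limits the lifetime, while SD extends every structure at the price of extra hardware. The model ignores the performance loss of GPD and the cost of SD, which are the trade-offs the abstract quantifies.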



Published In

ACM SIGARCH Computer Architecture News, Volume 33, Issue 2
ISCA 2005
May 2005
531 pages
ISSN: 0163-5964
DOI: 10.1145/1080695
  • ACM Conferences
    ISCA '05: Proceedings of the 32nd annual international symposium on Computer Architecture
    June 2005
    541 pages
    ISBN: 076952270X

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 May 2005
Published in SIGARCH Volume 33, Issue 2


Cited By

  • (2023) Electromigration-Aware Memory Hierarchy Architecture. Journal of Low Power Electronics and Applications 13:3 (44). DOI: 10.3390/jlpea13030044. Online publication date: 11-Jul-2023
  • (2023) FLEA - FIT-Aware Heuristic for Application Allocation in Many-Cores based on Q-Learning. 2023 XIII Brazilian Symposium on Computing Systems Engineering (SBESC) (1-6). DOI: 10.1109/SBESC60926.2023.10324296. Online publication date: 21-Nov-2023
  • (2023) Fault-Tolerant Circuits. Built-in Fault-Tolerant Computing Paradigm for Resilient Large-Scale Chip Design (33-116). DOI: 10.1007/978-981-19-8551-5_2. Online publication date: 2-Mar-2023
  • (2023) Introduction. Built-in Fault-Tolerant Computing Paradigm for Resilient Large-Scale Chip Design (1-31). DOI: 10.1007/978-981-19-8551-5_1. Online publication date: 2-Mar-2023
  • (2020) Reliability Optimization of Real-Time Satellite Embedded System Under Temperature Variations. IEEE Access 8 (224549-224564). DOI: 10.1109/ACCESS.2020.3044044. Online publication date: 2020
  • (2020) Nested genetic algorithm for highly reliable and efficient embedded system design. Design Automation for Embedded Systems. DOI: 10.1007/s10617-020-09234-6. Online publication date: 6-Mar-2020
  • (2020) Resource Management for Improving Overall Reliability of Multi-Processor Systems-on-Chip. Dependable Embedded Systems (233-246). DOI: 10.1007/978-3-030-52017-5_10. Online publication date: 10-Dec-2020
  • (2019) On the Efficiency of Voltage Overscaling under Temperature and Aging Effects. IEEE Transactions on Computers 68:11 (1647-1662). DOI: 10.1109/TC.2019.2916869. Online publication date: 1-Nov-2019
  • (2019) Introduction to Wearout. Circadian Rhythms for Future Resilient Electronic Systems (3-14). DOI: 10.1007/978-3-030-20051-0_1. Online publication date: 13-Jun-2019
  • (2018) Lifetime improvement by exploiting aggressive voltage scaling during runtime of error-resilient applications. Integration, the VLSI Journal 61:C (29-38). DOI: 10.1016/j.vlsi.2017.10.013. Online publication date: 1-Mar-2018
