DOI: 10.1109/PRDC.2013.14
Article

Towards Formal Approaches to System Resilience

Published: 02 December 2013

Abstract

Technology scaling and techniques such as dynamic voltage/frequency scaling are predicted to increase the number of transient faults in future processors. Error detectors implemented in hardware are often energy inefficient, as they are "always on." While software-level error detection can augment hardware-level detectors, creating highly effective detectors in software remains a challenge. In this paper, we first present a new LLVM-level fault injector called KULFI that helps simulate faults occurring within CPU state elements in a versatile manner. Second, using KULFI, we study the behavior of a family of well-known and simple algorithms under error injection. (We choose a family of sorting algorithms for this study.) We then propose a promising way to interpret our empirical results using a formal model that builds on the idea of predicate state transition diagrams. After introducing the basic abstraction underlying our predicate transition diagrams, we draw connections to the level of resilience empirically observed during fault injection studies. Building on the observed connections, we develop a simple yet effective predicate-abstraction-based fault detector. While in its initial stages, ours is believed to be the first study that offers a formal way to interpret and compare fault injection results obtained from algorithms within one family. Given how unpredictable the effect of a fault on a computation can be in general, our approach may help designers choose, from a class of algorithms, the one that behaves most resiliently.
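The idea of injecting a transient fault into a sorting algorithm and checking the result against simple predicates can be illustrated with a minimal Python sketch. This is not KULFI and not the detector developed in the paper: KULFI operates at the LLVM level, whereas this sketch flips one bit of an array element in pure Python. The names `flip_bit`, `bubble_sort_with_fault`, `fault_step`, `fault_bit`, and `detect_fault` are hypothetical, and the two predicates used (output is ordered, output is a permutation of the input) are an assumed stand-in for the paper's predicate abstraction.

```python
from collections import Counter


def flip_bit(value: int, bit: int) -> int:
    """Return value with one bit inverted (single-bit transient fault model)."""
    return value ^ (1 << bit)


def bubble_sort_with_fault(data, fault_step=None, fault_bit=0):
    """Bubble sort a copy of data.

    If fault_step is given (hypothetical knob), one bit of the left operand
    is flipped just before comparison number fault_step (0-based).
    """
    a = list(data)
    n = len(a)
    step = 0
    for i in range(n - 1):
        for j in range(n - 1 - i):
            if step == fault_step:
                a[j] = flip_bit(a[j], fault_bit)  # inject the fault
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
            step += 1
    return a


def is_sorted(a):
    """Predicate: every element is <= its successor."""
    return all(x <= y for x, y in zip(a, a[1:]))


def detect_fault(original, result):
    """Flag a run whose output violates either predicate:
    ordered output, or output being a permutation of the input."""
    return not (is_sorted(result) and Counter(original) == Counter(result))


if __name__ == "__main__":
    data = [5, 3, 8, 1]
    clean = bubble_sort_with_fault(data)
    early = bubble_sort_with_fault(data, fault_step=0, fault_bit=1)
    late = bubble_sort_with_fault(data, fault_step=5, fault_bit=3)
    for label, out in (("clean", clean), ("early", early), ("late", late)):
        print(label, out, "sorted:", is_sorted(out),
              "flagged:", detect_fault(data, out))
```

In this toy setting, a fault injected early is absorbed by the remaining passes, so the output is still ordered and only the permutation predicate exposes it, while a fault injected at the last comparison leaves the output unordered. The sketch only shows why the point of injection and the choice of predicates both matter; it makes no claim about the resilience results reported in the paper.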


Published In

PRDC '13: Proceedings of the 2013 IEEE 19th Pacific Rim International Symposium on Dependable Computing
December 2013
349 pages
ISBN: 9780769551302

Publisher

IEEE Computer Society

United States

Author Tags

  1. Fault Tolerance
  2. Formal Approach
  3. System Resilience

Cited By

  • (2022) CAFI. Microprocessors & Microsystems 94:C. DOI: 10.1016/j.micpro.2022.104648. Online publication date: 1-Oct-2022
  • (2018) Evaluating and accelerating high-fidelity error injection for HPC. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (1-13). DOI: 10.5555/3291656.3291716. Online publication date: 11-Nov-2018
  • (2018) FlipTracker. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (1-14). DOI: 10.5555/3291656.3291667. Online publication date: 11-Nov-2018
  • (2018) Modeling Application Resilience in Large-scale Parallel Execution. Proceedings of the 47th International Conference on Parallel Processing (1-10). DOI: 10.1145/3225058.3225119. Online publication date: 13-Aug-2018
  • (2018) Characterizing the Impact of Soft Errors Affecting Floating-point ALUs using RTL-level Fault Injection. Proceedings of the 47th International Conference on Parallel Processing (1-10). DOI: 10.1145/3225058.3225089. Online publication date: 13-Aug-2018
  • (2018) Comparative analysis of soft-error detection strategies. Proceedings of the 15th ACM International Conference on Computing Frontiers (173-182). DOI: 10.1145/3203217.3203240. Online publication date: 8-May-2018
  • (2018) Evaluating and accelerating high-fidelity error injection for HPC. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (1-13). DOI: 10.1109/SC.2018.00048. Online publication date: 11-Nov-2018
  • (2018) FlipTracker. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (1-14). DOI: 10.1109/SC.2018.00011. Online publication date: 11-Nov-2018
  • (2018) Understanding scale-dependent soft-error behavior of scientific applications. Proceedings of the 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (482-491). DOI: 10.1109/CCGRID.2018.00075. Online publication date: 1-May-2018
  • (2018) Verifying Relative Safety, Accuracy, and Termination for Program Approximations. Journal of Automated Reasoning 60:1 (23-42). DOI: 10.1007/s10817-017-9421-9. Online publication date: 1-Jan-2018
