Software implemented hardware fault tolerance
Publisher:
  • Stanford University
  • 408 Panama Mall, Suite 217
  • Stanford
  • CA
  • United States
ISBN: 978-0-493-08801-3
Order Number: AAI3000076
Pages: 157
Abstract

Transient errors in computer systems can cause abnormal behavior and degrade system reliability, data integrity and availability. This is especially true in a space environment where transient errors are a major cause of concern. Fault avoidance techniques such as radiation hardening and shielding have been major approaches to obtaining the required reliability. Recently, unhardened Commercial Off-The-Shelf (COTS) components have been investigated for space applications because of their higher density, faster clock rate, lower power consumption and lower price.

Since COTS components are not radiation hardened, and it is desirable to avoid shielding, Software-Implemented Hardware Fault Tolerance (SIHFT) has been proposed to increase the data integrity and availability of COTS systems. This dissertation presents three new SIHFT techniques for error detection: Control Flow Checking by Software Signatures (CFCSS), Error Detection by Duplicated Instructions (EDDI), and Error Detection by Diverse Data and Duplicated Instructions (ED4I).

Previously studied software techniques are either inadequate or require assistance from special hardware, but CFCSS, EDDI and ED4I are pure software methods. In CFCSS, signatures are embedded into the program during compilation and compared with run-time signatures during execution. In EDDI, instructions are duplicated at compile time and scheduled by exploiting Instruction-Level Parallelism (ILP) to reduce performance overhead. CFCSS and EDDI detect transient errors but not permanent faults. In ED4I, by contrast, a program is compiled into a new program that operates on diverse data, so that it can also detect a permanent fault.
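
The signature comparison described for CFCSS can be illustrated with a small sketch. The Python below is an illustration only, not the dissertation's compiler pass: the block names, signature values, and the `run` helper are hypothetical, and it assumes every block has exactly one legal predecessor, so the handling of blocks with several predecessors is left out. Each block gets a fixed signature, the XOR difference between a block and its predecessor is computed ahead of time, and a run-time value is updated with that difference on entry to each block and compared with the block's expected signature.

```python
# Hypothetical signatures assigned to basic blocks "at compile time".
SIGNATURES = {"entry": 0b0001, "body": 0b0010, "exit": 0b0100}

# Assumption: each block has exactly one legal predecessor.
PREDECESSOR = {"body": "entry", "exit": "body"}

# Precomputed signature difference: d = s_predecessor XOR s_block.
DIFF = {blk: SIGNATURES[pred] ^ SIGNATURES[blk] for blk, pred in PREDECESSOR.items()}


class ControlFlowError(Exception):
    """Raised when the run-time signature does not match the expected one."""


def run(path):
    """Visit blocks in the given order, checking the run-time signature.

    `path` must start at "entry"; any later block may be an illegal
    jump target, which stands in for a branching fault.
    """
    g = SIGNATURES[path[0]]          # run-time signature register
    for block in path[1:]:
        g ^= DIFF[block]             # update on entry to the block
        if g != SIGNATURES[block]:   # compare with the embedded signature
            raise ControlFlowError(f"illegal transfer into {block!r}")
    return g
```

Following the legal path `["entry", "body", "exit"]` leaves the run-time value equal to the signature of `exit`, while the illegal jump `["entry", "exit"]` produces a mismatch and raises `ControlFlowError`.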

Our fault injection experiment simulating bit flips in memory shows that EDDI provides over 98% fault coverage without any extra hardware. Because of instruction duplication, code size overhead is approximately 100%, but by exploiting ILP, we reduce the performance overhead to 61% on average. In the control flow checking experiment simulating branching faults, CFCSS provides 97% fault coverage. In addition, when we duplicate programs or instructions, we can use ED4I to enhance data integrity in the system.
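
To see why diverse data can expose a permanent fault that plain duplication misses, consider the following minimal sketch. It is an assumption-laden illustration, not the dissertation's transformation: the stuck-at-1 adder model, the `make_adder` and `checked_sum` helpers, and the factor `k = -2` are all hypothetical choices made for the example. The original computation is repeated on data multiplied by `k`, and the two results are compared after scaling.

```python
class FaultDetected(Exception):
    """Raised when the original and diverse results disagree."""


def make_adder(stuck_bit=None):
    """Return an adder; if stuck_bit is set, that result bit is stuck at 1.

    This is a toy model of a permanent hardware fault.
    """
    def add(a, b):
        total = a + b
        if stuck_bit is not None:
            total |= 1 << stuck_bit
        return total
    return add


def checked_sum(values, add, k=-2):
    """Sum `values` twice: once as-is and once with every value scaled by k.

    In a fault-free run the diverse result equals k times the original.
    With k = 1 this degenerates to plain duplication on identical data.
    """
    original = 0
    diverse = 0
    for v in values:
        original = add(original, v)
        diverse = add(diverse, k * v)
    if diverse != k * original:
        raise FaultDetected("original and diverse results disagree")
    return original
```

With a fault-free adder, `checked_sum([1, 2, 3], make_adder())` returns 6. With bit 2 stuck at 1, plain duplication (`k=1`) corrupts both copies identically and the comparison passes, whereas the diverse run with `k=-2` disagrees with the scaled original and raises `FaultDetected`.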

Furthermore, for space experiments, we implemented EDDI and CFCSS in sort and FFT programs running on the ARGOS satellite. During a 136-day period, our techniques detected 198 out of 203 errors, an error detection coverage of 98%.

Contributors
  • Stanford Engineering
  • Stanford University