ROC-1: Hardware Support for Recovery-Oriented Computing
- David Oppenheimer
- Aaron Brown
- James Beck
- Daniel Hettena
- Jon Kuroda
- Noah Treuhaft
- David A. Patterson
- Katherine Yelick
We introduce the ROC-1 hardware platform, a large-scale cluster system designed to provide high availability for Internet service applications. The ROC-1 prototype embodies our philosophy of Recovery-Oriented Computing (ROC) by emphasizing detection and ...
Anomaly Detection in Embedded Systems
By employing fault tolerance, embedded systems can withstand both intentional and unintentional faults. Many fault-tolerance mechanisms are invoked only after a fault has been detected by whatever fault-detection mechanism is used; hence, the process of ...
Low-Cost Error Containment and Recovery for Onboard Guarded Software Upgrading and Beyond
Message-driven confidence-driven (MDCD) error containment and recovery, a low-cost approach to mitigating the effect of software design faults in distributed embedded systems, is developed for onboard guarded software upgrading for deep-space missions. ...
Dependability of COTS Microkernel-Based Systems
The commercial offering of microkernel technology constitutes an attractive alternative for developing operating systems suited to a wide range of application domains. However, the integration of COTS microkernels into critical embedded computer ...
Rigorous Development of an Embedded Fault-Tolerant System Based on Coordinated Atomic Actions
This paper describes our experience using coordinated atomic (CA) actions as a system structuring tool to design and validate a sophisticated embedded control system for a complex industrial application that has high reliability and safety ...
ED4I: Error Detection by Diverse Data and Duplicated Instructions
Errors in computing systems can cause abnormal behavior and degrade data integrity and system availability. Errors should be avoided, especially in embedded systems for critical applications. However, as the trend in VLSI technologies has been toward ...
Test Generation and Testability Alternatives Exploration of Critical Algorithms for Embedded Applications
This paper presents an analysis of the behavioral descriptions of embedded systems to generate behavioral test patterns that are used to explore design alternatives based on testability. In this way, during the hardware/software partitioning ...
Closed Partition Lattice and Machine Decomposition
Finite state machines are widely used to model systems in diverse areas. Often, the modeling machines can be decomposed into smaller component machines and this decomposition can facilitate the system design, implementation, and analysis. Hartmanis and ...
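To make the notion of a closed partition concrete, here is a minimal sketch. It is not the paper's method. A partition of a machine's states is closed (has the substitution property) if any two states in the same block move into a common block under every input. Such a partition defines a smaller component machine whose states are the blocks. The function names `is_closed` and `quotient` and the mod-4 counter `COUNTER` are hypothetical illustrations.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

# Hypothetical encoding of a transition function: (state, input) -> next state.
Delta = Dict[Tuple[Hashable, Hashable], Hashable]


def block_index(partition: List[Set[Hashable]]) -> Dict[Hashable, int]:
    """Map every state to the index of the block that contains it."""
    return {s: i for i, block in enumerate(partition) for s in block}


def is_closed(partition: List[Set[Hashable]], delta: Delta,
              inputs: Iterable[Hashable]) -> bool:
    """True if states sharing a block always move to a shared block."""
    index = block_index(partition)
    inputs = list(inputs)
    for block in partition:
        for a in inputs:
            # Collect the blocks reached from this block under input a.
            if len({index[delta[(s, a)]] for s in block}) > 1:
                return False
    return True


def quotient(partition: List[Set[Hashable]], delta: Delta,
             inputs: Iterable[Hashable]) -> Dict[Tuple[int, Hashable], int]:
    """Transition table of the component machine whose states are blocks."""
    inputs = list(inputs)
    if not is_closed(partition, delta, inputs):
        raise ValueError("partition is not closed; quotient is ill-defined")
    index = block_index(partition)
    table = {}
    for i, block in enumerate(partition):
        rep = next(iter(block))  # any member works because the partition is closed
        for a in inputs:
            table[(i, a)] = index[delta[(rep, a)]]
    return table


# Hypothetical example: a mod-4 counter with inputs "inc" and "hold".
COUNTER: Delta = {(s, "inc"): (s + 1) % 4 for s in range(4)}
COUNTER.update({(s, "hold"): s for s in range(4)})
```

In this example the parity partition `[{0, 2}, {1, 3}]` is closed, and its quotient is a mod-2 counter. The partition `[{0, 1}, {2, 3}]` is not closed, because under "inc" state 0 moves to block 0 while state 1 moves to block 1.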
Design Method of a Class of Embedded Combinational Self-Testing Checkers for Two-Rail Codes
This paper tackles the open problem of designing combinational self-testing checkers (STCs) for K-pair 2-rail codes which are self-testing, even by a subset of codewords, such that some input lines are 0 (or 1) for only one input codeword. The checker ...
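For readers unfamiliar with two-rail codes, the sketch below models the classic two-rail checker cell and a tree of such cells that reduces K pairs to a single output pair. This is the standard textbook construction, not the design method proposed in the paper. In particular, it does not model internal faults or the restricted-codeword self-testing condition the paper addresses. It only shows fault-free behavior: if every input pair is complementary, the output pair is complementary, and a single 00 or 11 input pair forces a 00 or 11 output. The function names are hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[int, int]  # one two-rail pair; valid iff its bits differ


def two_rail_cell(a: Pair, b: Pair) -> Pair:
    """Classic two-rail checker cell: f = a0*b0 + a1*b1, g = a0*b1 + a1*b0."""
    a0, a1 = a
    b0, b1 = b
    f = (a0 & b0) | (a1 & b1)
    g = (a0 & b1) | (a1 & b0)
    return (f, g)


def two_rail_checker(pairs: List[Pair]) -> Pair:
    """Reduce K >= 1 pairs to one output pair using a tree of cells."""
    if not pairs:
        raise ValueError("need at least one pair")
    level = list(pairs)
    while len(level) > 1:
        nxt = [two_rail_cell(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an odd pair passes through to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]


def is_codeword(p: Pair) -> bool:
    """A two-rail pair is a codeword when its two bits are complementary."""
    return p[0] != p[1]
```

For example, `two_rail_checker([(0, 1), (1, 0), (0, 0)])` yields `(0, 0)`, flagging the non-codeword third pair.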
Identifying Efficient Combinations of Error Detection Mechanisms Based on Results of Fault Injection Experiments
We introduce novel performance ratings for error detection mechanisms. Given a proper setup of the fault injection experiments, these ratings can be directly computed from raw readout data. They allow the evaluation of the overall performance of ...
Guest Editors' Introduction