[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Safety and Reliability Driven Task Allocation in Distributed Systems

Published: 01 March 1999 Publication History

Abstract

Distributed computer systems are increasingly being employed for critical applications, such as aircraft control, industrial process control, and banking systems. Maximizing performance has been the conventional objective in the allocation of tasks for such systems. Inherently, distributed systems are more complex than centralized systems. The added complexity could increase the potential for system failures. Some work has been done in the past in allocating tasks to distributed systems, considering reliability as the objective function to be maximized. Reliability is defined to be the probability that none of the system components fails while processing. This, however, does not give any guarantees as to the behavior of the system when a failure occurs. A failure, not detected immediately, could lead to a catastrophe. Such systems are unsafe. In this paper, we describe a method to determine an allocation that introduces safety into a heterogeneous distributed system and at the same time attempts to maximize its reliability. First, we devise a new heuristic, based on the concept of clustering, to allocate tasks for maximizing reliability. We show that for task graphs with precedence constraints, our heuristic performs better than previously proposed heuristics. Next, by applying the concept of task-based fault tolerance, which we have previously proposed, we add extra assertion tasks to the system to make it safe. We present a new heuristic that does this in such a way that the decrease in reliability for the added safety is minimized. For the purpose of allocating the extra tasks, this heuristic performs as well as previously known methods and runs an order of magnitude faster. We present a number of simulation results to prove the efficacy of our scheme.

References

[1]
S.J. Kim and J.C. Browne, “A General Approach to Mapping of Parallel Computations upon Multiprocessor Architectures,” Proc. Int'l Conf. Parallel Processing, vol. 3, pp. 1-8, University Park, Penn., Aug. 1988.
[2]
G.C. Sih and E.A. Lee, “Declustering: A New Multiprocessor Scheduling Technique,” IEEE Trans. Parallel and Distributed Systems, vol. 4, no. 6, pp. 625-637, June 1993.
[3]
G.C. Sih and E.A. Lee, “A Compile-Time Scheduling Heuristic for Interconnection-Constrained Heterogeneous Processor Architectures,” IEEE Trans. Parallel and Distributed Systems, vol. 4, no. 2, pp. 175-186, Feb. 1993.
[4]
I. Page T. Jacob and E. Chern, “Fast Algorithms for Distributed Resource Allocation,” IEEE Trans. Parallel and Distributed Systems, vol. 4, no. 2, pp. 188-197, Feb. 1993.
[5]
C.M. Woodside and G.G. Monforton, “Fast Allocation of Processes in Distributed and Parallel Systems,” IEEE Trans. Parallel and Distributed Systems, vol. 4, no. 2, pp. 164-174, Feb. 1993.
[6]
Y.C. Chow and W.H. Kohler, “Models for Dynamic Load Balancing in a Heterogeneous Multiple Processor System,” IEEE Trans. Computers, vol. 28, no. 5, pp. 354-361, May 1979.
[7]
G. Cybenko, “Dynamic Load Balancing for Distributed Memory Multiprocessors,” J. Parallel and Distributed Computing, vol. 7, pp. 279-301, 1989.
[8]
H.S. Stone, “Multiprocessor Scheduling with the Aid of Network Flow Algorithms,” IEEE Trans. Software Eng., vol. 3, no. 1, pp. 85-93, Jan. 1977.
[9]
P.Y.R. Ma E.Y.S. Lee and M. Tsuchiya, “A Task Allocation Model for Distributed Computing Systems,” IEEE Trans. Computers, vol. 31, no. 1, pp. 41-47, Jan. 1982.
[10]
W.W. Chu L.J. Holloway M.T. Lan and K. Efe, “Task Allocation in Distributed Data Processing,” Computer, pp. 57-69, Nov. 1980.
[11]
K. Efe, “Heuristic Models of Task Assignment Scheduling in Distributed Systems,” Computer, pp. 50-56, June 1982.
[12]
V.M. Lo, “Heuristic Algorithms for Task Assignments in Distributed Systems,” IEEE Trans. Computers, vol. 37, no. 11, pp. 1,384-1,397, Nov. 1988.
[13]
N.S. Bowen C.N. Nikolaou and A. Ghafoor, “On the Assignment Problem of Arbitrary Process Systems to Heterogeneous Distributed Computer Systems,” IEEE Trans. Computers, vol. 41, no. 3, Mar. 1992.
[14]
S. Yajnik S. Srinivasan and N.K. Jha, “TBFT: A Task-Based Fault Tolerance Scheme for Distributed Systems,” Proc. Int'l Conf. Parallel and Distributed Computer Systems, Las Vegas, Nev., Oct. 1994.
[15]
W.H. Kohler, “A Preliminary Evaluation of the Critical Path Method for Scheduling Tasks on Multiple Processor Systems,” IEEE Trans. Computers, vol. 24, no. 12, pp. 1,235-1,238, Dec. 1975.
[16]
S. Tridandapani and A.K. Somani, “Efficient Utilization of Spare Capacity for Fault Detection and Location in Multiprocessor Systems,” Proc. Int'l Symp. Fault-Tolerant Computing, pp. 440-447, Montreal, July 1992.
[17]
N.K. Jha and S. Kundu, Testing and Reliable Design of CMOS Circuits. Norwell, Mass.: Kluwer Academic, 1990.
[18]
B.W. Johnson, Design and Analysis of Fault Tolerant Digital Systems. Reading, Mass.: Addison-Wesley, 1989.
[19]
S.M. Shatz J.P. Wang and M. Goto, “Task Allocation for Maximizing Reliability of Distributed Computer Systems,” IEEE Trans. Computers, vol. 41, no. 9, pp. 1,156-1,168, Sept. 1992.
[20]
C.J. Hou and K.G. Shin, “Allocation of Periodic Task Modules with Precedence and Deadline Constraints in Distributed Real-Time Systems,” Proc. Real-Time Systems Symp., pp. 146-155, Dec. 1992.
[21]
N.H. Vaidya and D.K. Pradhan, “Fault-Tolerant Design Strategies for Reliability and Safety,” IEEE Trans. Computers, vol. 42, no. 10, pp. 1,195-1,206, Oct. 1993.
[22]
M. Mullazani, “Reliability and Safety,” Proc. Fourth IFAC Workshop Safety Comput. Contr. Systems, pp. 141-146, 1985.
[23]
B.W. Johnson and J.H. Aylor, “Reliability and Safety Analysis of a Fault-Tolerant Controller,” IEEE Trans. Reliability, vol. 35, no. 10, pp. 355-362, Oct. 1986.
[24]
K.H. Huang and J.A. Abraham, “Algorithm-Based Fault Tolerance for Matrix Operations,” IEEE Trans. Computers, vol. 33, no. 6, pp. 518-528, June 1984.
[25]
A.L.N. Reddy and P. Banerjee, “Algorithm-Based Fault Detection for Signal Processing Applications,” IEEE Trans. Computers, vol. 39, no. 10, pp. 1,304-1,308, Oct. 1990.
[26]
Y.H. Choi and M. Malek, “A Fault-Tolerant Systolic Sorter,” IEEE Trans. Computers, vol. 37, no. 5, pp. 621-624, May 1988.
[27]
S. Srinivasan and N.K. Jha, “Efficient Diagnosis in Algorithm-Based Fault Tolerant Multiprocessor Systems,” Proc. Int'l Conf. Computer Design, pp. 592-595, Boston, Oct. 1993.

Cited By

View all
  • (2018)Priority inversion in DRTDBSProceedings of the ACM India Joint International Conference on Data Science and Management of Data10.1145/3152494.3167976(305-309)Online publication date: 11-Jan-2018
  • (2018)Reprint of A robust reliable energy-aware urgent computing resource allocation for flash-flood ensemble forecasting on HPC infrastructures for decision supportFuture Generation Computer Systems10.1016/j.future.2016.11.01579:P1(114-127)Online publication date: 1-Feb-2018
  • (2017)Modified multi-objective firefly algorithm for task scheduling problem on heterogeneous systemsInternational Journal of Bio-Inspired Computation10.1504/IJBIC.2016.0813258:6(379-393)Online publication date: 1-Jan-2017
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems  Volume 10, Issue 3
March 1999
143 pages
ISSN:1045-9219
Issue’s Table of Contents

Publisher

IEEE Press

Publication History

Published: 01 March 1999

Author Tags

  1. Allocation
  2. clustering
  3. distributed systems
  4. reliability
  5. safety.

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 09 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2018)Priority inversion in DRTDBSProceedings of the ACM India Joint International Conference on Data Science and Management of Data10.1145/3152494.3167976(305-309)Online publication date: 11-Jan-2018
  • (2018)Reprint of A robust reliable energy-aware urgent computing resource allocation for flash-flood ensemble forecasting on HPC infrastructures for decision supportFuture Generation Computer Systems10.1016/j.future.2016.11.01579:P1(114-127)Online publication date: 1-Feb-2018
  • (2017)Modified multi-objective firefly algorithm for task scheduling problem on heterogeneous systemsInternational Journal of Bio-Inspired Computation10.1504/IJBIC.2016.0813258:6(379-393)Online publication date: 1-Jan-2017
  • (2017)A Survey on Modeling and Optimizing Multi-Objective SystemsIEEE Communications Surveys & Tutorials10.1109/COMST.2017.269836619:3(1867-1901)Online publication date: 1-Jul-2017
  • (2017)Energy Efficiency Optimization for Communication of Air-Based Information Network with Guaranteed Timing ConstraintsJournal of Signal Processing Systems10.1007/s11265-016-1125-686:2-3(299-312)Online publication date: 1-Mar-2017
  • (2017)A Reliability-aware Task Scheduling Algorithm Based on Replication on Heterogeneous Computing SystemsJournal of Grid Computing10.1007/s10723-016-9386-715:1(23-39)Online publication date: 1-Mar-2017
  • (2016)A Survey of Task Allocation and Load Balancing in Distributed SystemsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2015.240790027:2(585-599)Online publication date: 1-Feb-2016
  • (2016)Urgent Computing - A General Makespan Robustness Model for Ensembles of ForecastsProcedia Computer Science10.1016/j.procs.2016.05.52680:C(2062-2073)Online publication date: 1-Jun-2016
  • (2016)SAFE-CROWDSecurity and Communication Networks10.1002/sec.12619:15(2686-2695)Online publication date: 1-Oct-2016
  • (2015)A PSO-Optimized Real-Time Fault-Tolerant Task Allocation Algorithm in Wireless Sensor NetworksIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2014.238634326:12(3236-3249)Online publication date: 1-Dec-2015
  • Show More Cited By

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media