research-article

Debugging debugged, a metaphysical manifesto of systems integration

Published: 01 May 2008

Abstract

Systems designers most often design to the N-1 criterion, whether or not they know they are doing so. Systems designed to the N-1 criterion detect, isolate, and (possibly) recover from at most one fault at a time. In contrast, systems integrators must isolate faults in the presence of multiple simultaneous faults and in the absence of user guides. The purpose of this paper is to debug the debugging process used by systems integrators. To that end, this paper describes the systems integration environment, identifies factors that drive the efficiency of that effort, and provides a critique of the historical roots of architectural firewalls. (If there were no firewalls, everything could theoretically interfere with everything else, as only the stricture of time would prevent everything from happening at once. Yet a perfect firewall would be an impossibility: a Maxwell's demon of information.) Penultimately, this paper provides philosophical musings, a self-reflection on meanings uncovered. Because this paper has strongly non-linear content, an attempt has been made to constrain the text by theme:
I. The Systems Integration Environment
II. An Efficient Systems Integration Efficiency Metric
III. Architectural Investigations
IV. the Problem Behind the Problem (tPBtP)


Published In

ACM SIGSOFT Software Engineering Notes  Volume 33, Issue 3
May 2008
85 pages
ISSN:0163-5948
DOI:10.1145/1360602
Publisher

Association for Computing Machinery

New York, NY, United States
