DOI: 10.1145/2043556.2043572
research-article

An empirical study on configuration errors in commercial and open source systems

Published: 23 October 2011

Abstract

Configuration errors (i.e., misconfigurations) are among the dominant causes of system failures. Their importance has inspired many research efforts on detecting, diagnosing, and fixing misconfigurations; such research would benefit greatly from a real-world characteristic study on misconfigurations. Unfortunately, few such studies have been conducted in the past, primarily because historical misconfigurations usually have not been recorded rigorously in databases.
In this work, we undertake one of the first attempts to conduct a real-world misconfiguration characteristic study. We study a total of 546 real-world misconfigurations, including 309 misconfigurations from a commercial storage system deployed at thousands of customers, and 237 from four widely used open-source systems (CentOS, MySQL, Apache HTTP Server, and OpenLDAP). Some of our major findings include:
(1) A majority of misconfigurations (70.0% to 85.5%) are due to mistakes in setting configuration parameters; however, a significant number of misconfigurations are due to compatibility issues or component configurations (i.e., not parameter-related).
(2) 38.1% to 53.7% of parameter mistakes are caused by illegal parameters that clearly violate some format or rules, motivating the use of an automatic configuration checker to detect these misconfigurations.
(3) A significant percentage (12.2% to 29.7%) of parameter-based mistakes are due to inconsistencies between different parameter values.
(4) 21.7% to 57.3% of the misconfigurations involve configurations external to the examined system, some even on entirely different hosts.
(5) A significant portion of misconfigurations can cause hard-to-diagnose failures, such as crashes, hangs, or severe performance degradation, indicating that systems should be better equipped to handle misconfigurations.
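Findings (2) and (3) point to two kinds of automated checks: rules on a single parameter's format or range, and rules that relate the values of several parameters. The sketch below illustrates both under stated assumptions. The parameter names (`listen_port`, `cache_size_mb`, `max_memory_mb`, `log_level`), their rules, and the function `check_config` are all hypothetical; they are not taken from the study or from any of the systems it examined, and this is not the authors' tooling.

```python
"""Minimal, hypothetical configuration checker (illustrative sketch only).

Every parameter name and rule here is invented for illustration; none is
taken from the study or from any of the systems it examined.
"""
from typing import Callable, Dict, List, Set, Tuple

Config = Dict[str, str]


def _int_in_range(value: str, lo: int, hi: int) -> bool:
    """True if `value` parses as an integer within [lo, hi]."""
    try:
        n = int(value)
    except ValueError:
        return False  # non-numeric text is an illegal value
    return lo <= n <= hi


# Single-parameter format rules (finding 2: "illegal" parameter values).
FORMAT_RULES: Dict[str, Tuple[Callable[[str], bool], str]] = {
    "listen_port": (lambda v: _int_in_range(v, 1, 65535),
                    "must be an integer in 1..65535"),
    "cache_size_mb": (lambda v: _int_in_range(v, 0, 1048576),
                      "must be an integer in 0..1048576"),
    "max_memory_mb": (lambda v: _int_in_range(v, 1, 1048576),
                      "must be an integer in 1..1048576"),
    "log_level": (lambda v: v in {"debug", "info", "warn", "error"},
                  "must be one of debug, info, warn, error"),
}

# Cross-parameter rules (finding 3: inconsistent values across parameters).
CONSISTENCY_RULES: List[Tuple[Tuple[str, ...], Callable[[Config], bool], str]] = [
    (("cache_size_mb", "max_memory_mb"),
     lambda c: int(c["cache_size_mb"]) <= int(c["max_memory_mb"]),
     "cache_size_mb must not exceed max_memory_mb"),
]


def check_config(cfg: Config) -> List[str]:
    """Return human-readable violations; an empty list means no rule fired."""
    errors: List[str] = []
    malformed: Set[str] = set()
    for name, value in cfg.items():
        rule = FORMAT_RULES.get(name)
        if rule is None:
            # Unknown names often indicate a typo in the parameter name.
            errors.append(f"{name}: unknown parameter")
            malformed.add(name)
            continue
        is_valid, message = rule
        if not is_valid(value):
            errors.append(f"{name}={value!r}: {message}")
            malformed.add(name)
    for params, is_consistent, message in CONSISTENCY_RULES:
        # Evaluate only when every involved parameter is present and well-formed,
        # so the int() conversions inside the rule cannot fail.
        if all(p in cfg and p not in malformed for p in params) and not is_consistent(cfg):
            errors.append(message)
    return errors


if __name__ == "__main__":
    sample = {"listen_port": "80800", "cache_size_mb": "4096",
              "max_memory_mb": "2048", "log_level": "info"}
    for err in check_config(sample):
        print(err)
```

Running the module prints `listen_port='80800': must be an integer in 1..65535` followed by `cache_size_mb must not exceed max_memory_mb`. Keeping per-parameter rules separate from consistency rules mirrors the distinction between findings (2) and (3); rules spanning other components or hosts, as in finding (4), would need information beyond a single configuration file and are not attempted here.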

Information

Published In

SOSP '11: Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
October 2011
417 pages
ISBN:9781450309776
DOI:10.1145/2043556
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. characteristic study
  2. misconfigurations

Qualifiers

  • Research-article

Conference

SOSP '11

Acceptance Rates

Overall Acceptance Rate 174 of 961 submissions, 18%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 114
  • Downloads (last 6 weeks): 18
Reflects downloads up to 12 Dec 2024.

Citations

Cited By

  • (2024) Towards Automated Configuration Documentation. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pp. 2256-2261. DOI: 10.1145/3691620.3695311. Online publication date: 27-Oct-2024.
  • (2024) Diffy: Data-Driven Bug Finding for Configurations. Proceedings of the ACM on Programming Languages, 8(PLDI), pp. 199-222. DOI: 10.1145/3656385. Online publication date: 20-Jun-2024.
  • (2024) zkStream: a Framework for Trustworthy Stream Processing. Proceedings of the 25th International Middleware Conference, pp. 252-265. DOI: 10.1145/3652892.3700763. Online publication date: 2-Dec-2024.
  • (2024) Face It Yourselves: An LLM-Based Two-Stage Strategy to Localize Configuration Errors via Logs. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 13-25. DOI: 10.1145/3650212.3652106. Online publication date: 11-Sep-2024.
  • (2024) TraStrainer: Adaptive Sampling for Distributed Traces with System Runtime State. Proceedings of the ACM on Software Engineering, 1(FSE), pp. 473-493. DOI: 10.1145/3643748. Online publication date: 12-Jul-2024.
  • (2024) ROBUST: 221 bugs in the Robot Operating System. Empirical Software Engineering, 29(3). DOI: 10.1007/s10664-024-10440-0. Online publication date: 23-Mar-2024.
  • (2024) Towards a Taxonomy of Infrastructure as Code Misconfigurations: An Ansible Study. Service-Oriented Computing, pp. 83-103. DOI: 10.1007/978-3-031-72578-4_5. Online publication date: 19-Oct-2024.
  • (2023) Simplifying Cloud Management with Cloudless Computing. Proceedings of the 22nd ACM Workshop on Hot Topics in Networks, pp. 95-101. DOI: 10.1145/3626111.3628206. Online publication date: 28-Nov-2023.
  • (2023) DiagConfig: Configuration Diagnosis of Performance Violations in Configurable Software Systems. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 566-578. DOI: 10.1145/3611643.3616300. Online publication date: 30-Nov-2023.
  • (2023) Fail through the Cracks: Cross-System Interaction Failures in Modern Cloud Systems. Proceedings of the Eighteenth European Conference on Computer Systems, pp. 433-451. DOI: 10.1145/3552326.3587448. Online publication date: 8-May-2023.
