research-article

Prevalence of confusing code in software projects: atoms of confusion in the wild

Published: 28 May 2018

Abstract

Prior work has shown that extremely small code patterns, such as the conditional operator and implicit type conversion, can cause considerable misunderstanding among programmers. Until now, the real-world impact of these patterns, known as 'atoms of confusion', was only speculative. This work uses a corpus of 14 of the most popular and influential open-source C and C++ projects to measure the prevalence and significance of these small confusing patterns. Our results show that the 15 known types of confusing micro-patterns occur millions of times in programs like the Linux kernel and GCC, appearing on average once every 23 lines. We show that there is a strong correlation between these confusing patterns and bug-fix commits, as well as a tendency for confusing patterns to be commented. We also explore patterns at the project level, showing that the rate of security vulnerabilities is higher in projects with more atoms. Finally, we examine real code examples containing these atoms, including ones that were used to find and fix bugs in our corpus. In total, this work demonstrates that, beyond simple misunderstanding in the lab setting, atoms of confusion are both prevalent (occurring often in real projects) and meaningful (being removed by bug-fix commits at an elevated rate).
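
To make the two patterns named in the abstract concrete, here is a minimal C sketch of a conditional operator and an implicit type conversion, each paired with one plausible, more explicit rewrite. The function names are hypothetical and the snippets are illustrative only; they are not taken from the paper or from its corpus.

```c
#include <assert.h>

/* Conditional operator: compact, but the reader has to parse
   the condition and both result operands in a single expression. */
int larger_terse(int a, int b) {
    return a > b ? a : b;
}

/* Same behavior written with an ordinary if statement. */
int larger_explicit(int a, int b) {
    if (a > b) {
        return a;
    }
    return b;
}

/* Implicit type conversion: the double is silently truncated
   toward zero when it is stored in an int. */
int truncate_implicit(double x) {
    int n = x;
    return n;
}

/* Same behavior with the conversion made visible by a cast. */
int truncate_explicit(double x) {
    return (int)x;
}

/* Sanity check (illustrative): each rewrite agrees with its terse form. */
void check_rewrites(void) {
    assert(larger_terse(3, 7) == larger_explicit(3, 7));
    assert(truncate_implicit(2.9) == truncate_explicit(2.9));
}
```

Calling `check_rewrites()` from a `main` function exercises both pairs. In each pair the two versions compute the same result; the difference is only in how much the reader has to infer.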

    Published In

    MSR '18: Proceedings of the 15th International Conference on Mining Software Repositories
    May 2018
    627 pages
    ISBN: 9781450357166
    DOI: 10.1145/3196398
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. program understanding
    2. programming languages

    Qualifiers

    • Research-article

    Conference

    ICSE '18

    Article Metrics

    • Downloads (last 12 months): 73
    • Downloads (last 6 weeks): 5
    Reflects downloads up to 12 Dec 2024

    Cited By

    • (2025) The downside of functional constructs: a quantitative and qualitative analysis of their fix-inducing effects. Empirical Software Engineering 30:1. DOI: 10.1007/s10664-024-10568-z. Online publication date: 1-Feb-2025
    • (2024) Reevaluating the Defect Proneness of Atoms of Confusion in Java Systems. Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (154-164). DOI: 10.1145/3674805.3686677. Online publication date: 24-Oct-2024
    • (2024) A Dataset of Atoms of Confusion in the Android Open Source Project. Proceedings of the 21st International Conference on Mining Software Repositories (520-524). DOI: 10.1145/3643991.3644874. Online publication date: 15-Apr-2024
    • (2024) “C”ing the light – assessing code comprehension in novice programmers using C code patterns. Computer Science Education (1-25). DOI: 10.1080/08993408.2024.2317079. Online publication date: 15-Feb-2024
    • (2024) Just-in-Time crash prediction for mobile apps. Empirical Software Engineering 29:3. DOI: 10.1007/s10664-024-10455-7. Online publication date: 8-May-2024
    • (2023) Evaluating the Code Comprehension of Novices with Eye Tracking. Proceedings of the XXII Brazilian Symposium on Software Quality (332-341). DOI: 10.1145/3629479.3629490. Online publication date: 6-Dec-2023
    • (2023) CONCORD: Clone-Aware Contrastive Learning for Source Code. Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (26-38). DOI: 10.1145/3597926.3598035. Online publication date: 12-Jul-2023
    • (2023) How They Relate and Leave: Understanding Atoms of Confusion in Open-Source Java Projects. 2023 IEEE 23rd International Working Conference on Source Code Analysis and Manipulation (SCAM) (119-130). DOI: 10.1109/SCAM59687.2023.00022. Online publication date: 2-Oct-2023
    • (2023) An Investigation of confusing code patterns in JavaScript. Journal of Systems and Software 203:C. DOI: 10.1016/j.jss.2023.111731. Online publication date: 1-Sep-2023
    • (2023) A systematic literature review on the impact of formatting elements on code legibility. Journal of Systems and Software 203:C. DOI: 10.1016/j.jss.2023.111728. Online publication date: 1-Sep-2023
