DOI: 10.1145/3643916.3644413
research-article

Rationale Dataset and Analysis for the Commit Messages of the Linux Kernel Out-of-Memory Killer

Published: 13 June 2024

Abstract

Code commit messages can contain useful information on why a developer made a change. However, the presence and structure of rationale in real-world code commit messages are not well studied. Here, we detail the creation of a labelled dataset for analyzing the code commit messages of the Linux Kernel Out-of-Memory Killer component. We study aspects of rationale information such as its presence, temporal evolution, and structure. We find that 98.9% of commits in our dataset contain sentences with rationale information, and that experienced developers report rationale in about 60% of the sentences in their commits. We report on the challenges we faced and provide examples of our labelling.
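The two headline figures above are proportions over a sentence-labelled dataset: the share of commits containing at least one rationale sentence, and the share of an author's sentences labelled as rationale. The sketch below shows one way to compute both. It assumes a hypothetical record schema (`Sentence` with `commit_id`, `author`, `has_rationale`) that is not taken from the released dataset, and it only groups by author, since how "experienced" developers are defined is not stated here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sentence:
    """One labelled commit-message sentence (hypothetical schema, not the paper's)."""
    commit_id: str
    author: str
    has_rationale: bool


def commit_rationale_coverage(sentences: list[Sentence]) -> float:
    """Fraction of commits with at least one sentence labelled as rationale."""
    commits: dict[str, bool] = {}
    for s in sentences:
        # A commit counts once any of its sentences carries rationale.
        commits[s.commit_id] = commits.get(s.commit_id, False) or s.has_rationale
    if not commits:
        return 0.0
    return sum(commits.values()) / len(commits)


def author_rationale_ratio(sentences: list[Sentence]) -> dict[str, float]:
    """Per-author fraction of sentences labelled as rationale."""
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for s in sentences:
        totals[s.author] += 1
        hits[s.author] += int(s.has_rationale)
    return {a: hits[a] / totals[a] for a in totals}


if __name__ == "__main__":
    # Toy data for illustration only; not drawn from the dataset.
    data = [
        Sentence("c1", "alice", True),
        Sentence("c1", "alice", False),
        Sentence("c2", "bob", False),
        Sentence("c3", "alice", True),
    ]
    print(commit_rationale_coverage(data))  # 2 of 3 commits contain rationale
    print(author_rationale_ratio(data))
```

Computing coverage per commit rather than per sentence is what lets a high commit-level figure (such as 98.9%) coexist with a lower sentence-level one (such as about 60%): a single rationale sentence is enough to mark a whole commit.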

Published In

ICPC '24: Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension
April 2024
487 pages
ISBN: 9798400705861
DOI: 10.1145/3643916
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. developer rationale
  2. dataset
  3. Linux kernel
  4. commit messages
