Probabilistic Model Checking GitHub Repositories for Software Project Analysis
Figure 1. Example of the branch network.
Figure 2. DTMC modeling for the branch network.
Figure 3. DTMC model for Redline.
Figure 4. Results of Queries 3 and 4: Average cumulative number of actions and pull requests per week.
Figure 5. Results of Queries 5, 6, and 7: Average cumulative number by branch location.
Figure 6. Result of Query 9: Probability of processing pull requests in the next step.
Figure 7. Result of Query 10: Change in pull request processing probability over time steps.
Figure 8. Result of Query 11: Average number of time steps until a participant appears.
Figure 9. Result of Query 12: Average number of commits on a branch before a pull request.
Figure 10. Result of Query 13: Probability of making a commit over time steps.
Figure 11. Result of Query 14: Average number of time steps to encounter another participant.
Abstract
1. Introduction
2. Background
2.1. Discrete-Time Markov Chain
- S represents the state space, which is the set of all possible states the system can be in.
- P denotes the state transition probability matrix of the state space S.
- L is a labeling function that maps each state to its respective labels.
- AP is a finite set of atomic propositions.
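To make the tuple concrete, the following Python sketch encodes a small DTMC as plain data: a state space S, a row-stochastic matrix P, and a labeling L over a set of atomic propositions AP. The three states, their probabilities, and their labels are hypothetical and are not taken from any of the studied repositories; the helper names are illustrative only.

```python
"""Minimal DTMC sketch: S, P, L and AP as plain Python data (hypothetical values)."""

# State space S (hypothetical state names).
S = ["s0", "s1", "s2"]

# Transition probability matrix P: P[i][j] is the probability of moving
# from S[i] to S[j] in one step. Every row must sum to 1.
P = [
    [0.0, 0.7, 0.3],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]

# Atomic propositions AP and labeling function L: each state -> subset of AP.
AP = {"start", "done"}
L = {"s0": {"start"}, "s1": set(), "s2": {"done"}}


def is_stochastic(matrix, tol=1e-9):
    """True if every entry is non-negative and every row sums to 1 (within tol)."""
    return all(
        all(p >= 0.0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )


def step(dist, matrix):
    """One transition of the chain: new[j] = sum_i dist[i] * matrix[i][j]."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def distribution_after(matrix, init, k):
    """State distribution after k steps, starting from the distribution init."""
    dist = list(init)
    for _ in range(k):
        dist = step(dist, matrix)
    return dist
```

For example, `distribution_after(P, [1.0, 0.0, 0.0], 2)` gives approximately `[0.0, 0.35, 0.65]`: after two steps the chain is in s1 with probability 0.7 × 0.5 and has been absorbed in s2 otherwise.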
2.2. Probabilistic Computation Tree Logic
- State formula
- Path formula
- F operator
- State reward function
- State formula with reward
- Filter formula to find the average
- Filter formula that returns the value for the single state satisfying a given condition
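For reference, the standard PCTL syntax behind these formula categories, together with the PRISM-style reward and filter forms, can be summarized as below. This is the textbook grammar as supported by PRISM, given as a sketch; it is not a reproduction of the paper's own definitions, and a (atomic proposition), p, r, k, rew, and Φ' are generic placeholders.

```latex
\begin{aligned}
\Phi &::= \mathit{true} \mid a \mid \neg\Phi \mid \Phi \wedge \Phi
        \mid \mathrm{P}_{\bowtie p}[\psi] \mid \mathrm{R}_{\bowtie r}[\theta]
  && \text{state formula},\ \bowtie \in \{<, \le, \ge, >\} \\
\psi &::= \mathrm{X}\,\Phi \mid \Phi\,\mathrm{U}^{\le k}\,\Phi \mid \Phi\,\mathrm{U}\,\Phi
  && \text{path formula} \\
\mathrm{F}\,\Phi &\equiv \mathit{true}\,\mathrm{U}\,\Phi, \quad
\mathrm{F}^{\le k}\,\Phi \equiv \mathit{true}\,\mathrm{U}^{\le k}\,\Phi
  && \text{F (eventually) operator} \\
\mathit{rew} &: S \to \mathbb{R}_{\ge 0}
  && \text{state reward function} \\
\theta &::= \mathrm{C}^{\le k} \mid \mathrm{F}\,\Phi
  && \text{reward formula inside } \mathrm{R}_{\bowtie r}[\cdot] \\
& \mathrm{filter}(\mathrm{avg}, \mathrm{R}_{=?}[\theta], \Phi')
  && \text{average value over states satisfying } \Phi' \\
& \mathrm{filter}(\mathrm{state}, \mathrm{P}_{=?}[\psi], \Phi')
  && \text{value at the single state satisfying } \Phi'
\end{aligned}
```

Here C^{≤k} accumulates state rewards over the first k steps, and F Φ accumulates them until a Φ-state is reached.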
2.3. Model Checking
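To illustrate what a probabilistic model checker such as PRISM computes for the F operator, the sketch below evaluates bounded and unbounded reachability probabilities, P=?[F<=k target] and P=?[F target], and an expected reachability reward, R=?[F target], by a simple Jacobi-style fixed-point iteration over an explicit transition matrix. It is a simplified sketch, not PRISM's implementation (PRISM's engines also use graph-based precomputation and other numerical methods). The three-state chain is hypothetical, the function names are illustrative, and the reward computation assumes the target is reached with probability 1 from every state; otherwise the expected reward is infinite.

```python
"""Sketch: iterative computation of P=?[F<=k T], P=?[F T] and R=?[F T] on a DTMC."""

# Hypothetical DTMC; state 2 is the target and is absorbing.
P = [
    [0.0, 0.7, 0.3],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]
TARGET = {2}


def _backup(matrix, x, target):
    """One backward step: target states keep 1, others average their successors."""
    n = len(matrix)
    return [
        1.0 if s in target else sum(matrix[s][t] * x[t] for t in range(n))
        for s in range(n)
    ]


def prob_reach_bounded(matrix, target, k):
    """P=?[F<=k target] for every state, via k rounds of backward iteration."""
    x = [1.0 if s in target else 0.0 for s in range(len(matrix))]
    for _ in range(k):
        x = _backup(matrix, x, target)
    return x


def prob_reach(matrix, target, eps=1e-12, max_iter=100_000):
    """Approximate P=?[F target] by iterating until successive vectors agree within eps."""
    x = [1.0 if s in target else 0.0 for s in range(len(matrix))]
    for _ in range(max_iter):
        new = _backup(matrix, x, target)
        if max(abs(a - b) for a, b in zip(new, x)) < eps:
            return new
        x = new
    return x


def expected_reward_reach(matrix, rew, target, eps=1e-12, max_iter=100_000):
    """Approximate R=?[F target]: state rewards collected before reaching target.

    Assumes target is reached with probability 1 from every state.
    """
    n = len(matrix)
    x = [0.0] * n
    for _ in range(max_iter):
        new = [
            0.0 if s in target
            else rew[s] + sum(matrix[s][t] * x[t] for t in range(n))
            for s in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new, x)) < eps:
            return new
        x = new
    return x
```

With a reward of 1 in every state, R=?[F target] is the expected number of steps to reach the target: from state 0 it converges to about 2.4 (state 1 needs 2 steps on average, and 1 + 0.7 × 2 = 2.4), while `prob_reach_bounded(P, TARGET, 2)[0]` is about 0.65.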
3. GitHub Model Checking
3.1. GitHub Workflow
- create: This action creates a new branch for a specific task, such as implementing a feature or fixing a bug, so that developers can work without affecting the main codebase.
- commit: Developers use the “commit” action to record changes in the repository. Each commit captures a set of changes to files, enabling version control and tracking of modifications over time.
- pull request: After implementing a feature or fixing a bug, developers propose their changes through a “pull request”, which asks for the changes to be merged into the main codebase or another branch.
- accepted: The “accepted” status indicates that a submitted pull request has been approved for integration.
- merge: Project maintainers or collaborators integrate reviewed and approved changes into the main branch through the “merge” action, consolidating new features or bug fixes.
- rejected: The “rejected” status denotes that a submitted pull request has not been approved for integration.
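These outcomes can be observed in pull request records returned by the GitHub REST API, whose `state` field is `"open"` or `"closed"` and whose `merged_at` field is null unless the pull request was merged. The sketch below maps such records to the action names used here. The mapping itself (merged becomes accepted followed by merge, closed without merging becomes rejected, still open yields no outcome yet) is an assumption for illustration, not necessarily the authors' procedure, and the sample records are made up.

```python
"""Sketch: derive workflow actions from GitHub pull request records.

Assumed mapping (hypothetical, for illustration only):
  merged (merged_at is set)    -> ["accepted", "merge"]
  closed without being merged  -> ["rejected"]
  still open                   -> [] (no outcome observed yet)
"""

from typing import Any, Dict, List


def outcome_actions(pr: Dict[str, Any]) -> List[str]:
    """Return the outcome actions for one pull request record."""
    if pr.get("merged_at") is not None:
        return ["accepted", "merge"]
    if pr.get("state") == "closed":
        return ["rejected"]
    return []


def actions_for(prs: List[Dict[str, Any]]) -> List[str]:
    """Expand each pull request into 'pullRequest' followed by its outcome actions."""
    actions: List[str] = []
    for pr in prs:
        actions.append("pullRequest")
        actions.extend(outcome_actions(pr))
    return actions


# Made-up records containing only the two fields the sketch reads.
sample = [
    {"state": "closed", "merged_at": "2023-01-05T10:00:00Z"},
    {"state": "closed", "merged_at": None},
    {"state": "open", "merged_at": None},
]
```

Here `actions_for(sample)` yields `["pullRequest", "accepted", "merge", "pullRequest", "rejected", "pullRequest"]`. Treating "accepted" and "merge" as a pair is consistent with the equal accepted and merge rates reported for every repository, but it remains a modeling assumption.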
3.2. State Structure
- participant: {p1, …, pn}
- location: {master, branch}
- action: {create, commit, pullRequest, accepted, merge, rejected}
- <p1_branch_create> denotes p1 creating a branch.
- <p2_master_commit> represents p2 committing to the master branch.
- <p3_branch_pullRequest> indicates p3 initiating a pull request for a branch.
- <p4_branch_rejected> signifies that the pull request submitted by p4 has been rejected.
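The labeling scheme can be sketched as a pair of Python helpers that build and parse labels from the three components listed above. The helper names are hypothetical, and participant names are assumed not to contain an underscore so that parsing stays unambiguous. The Cartesian product of all components only bounds the number of possible labels; a model built from a repository's history would presumably contain only the combinations that actually occur.

```python
"""Sketch: build and parse state labels of the form <participant>_<location>_<action>."""

from itertools import product
from typing import Iterable, List, Tuple

LOCATIONS = ("master", "branch")
ACTIONS = ("create", "commit", "pullRequest", "accepted", "merge", "rejected")


def state_label(participant: str, location: str, action: str) -> str:
    """Compose a label such as 'p1_branch_create', validating each component."""
    if "_" in participant:
        raise ValueError("participant names must not contain '_'")
    if location not in LOCATIONS:
        raise ValueError(f"unknown location: {location}")
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    return f"{participant}_{location}_{action}"


def parse_state_label(label: str) -> Tuple[str, str, str]:
    """Split a label into (participant, location, action); raise ValueError if invalid."""
    participant, location, action = label.split("_")  # ValueError unless exactly 3 parts
    state_label(participant, location, action)  # re-validate the components
    return participant, location, action


def all_state_labels(participants: Iterable[str]) -> List[str]:
    """Every possible label for the given participants (an upper bound on states)."""
    return [
        state_label(p, loc, act)
        for p, loc, act in product(participants, LOCATIONS, ACTIONS)
    ]
```

For four participants, `all_state_labels(["p1", "p2", "p3", "p4"])` produces 4 × 2 × 6 = 48 labels, and `parse_state_label("p3_branch_pullRequest")` returns `("p3", "branch", "pullRequest")`.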
3.3. DTMC Modeling
3.4. Property Specification
3.5. Data Collection and Modeling
- oslabs-beta/DeSolver
- oslabs-beta/kafkaVision
- oslabs-beta/Redline
- oslabs-beta/Sherlogs
- oslabs-beta/Hearth
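A common way to turn such history into a DTMC is to estimate each transition probability as the fraction of observed transitions out of a state that lead to each successor (the maximum-likelihood estimate). The sketch below applies this to a hypothetical sequence of state labels; it assumes, rather than confirms, that the models for the five repositories were built in this way, and the trace is made up.

```python
"""Sketch: maximum-likelihood estimation of DTMC transition probabilities from a trace."""

from collections import Counter, defaultdict
from typing import Dict, List


def estimate_transitions(trace: List[str]) -> Dict[str, Dict[str, float]]:
    """Estimate P(s -> t) = count(s, t) / count(s, *) from consecutive pairs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for src, dst in zip(trace, trace[1:]):
        counts[src][dst] += 1
    probs: Dict[str, Dict[str, float]] = {}
    for src, successors in counts.items():
        total = sum(successors.values())
        probs[src] = {dst: n / total for dst, n in successors.items()}
    return probs


# Hypothetical trace of state labels in chronological order.
trace = [
    "p1_branch_create",
    "p1_branch_commit",
    "p1_branch_commit",
    "p2_master_commit",
    "p1_branch_commit",
]
```

For this trace, `p1_branch_commit` is followed once by itself and once by `p2_master_commit`, so each successor receives probability 0.5. A state whose only occurrence is at the end of a trace has no observed outgoing transition and would need separate handling (for example, a self-loop) to keep the matrix stochastic.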
4. Experimental Results
4.1. Repository Characteristics
4.1.1. Action Rate
4.1.2. Activity per Week
4.2. Branch Management
4.2.1. Development Location
4.2.2. Pull Request Management
4.3. Participant Analysis
4.3.1. Individual Participants
4.3.2. Relationship Analysis
5. Discussion
5.1. Repository Characteristics
5.2. Branch Management
5.3. Participant Analysis
5.4. Limitations
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Category | Queries |
---|---|
Repository Characteristics | (1)–(4) |
Branch Management | (5)–(10) |
Participant Analysis | (11)–(15) |
Repository | States | Transitions |
---|---|---|
DeSolver | 26 | 120 |
kafkaVision | 30 | 126 |
Redline | 31 | 90 |
Sherlogs | 25 | 99 |
Hearth | 29 | 75 |
Repository | Create | Commit | Pull Request | Accepted | Merge | Rejected |
---|---|---|---|---|---|---|
DeSolver | 0.0962 | 0.5224 | 0.1122 | 0.0994 | 0.0994 | 0.0128 |
kafkaVision | 0.1266 | 0.4120 | 0.1524 | 0.1395 | 0.1395 | 0.0107 |
Redline | 0.0962 | 0.3471 | 0.1856 | 0.1684 | 0.1684 | 0.0172 |
Sherlogs | 0.0909 | 0.5584 | 0.1136 | 0.0877 | 0.0877 | 0.0227 |
Hearth | 0.1106 | 0.3230 | 0.1814 | 0.1814 | 0.1814 | 0.0000 |
Repository | Probability |
---|---|
DeSolver | 83.33% |
kafkaVision | 66.67% |
Redline | 100.00% |
Sherlogs | 91.67% |
Hearth | 100.00% |
Repository | Probability |
---|---|
DeSolver | 64.05% |
kafkaVision | 78.34% |
Redline | 93.30% |
Sherlogs | 30.80% |
Hearth | 68.20% |
Repository | p1 | p2 | p3 | p4 |
---|---|---|---|---|
DeSolver | 12.50% | 30.00% | 50.00% | 42.86% |
kafkaVision | - | 58.33% | 8.33% | 26.67% |
Redline | - | 6.82% | - | 50.00% |
Sherlogs | 100.00% | 100.00% | 83.33% | 100.00% |
Hearth | 50.00% | 5.26% | 3.12% | 0.00% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Jo, S.; Kwon, R.; Kwon, G. Probabilistic Model Checking GitHub Repositories for Software Project Analysis. Appl. Sci. 2024, 14, 1260. https://doi.org/10.3390/app14031260