Abstract
As information technology advances, software applications are becoming increasingly critical. However, the growing size and complexity of software projects can introduce serious defects that cause significant financial losses. Software Defect Prediction (SDP) addresses this problem by detecting defects early in the development process so that they can be resolved, which helps ensure high software quality. As a result, SDP has become a major research focus for academics worldwide. This study compares machine learning-based SDP models and examines whether the choice of traditional machine learning algorithm affects SDP outcomes. Unlike previous studies that sought a single best prediction model for all datasets, this paper constructs the best-performing (dominant) SDP model separately for each dataset. Using the publicly available ESEM2016 dataset, 13 machine learning classification algorithms are employed to predict software defects. Their performance is assessed with four evaluation indicators: Accuracy, AUC (Area Under the Curve), F-measure, and Running Time (RT). Because the dataset suffers from serious class imbalance, 10 sampling methods are combined with the 13 machine learning algorithms to explore the effect of sampling techniques on the performance of traditional machine learning classification models. Finally, a comprehensive evaluation identifies the best combination of sampling technique and classification model, which forms the final dominant model for SDP.
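The evaluation loop described in the abstract (pair each sampling method with each classifier, then score the pair on Accuracy, AUC, F-measure, and RT) might be sketched roughly as follows. This is only an illustrative sketch under stated assumptions: the data are synthetic rather than ESEM2016, and the nearest-centroid classifier and random oversampler are hypothetical stand-ins chosen for brevity, not necessarily among the 13 algorithms or 10 sampling methods used in the study. All function and class names are invented for this example.

```python
import random
import time


def make_data(n_clean, n_defective, seed):
    """Synthetic two-feature modules; label 1 = defective (the minority class)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n_clean)]
    X += [[rng.gauss(1.5, 1.0), rng.gauss(1.5, 1.0)] for _ in range(n_defective)]
    return X, [0] * n_clean + [1] * n_defective


def no_sampling(X, y):
    """Baseline: leave the training set untouched."""
    return X, y


def random_oversample(X, y, seed=0):
    """Duplicate random minority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


class CentroidClassifier:
    """Toy stand-in classifier: predicts the class whose centroid is nearer."""

    def fit(self, X, y):
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def score(self, x):
        # Higher score means closer to the defective centroid than the clean one.
        d0 = sum((a - b) ** 2 for a, b in zip(x, self.centroids[0]))
        d1 = sum((a - b) ** 2 for a, b in zip(x, self.centroids[1]))
        return d0 - d1

    def predict(self, X):
        return [1 if self.score(x) > 0 else 0 for x in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for the defective class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """Chance that a random defective item outscores a random clean one (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(sampler, model, train, test):
    """Resample the training set only, then fit, predict, and time the model."""
    X_train, y_train = sampler(*train)
    X_test, y_test = test
    start = time.perf_counter()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    scores = [model.score(x) for x in X_test]
    rt = time.perf_counter() - start
    return {"accuracy": accuracy(y_test, y_pred), "auc": auc(y_test, scores),
            "f_measure": f_measure(y_test, y_pred), "rt": rt}


def demo():
    train = make_data(180, 20, seed=1)  # roughly 10% defective, mimicking imbalance
    test = make_data(90, 10, seed=2)
    samplers = {"none": no_sampling, "random_oversample": random_oversample}
    return {name: evaluate(s, CentroidClassifier(), train, test)
            for name, s in samplers.items()}


if __name__ == "__main__":
    for name, metrics in demo().items():
        print(name, metrics)
```

Note that resampling is applied to the training split only, so the test split keeps its original class ratio. Extending the sketch to a full grid would mean adding more entries to `samplers` and looping over several model classes in `demo`.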
Data availability
The dataset used in this research is openly accessible via: https://github.com/tjshippey/ESEM2016.
Acknowledgements
The authors gratefully acknowledge financial support from the National Natural Science Foundation of China, the Department of Science and Technology of Henan Province, and the Zhengzhou University of Light Industry.
Funding
This work was financially supported by the National Natural Science Foundation of China (61906175), the Doctoral Research Fund of Zhengzhou University of Light Industry (2020BSJJ067), the Science and Technology Project of Henan Province (222102210096, 232102210014, 242102210033, 242102211050), and the Henan Province Higher Education Teaching Reform Research and Practice Project (2021SJGLX292).
Author information
Contributions
Hongwei Tao: conceptualization, methodology, writing–original draft, validation, supervision. Xiaoxu Niu: data curation, methodology, writing–original draft, software, investigation, validation. Lang Xu: data curation, investigation. Lianyou Fu: writing–review and editing. Qiaoling Cao: writing–review and editing. Haoran Chen: writing–review and editing. Songtao Shang: writing–review and editing. Yang Xian: writing–review and editing.
Ethics declarations
Competing interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tao, H., Niu, X., Xu, L. et al. A comparative study of software defect binomial classification prediction models based on machine learning. Software Qual J 32, 1203–1237 (2024). https://doi.org/10.1007/s11219-024-09683-3
DOI: https://doi.org/10.1007/s11219-024-09683-3