DOI: 10.1007/978-3-030-87007-2_26
Article

Assessing Ensemble Learning Techniques in Bug Prediction

Published: 13 September 2021

Abstract

The application of ensemble learning techniques is continuously increasing, since they have proven superior to traditional machine learning techniques in various domains. These algorithms can be employed for bug prediction purposes as well. Existing studies investigated the performance of ensemble learning techniques only on the PROMISE and NASA MDP public datasets; however, it is important to evaluate these techniques on additional public datasets in order to test their generalizability. We investigated the performance of the two most widely used ensemble learning techniques, AdaBoost and Bagging, on the Unified Bug Dataset, which encapsulates three class-level public bug datasets in a uniform format with a common set of software product metrics used as predictors. Additionally, we investigated the effect of applying three different resampling techniques to the dataset. Finally, we studied the performance of Decision Tree and Naïve Bayes as the weak learners in the ensembles, and fine-tuned the parameters of the weak learners to obtain the best possible results.
We found that AdaBoost with a Decision Tree weak learner outperformed the other configurations, achieving an F-measure of 54.61% (81.96% accuracy, 50.92% precision, 58.90% recall) with 300 estimators and a learning rate of 0.05. Depending on one's needs, RUS resampling can be applied to raise recall to as much as 75.14%, at the cost of precision.


Published In

Computational Science and Its Applications – ICCSA 2021: 21st International Conference, Cagliari, Italy, September 13–16, 2021, Proceedings, Part VII
Sep 2021
747 pages
ISBN: 978-3-030-87006-5
DOI: 10.1007/978-3-030-87007-2

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. AdaBoost
  2. Bug prediction
  3. Resampling
  4. Unified bug dataset
