DOI: 10.1007/978-3-030-87007-2_26
Article

Assessing Ensemble Learning Techniques in Bug Prediction

Published: 13 September 2021

Abstract

The application of ensemble learning techniques is continuously increasing, since they have proven superior to traditional machine learning techniques in various domains. These algorithms can be employed for bug prediction purposes as well. Existing studies investigated the performance of ensemble learning techniques only on the PROMISE and NASA MDP public datasets; however, it is important to evaluate these techniques on additional public datasets in order to test their generalizability. We investigated the performance of the two most widely used ensemble learning techniques, AdaBoost and Bagging, on the Unified Bug Dataset, which encapsulates three class-level public bug datasets in a uniform format with a common set of software product metrics used as predictors. Additionally, we investigated the effect of applying three different resampling techniques to the dataset. Finally, we studied the performance of Decision Tree and Naïve Bayes as the weak learners in the ensembles, and fine-tuned the parameters of the weak learners to obtain the best possible results.
We found that AdaBoost with a Decision Tree weak learner outperformed the other configurations, achieving an F-measure of 54.61% (81.96% accuracy, 50.92% precision, 58.90% recall) with 300 estimators and a learning rate of 0.05. Depending on one's needs, RUS resampling can be applied to raise recall to as much as 75.14%, at the cost of precision.


Published In

Computational Science and Its Applications – ICCSA 2021: 21st International Conference, Cagliari, Italy, September 13–16, 2021, Proceedings, Part VII
Sep 2021
747 pages
ISBN: 978-3-030-87006-5
DOI: 10.1007/978-3-030-87007-2

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. AdaBoost
  2. Bug prediction
  3. Resampling
  4. Unified bug dataset
