DOI: 10.1007/978-3-030-87007-2_27
Article

Bug Prediction Using Source Code Embedding Based on Doc2Vec

Published: 13 September 2021

Abstract

Bug prediction is a resource-demanding task that is hard to automate using static source code analysis. In many fields of computer science, machine learning has proven extremely useful for tasks like this; for it to work, however, we need a way to use source code as input. We propose a simple but meaningful representation of source code based on its abstract syntax tree and the Doc2Vec embedding algorithm. This representation maps the source code to a fixed-length vector that can be used for various downstream tasks, one of which is bug prediction. We measured the validity of this approach on its own and its effectiveness compared to bug prediction based solely on code metrics. We also experimented with numerous machine learning approaches to examine how different embedding parameters interact with different machine learning models. Our results show that this representation provides meaningful information: it improves bug prediction accuracy in most cases and is always at least as good as using only code metrics as features.
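The idea of turning an abstract syntax tree into input for Doc2Vec can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the tree is serialized, so the depth-first traversal and the use of node type names as tokens are assumptions, and Python's standard `ast` module stands in for a Java parser. The helper names `ast_token_sequence` and `embed_with_doc2vec` are hypothetical. The second function shows one way such sequences might be passed to the Doc2Vec implementation in the third-party gensim library; it is defined here but not called.

```python
import ast
from typing import List


def ast_token_sequence(source: str) -> List[str]:
    """Flatten a source snippet into a depth-first list of AST node type names.

    Assumption: a pre-order traversal is used as the "document" for Doc2Vec;
    the abstract does not specify the actual serialization.
    """
    tree = ast.parse(source)
    tokens: List[str] = []

    def visit(node: ast.AST) -> None:
        # Record the node type, then descend into its children in field order.
        tokens.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(tree)
    return tokens


def embed_with_doc2vec(token_sequences, vector_size=50, epochs=40):
    """Train Doc2Vec on token sequences and return one fixed-length vector each.

    Requires gensim (third-party); hyperparameter values are illustrative only.
    """
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    corpus = [
        TaggedDocument(words=toks, tags=[i])
        for i, toks in enumerate(token_sequences)
    ]
    model = Doc2Vec(vector_size=vector_size, min_count=1, epochs=epochs)
    model.build_vocab(corpus)
    model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)
    return [model.dv[i] for i in range(len(corpus))]


SNIPPET = "def add(a, b):\n    return a + b\n"
tokens = ast_token_sequence(SNIPPET)
print(tokens)
```

Each resulting vector has the same length regardless of the size of the source unit, so it could be used alone or concatenated with code metrics as classifier features.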


Published In

Computational Science and Its Applications – ICCSA 2021: 21st International Conference, Cagliari, Italy, September 13–16, 2021, Proceedings, Part VII
Sep 2021
747 pages
ISBN: 978-3-030-87006-5
DOI: 10.1007/978-3-030-87007-2

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Source code embedding
  2. Code metrics
  3. Bug prediction
  4. Java
  5. Doc2Vec

Qualifiers

  • Article
