Abstract
Background: Defining a code smell is not a trivial task, and recognizing one tends to be highly subjective. Nevertheless, several code smell detection tools have been proposed. More recent approaches lean towards machine learning (ML) techniques to overcome the disadvantages of automatic detection tools. Objectives: We aim to develop a research infrastructure and reproduce the code smell prediction process proposed by Arcelli Fontana et al. We investigate the performance of ML algorithms on samples that include major modern Java language features. Features such as lambdas can shorten code, making the presence of a code smell less obvious, and thus pose a challenge to both existing code smell detection tools and ML algorithms. Method: We extend the study with a dataset consisting of 281 Java projects. To drive sample selection, we define metrics that take lambdas and method references into account, derived using a custom JavaParser-based solution. Tagged samples containing the new constructs are used as input for the detection techniques. Results: Detection rules derived from the best-performing algorithms, such as J48 and JRip, incorporate the newly introduced metrics. Conclusions: The presence of certain new Java language constructs may hide a Long Method code smell or indicate a God Class. Conversely, their absence or low number can suggest a Data Class.
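To make the idea of lambda-aware metrics concrete, the following is a minimal sketch in Java. It is not the authors' JavaParser-based solution: the class `LambdaMetricsSketch` and its methods are hypothetical names, and the counts come from plain token matching rather than from a parsed syntax tree. It also shows how a stream pipeline can express the same logic as a loop in fewer lines, which is the effect that can make a Long Method harder to spot.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only. It approximates lambda and
// method-reference counts by scanning source text for tokens. Assumption:
// the input contains no "->" or "::" inside comments or string literals,
// and no arrow-form switch rules (Java 14+), which would be miscounted.
final class LambdaMetricsSketch {

    // The arrow token that separates lambda parameters from the lambda body.
    private static final Pattern LAMBDA_ARROW = Pattern.compile("->");

    // The double-colon token used in method references such as User::getName.
    private static final Pattern METHOD_REFERENCE = Pattern.compile("::");

    private static int countMatches(Pattern pattern, String source) {
        Matcher matcher = pattern.matcher(source);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    static int countLambdas(String source) {
        return countMatches(LAMBDA_ARROW, source);
    }

    static int countMethodReferences(String source) {
        return countMatches(METHOD_REFERENCE, source);
    }

    // Number of physical lines, a crude stand-in for a lines-of-code metric.
    static int countLines(String source) {
        return source.split("\n").length;
    }

    public static void main(String[] args) {
        // The same logic written as a classic loop...
        String loopVersion = String.join("\n",
                "List<String> names = new ArrayList<>();",
                "for (User u : users) {",
                "    if (u.isActive()) {",
                "        names.add(u.getName());",
                "    }",
                "}");
        // ...and as a stream pipeline with a lambda and a method reference.
        String streamVersion = String.join("\n",
                "List<String> names = users.stream()",
                "    .filter(u -> u.isActive())",
                "    .map(User::getName)",
                "    .collect(Collectors.toList());");

        System.out.printf("loop:   lines=%d lambdas=%d methodRefs=%d%n",
                countLines(loopVersion), countLambdas(loopVersion),
                countMethodReferences(loopVersion));
        System.out.printf("stream: lines=%d lambdas=%d methodRefs=%d%n",
                countLines(streamVersion), countLambdas(streamVersion),
                countMethodReferences(streamVersion));
    }
}
```

For the two snippets in `main`, the loop version contains no lambdas or method references, while the stream version contains one of each and spans fewer lines. A parser-based implementation would avoid the false positives that token matching allows, which is presumably one reason to build on a library such as JavaParser.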
Notes
1. Datasets: sources.tar.gz for full datasets.
2. https://www.kdnuggets.com/2015/06/top-20-r-machine-learning-packages.html, access: 2019-04-09.
3. https://cran.r-project.org/web/packages/RWeka/index.html, access: 2019-04-09.
4. http://topepo.github.io/caret/index.html, access: 2019-04-09.
5.
6. https://javaparser.org, access: 2019-06-10.
7. https://projectlombok.org, access: 2019-06-12.
References
Arcelli Fontana F, Mäntylä MV, Zanoni M, Marino A (2016) Comparing and experimenting machine learning techniques for code smell detection. Empir Softw Eng 21(3):1143–1191
Fontana FA, Mariani E, Mornioli A, Sormani R, Tonello A (2011) An experience report on using code smells detection tools. In: 2011 IEEE fourth international conference on software testing, verification and validation workshops, pp 450–457. https://doi.org/10.1109/ICSTW.2011.12
Fowler M (1999) Refactoring: improving the design of existing code. Addison-Wesley, Boston, MA, USA
Grodzicka H, Ziobrowski A, Łakomiak Z, Kawa M, Madeyski L (2019) Appendix to the paper “Code smell prediction employing machine learning meets emerging Java language constructs”. http://madeyski.e-informatyka.pl/download/GrodzickaEtAl19.pdf
Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. SIGKDD Explor Newsl 11(1):10–18. https://doi.org/10.1145/1656274.1656278
Hornik K, Buchta C, Zeileis A (2009) Open-source machine learning: R meets Weka. Comput Stat 24(2):225–232. https://doi.org/10.1007/s00180-008-0119-7
Hsu CW, Chang CC, Lin CJ (2003) A practical guide to support vector classification. Technical report, Department of computer science, National Taiwan University. http://www.csie.ntu.edu.tw/~cjlin/papers.html
Madeyski L, Kitchenham B (2019) Reproducer: reproduce statistical analyses and meta-analyses. http://madeyski.e-informatyka.pl/reproducible-research/. R package version 0.3.0 (http://CRAN.R-project.org/package=reproducer)
Palomba F (2015) Textual analysis for code smell detection. IEEE Int Conf Softw Eng 37(16):769–771
Palomba F, Bavota G, Penta MD, Oliveto R, Poshyvanyk D, Lucia AD (2015) Mining version histories for detecting code smells. IEEE Trans Softw Eng 41(5):462–489
Palomba F, Nucci DD, Tufano M, Bavota G, Oliveto R, Poshyvanyk D, De Lucia A (2015) Landfill: an open dataset of code smells with public evaluation. In: Proceedings of the 12th working conference on mining software repositories, MSR ’15. IEEE Press, Piscataway, NJ, USA, pp 482–485
Sharma T (2017) Designite: a customizable tool for smell mining in C# repositories. SATToSE41
Tempero E, Anslow C, Dietrich J, Han T, Li J, Lumpe M, Melton H, Noble J (2010) Qualitas corpus: a curated collection of Java code for empirical studies. In: 2010 Asia Pacific software engineering conference (APSEC2010), pp 336–345. https://doi.org/10.1109/APSEC.2010.46
Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco
Acknowledgements
This work has been conducted as part of the research and development project POIR.01.01.01-00-0792/16, supported by the National Centre for Research and Development (NCBiR). We would like to thank Tomasz Lewowski, Tomasz Korzeniowski, Marek Skrajnowski and the entire team from code quest sp. z o.o. for tagging code smells and for all of their comments and feedback from a real-world software engineering environment.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Grodzicka, H., Ziobrowski, A., Łakomiak, Z., Kawa, M., Madeyski, L. (2020). Code Smell Prediction Employing Machine Learning Meets Emerging Java Language Constructs. In: Poniszewska-Marańda, A., Kryvinska, N., Jarząbek, S., Madeyski, L. (eds) Data-Centric Business and Applications. Lecture Notes on Data Engineering and Communications Technologies, vol 40. Springer, Cham. https://doi.org/10.1007/978-3-030-34706-2_8
DOI: https://doi.org/10.1007/978-3-030-34706-2_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34705-5
Online ISBN: 978-3-030-34706-2
eBook Packages: Intelligent Technologies and Robotics (R0)