A Novel Transfer Learning Method for Code Smell Detection on Heterogeneous Data: A Feasibility Study


Abstract

Code smell detection has so far focused primarily on homogeneous data. In real-life scenarios, however, data come from diverse sources. As a result, the unseen target data on which code smells must be predicted may differ in feature-space representation from the source data for which code smells are known; that is, the data may be heterogeneous. Moreover, transfer learning, a state-of-the-art machine learning technique, has not been well explored for transferring knowledge of already known code smells from source data to unseen heterogeneous target data. This paper examines the feasibility of transfer learning for predicting code smells on unseen heterogeneous target data. It proposes a novel method for detecting code smells on heterogeneous data using a modified version of domain invariant transfer kernel learning (DITKL), one of the transfer learning techniques. Experiments applied modified DITKL with six traditional machine learning models to the long method and temporary field code smells. Modified DITKL with Naïve Bayes and the ID3 decision tree outperformed the other combinations for the long method code smell, while modified DITKL with Multilayer Perceptron and Naïve Bayes performed well for the temporary field code smell. The proposed method can be useful for detecting code smells in unseen heterogeneous data when the characteristics of that data prevent a detection tool or expert knowledge from being applied. It can also help establish benchmark data for code smells.
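The abstract builds the method on domain invariant transfer kernel learning (DITKL). As a rough illustration of the kernel idea DITKL is understood to build on, the sketch below extrapolates the top eigenvectors of a target-domain kernel to source instances (the Nyström method) and trains an SVM on the resulting kernel. It is a simplification under stated assumptions, not the authors' modified DITKL: DITKL additionally learns the kernel eigen-spectrum (presumably where the eigen-damping factor ζ listed in Appendix A enters), and this sketch assumes source and target share one feature space, so it does not show how the modified method handles heterogeneous features. The function name `target_induced_kernels`, the RBF kernel, `gamma`, `rank`, and the synthetic data are all hypothetical choices.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC


def target_induced_kernels(X_src, X_tgt, gamma=0.5, rank=10, eps=1e-10):
    """Return (source x source, target x source) kernel blocks built from the
    top eigenpairs of the target kernel (a Nystrom approximation).

    Hypothetical simplification: DITKL learns a new eigen-spectrum; here the
    target kernel's own eigenvalues are reused unchanged. Assumes X_src and
    X_tgt have the same columns (homogeneous feature space).
    """
    n_src = len(X_src)
    X_all = np.vstack([X_src, X_tgt])
    K_tt = rbf_kernel(X_tgt, X_tgt, gamma=gamma)   # target-only kernel
    vals, vecs = np.linalg.eigh(K_tt)              # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:rank]          # indices of the top-`rank` eigenpairs
    lam, U = vals[order], vecs[:, order]
    keep = lam > eps                               # drop numerically zero eigenvalues
    lam, U = lam[keep], U[:, keep]
    K_at = rbf_kernel(X_all, X_tgt, gamma=gamma)   # every point vs. target points
    phi = (K_at @ U) / lam                         # Nystrom-extrapolated eigenvectors
    K_bar = (phi * lam) @ phi.T                    # kernel rebuilt from the target spectrum
    return K_bar[:n_src, :n_src], K_bar[n_src:, :n_src]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_s = rng.normal(size=(60, 4))                 # synthetic labelled source metrics
    y_s = (X_s[:, 0] + X_s[:, 1] > 0).astype(int)  # synthetic "smelly" label
    X_t = rng.normal(loc=0.5, size=(30, 4))        # shifted, unlabelled target data
    K_train, K_test = target_induced_kernels(X_s, X_t, gamma=0.25, rank=10)
    clf = SVC(kernel="precomputed").fit(K_train, y_s)
    print(clf.predict(K_test))
```

Because the kernel is reconstructed only from the target's leading eigenpairs, the SVM is trained on source labels in a feature space shaped by the target distribution. Replacing the reused eigenvalues `lam` with a learned spectrum is where DITKL proper would differ from this sketch.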



Author information

Authors and Affiliations

Department of Computer Science & Engineering and Information Technology, Jaypee Institute of Information Technology, Noida, India
Ruchin Gupta & Sandeep Kumar Singh
Department of Information Technology, Kiet Group of Institutions, Ghaziabad, Delhi-NCR, India
Ruchin Gupta

Corresponding author

Correspondence to Ruchin Gupta.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Ethical Approval

The experimental studies in this paper use datasets that contain no sensitive or private information, as the research focuses on conceptual and methodological advances.

Informed Consent

Not applicable. This research does not involve human subjects.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Advanced Computing and Data Sciences” guest edited by Mayank Singh, Vipin Tyagi and P.K. Gupta.

Appendices

Appendix A

List of Abbreviations

Sl. no.	Symbol	Description
1	D_S	Source domain
2	D_T	Target domain
3	Xʹ	Target feature space
4	X	Source feature space
5	P(Xʹ)	Marginal probability distribution of the target feature space
6	P(X)	Marginal probability distribution of the source feature space
7	T_S	Source task
8	T_T	Target task
9	Y	Source label set
10	Yʹ	Target label set
11	P(Y|X)	Conditional probability distribution of the source
12	P(Yʹ|Xʹ)	Conditional probability distribution of the target
13	x_i	i-th instance of X
14	f(x)	Predictive function
15	x	An instance of X
16	y	A label in Y
17	DITKL	Domain Invariant Transfer Kernel Learning
18	MDITKL	Modified Domain Invariant Transfer Kernel Learning
19	SVM	Support vector machine
20	k	Kernel
21	ζ	Eigen-damping factor
22	AUC	Area under the ROC curve
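The symbols above fit the usual transfer learning formulation of domains and tasks. Assuming that standard formulation (the paper's own definitions are not reproduced in this section), they combine as follows:

```latex
\begin{aligned}
D_S &= \{X,\ P(X)\}, & D_T &= \{X',\ P(X')\},\\
T_S &= \{Y,\ P(Y \mid X)\}, & T_T &= \{Y',\ P(Y' \mid X')\},\\
f(x) &\approx P(y \mid x), & x &\in X,\ y \in Y.
\end{aligned}
```

Under this formulation, transfer is homogeneous when X = Xʹ but P(X) ≠ P(Xʹ) or P(Y|X) ≠ P(Yʹ|Xʹ), and heterogeneous when X ≠ Xʹ, for example when source and target code are described by different sets of metrics.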

Appendix B

Hyperparameters for traditional machine learning models

Sl. no.	Algorithm	Python scikit-learn modules and parameters used
1	Random forest	param_grid1 = [{'n_estimators': [1, 10, 100, 100]}]; GridSearchCV(estimator = RandomForestClassifier(n_estimators = 100), param_grid = param_grid1, cv = 3, verbose = True, n_jobs = -1); RandomForestClassifier(n_estimators = best_c), where best_c is the best value found by the grid search
2	Decision tree (C4.5)	DecisionTreeClassifier()
3	Decision tree (ID3)	Id3Estimator(gain_ratio = True, prune = True)
4	Naïve Bayes	GaussianNB()
5	Multilayer perceptron	param_grid1 = [{'solver': ['lbfgs', 'sgd', 'adam']}]; GridSearchCV(estimator = MLPClassifier(solver = 'lbfgs', alpha = 1e-5, hidden_layer_sizes = (8, 8), random_state = 1), param_grid = param_grid1, cv = 5, verbose = True, n_jobs = -1); MLPClassifier(solver = best_c, alpha = 1e-5, hidden_layer_sizes = (8, 8), random_state = 1), where best_c is the best solver found by the grid search
6	K-nearest neighbors (KNeighborsClassifier)	param_grid1 = [{'n_neighbors': [1, 2, 3, 4, 5, 6]}]; GridSearchCV(estimator = KNeighborsClassifier(), param_grid = param_grid1, cv = 5, verbose = True, n_jobs = -1)
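The table above can be assembled into a runnable scikit-learn pipeline roughly as follows. This is a sketch, not the authors' experimental code. It covers five of the six rows; the ID3 row is omitted because `Id3Estimator` comes from a third-party package rather than scikit-learn. The random forest grid lists 100 twice in the table, so the duplicate is dropped here, and `verbose`/`n_jobs = -1` are left out for brevity. Note that scikit-learn's `DecisionTreeClassifier` implements an optimised version of CART rather than C4.5. The helper names `build_models` and `evaluate`, the ROC AUC scoring, and the synthetic data from `make_classification` are assumptions for illustration; the paper trains on source data and tests on heterogeneous target data, whereas this sketch uses a simple train/test split.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier


def build_models():
    """Five of the six models from the hyperparameter table (ID3 omitted)."""
    rf = GridSearchCV(
        RandomForestClassifier(n_estimators=100),
        param_grid=[{"n_estimators": [1, 10, 100]}],  # duplicate 100 from the table dropped
        cv=3,
    )
    mlp = GridSearchCV(
        MLPClassifier(solver="lbfgs", alpha=1e-5, hidden_layer_sizes=(8, 8), random_state=1),
        param_grid=[{"solver": ["lbfgs", "sgd", "adam"]}],
        cv=5,
    )
    knn = GridSearchCV(
        KNeighborsClassifier(),
        param_grid=[{"n_neighbors": [1, 2, 3, 4, 5, 6]}],
        cv=5,
    )
    # GridSearchCV (refit=True by default) refits the best configuration on all
    # training data, playing the role of the table's final estimator(best_c) step.
    return {
        "Random forest": rf,
        "Decision tree": DecisionTreeClassifier(),  # CART in scikit-learn
        "Naive Bayes": GaussianNB(),
        "Multilayer perceptron": mlp,
        "KNN": knn,
    }


def evaluate(models, X_train, y_train, X_test, y_test):
    """Fit each model and return its ROC AUC on the held-out data."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        proba = model.predict_proba(X_test)[:, 1]  # probability of the positive (smelly) class
        scores[name] = roc_auc_score(y_test, proba)
    return scores


if __name__ == "__main__":
    # Synthetic stand-in for a labelled code smell dataset (not the paper's data).
    X, y = make_classification(n_samples=300, n_features=10, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
    for name, auc in evaluate(build_models(), X_tr, y_tr, X_te, y_te).items():
        print(f"{name}: AUC = {auc:.3f}")
```

Wrapping the tuned models in `GridSearchCV` keeps the search and the final refit in one object, so every entry in the dictionary can be fitted and scored through the same `fit`/`predict_proba` interface.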

Appendix C

Snapshot of dataset 3

Snapshot of dataset 5

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gupta, R., Singh, S.K. A Novel Transfer Learning Method for Code Smell Detection on Heterogeneous Data: A Feasibility Study. SN COMPUT. SCI. 4, 749 (2023). https://doi.org/10.1007/s42979-023-02157-6

Received: 13 June 2023
Accepted: 18 July 2023
Published: 28 September 2023
DOI: https://doi.org/10.1007/s42979-023-02157-6