Abstract
Code smell detection has focused primarily on homogeneous data. In practice, however, data come from diverse sources, so the unseen target data on which code smells must be predicted may differ in feature-space representation from the source data for which code smells are known. Moreover, transfer learning, a state-of-the-art machine learning technique, has not been well explored for transferring knowledge of known code smells from source data to predict code smells in unseen heterogeneous target data. This paper examines the feasibility of transfer learning for predicting code smells in unseen heterogeneous target data. It proposes a novel method for detecting code smells in heterogeneous data using a modified version of domain invariant transfer kernel learning (DITKL), a transfer learning technique. Experiments applied modified DITKL with six traditional machine learning models to the long method and temporary field code smells. Results showed that modified DITKL with Naïve Bayes and the ID3 decision tree outperformed the other combinations for the long method smell, while modified DITKL with Multilayer Perceptron and Naïve Bayes performed well for the temporary field smell. The proposed method can be useful for detecting code smells in unseen heterogeneous data when the characteristics of that data prevent a tool or expert knowledge from being applied. It can also help establish benchmark data for code smells.
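DITKL builds on Long et al.'s transfer kernel learning: a kernel is eigen-decomposed on the target data, the target eigenvectors are extrapolated to the source instances with the Nyström method, and new eigenvalues are learned, subject to an eigen-damping factor ζ, so that the extrapolated kernel approximates the true source kernel. The following is a minimal sketch of that unmodified idea, based on one reading of the original formulation. It assumes source and target have already been brought into one shared feature space and does not reproduce the paper's modification for heterogeneous data. The RBF kernel, the SciPy SLSQP solver, the function name `transfer_kernel`, and the defaults for `zeta`, `gamma` and `n_eig` are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC


def transfer_kernel(Xs, Xt, zeta=1.1, gamma=1.0, n_eig=10):
    """Hypothetical sketch of a TKL/DITKL-style domain-invariant kernel.

    Assumes Xs (source) and Xt (target) share one feature space; the paper's
    heterogeneous-data modification is NOT implemented here.
    Returns an (ns + nt) x (ns + nt) kernel over the stacked rows [Xs; Xt].
    """
    Ks = rbf_kernel(Xs, Xs, gamma=gamma)    # ground-truth source kernel
    Kt = rbf_kernel(Xt, Xt, gamma=gamma)    # target kernel
    Kst = rbf_kernel(Xs, Xt, gamma=gamma)   # source-target cross kernel

    # Leading eigenpairs of the target kernel (eigh returns ascending order).
    vals, vecs = np.linalg.eigh(Kt)
    idx = np.argsort(vals)[::-1][:n_eig]
    vals, phi_t = vals[idx], vecs[:, idx]
    keep = vals > 1e-6 * vals[0]            # drop numerically negligible eigenvalues
    vals, phi_t = vals[keep], phi_t[:, keep]

    # Nystrom extension: extrapolate target eigenvectors to the source points.
    phi_s = Kst @ phi_t / vals

    # ||phi_s diag(lam) phi_s^T - Ks||_F^2 = lam^T Q lam - 2 c^T lam + const.
    Q = (phi_s.T @ phi_s) ** 2
    c = np.sum(phi_s * (Ks @ phi_s), axis=0)

    r = len(vals)
    # Eigen-damping constraints lam_i >= zeta * lam_{i+1}; bounds keep lam >= 0.
    constraints = [
        {"type": "ineq", "fun": lambda lam, i=i: lam[i] - zeta * lam[i + 1]}
        for i in range(r - 1)
    ]
    result = minimize(
        lambda lam: lam @ Q @ lam - 2 * c @ lam,
        x0=vals,
        jac=lambda lam: 2 * Q @ lam - 2 * c,
        method="SLSQP",
        bounds=[(0.0, None)] * r,
        constraints=constraints,
    )
    lam = result.x

    # Kernel over source and target rows built from the shared eigenbasis.
    phi = np.vstack([phi_s, phi_t])
    return (phi * lam) @ phi.T


# Usage on synthetic data (hypothetical; not the paper's datasets).
rng = np.random.default_rng(0)
Xs = rng.normal(size=(40, 4))
ys = (Xs[:, 0] > 0).astype(int)
Xt = rng.normal(loc=0.5, size=(25, 4))

K = transfer_kernel(Xs, Xt)
ns = len(Xs)
clf = SVC(kernel="precomputed").fit(K[:ns, :ns], ys)   # train on source rows
y_pred = clf.predict(K[ns:, :ns])                      # score target vs. source
```

The learned kernel is consumed as a precomputed kernel: the source block trains the classifier and target rows are compared against the source columns. `SVC` appears here only because it accepts a precomputed kernel directly; it is not a claim about which of the paper's six classifiers was paired with the kernel, or how.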
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Ethical Approval
The experimental studies in this paper use datasets that contain no sensitive or private information; the aim of this research is conceptual and methodological advancement.
Informed Consent
Not applicable. This research does not involve human subjects.
Additional information
This article is part of the topical collection “Advanced Computing and Data Sciences” guest edited by Mayank Singh, Vipin Tyagi and P.K. Gupta.
Appendices
Appendix A
List of Abbreviations
Sl. no. | Symbol | Description of the symbol |
---|---|---|
1 | Dₛ | Source domain |
2 | Dₜ | Target domain |
3 | Xʹ | Target feature space |
4 | X | Source feature space |
5 | P(Xʹ) | Marginal probability distribution of the target feature space |
6 | P(X) | Marginal probability distribution of the source feature space |
7 | Tₛ | Source task |
8 | Tₜ | Target task |
9 | Y | Source label set |
10 | Yʹ | Target label set |
11 | P(Y\|X) | Conditional probability distribution of the source |
12 | P(Yʹ\|Xʹ) | Conditional probability distribution of the target |
13 | xᵢ | i-th instance of X |
14 | f(x) | Predictive function |
15 | x | An instance of X |
16 | y | A label in Y |
17 | DITKL | Domain Invariant Transfer Kernel Learning |
18 | MDITKL | Modified Domain Invariant Transfer Kernel Learning |
19 | SVM | Support vector machine |
20 | k | Kernel |
21 | ζ | Eigen-damping factor |
22 | AUC | Area under the ROC curve |
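With these symbols, the setting the paper addresses can be summarized in the conventional transfer-learning notation used by common surveys (a standard formulation written here for orientation, not an equation reproduced from the paper):

$$
D_S = \{X,\ P(X)\}, \qquad D_T = \{X',\ P(X')\}, \qquad T_S = \{Y,\ P(Y \mid X)\}, \qquad T_T = \{Y',\ P(Y' \mid X')\}
$$

The heterogeneous case is $X \neq X'$: source and target instances are described by different feature spaces, and the aim is to learn a predictive function $f(x)$ for target instances by transferring knowledge from the labelled source data.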
Appendix B
Hyperparameters for traditional machine learning models
Sl. no. | Name of algorithm | Python modules and parameters used (scikit-learn unless noted) |
---|---|---|
1 | Random forest | `param_grid1 = [{'n_estimators': [1, 10, 100, 100]}]`; `GridSearchCV(estimator=RandomForestClassifier(n_estimators=100), param_grid=param_grid1, cv=3, verbose=True, n_jobs=-1)`; `RandomForestClassifier(best_c)` |
2 | Decision tree (C4.5) | `DecisionTreeClassifier()` |
3 | Decision tree (ID3) | `Id3Estimator(gain_ratio=True, prune=True)` (third-party estimator, not part of scikit-learn) |
4 | Naïve Bayes | `GaussianNB()` |
5 | Multilayer perceptron | `param_grid1 = [{'solver': ['lbfgs', 'sgd', 'adam']}]`; `GridSearchCV(estimator=MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(8, 8), random_state=1), param_grid=param_grid1, cv=5, verbose=True, n_jobs=-1)`; `MLPClassifier(solver=best_c, alpha=1e-5, hidden_layer_sizes=(8, 8), random_state=1)` |
6 | k-nearest neighbors | `param_grid1 = [{'n_neighbors': [1, 2, 3, 4, 5, 6]}]`; `GridSearchCV(estimator=KNeighborsClassifier(), param_grid=param_grid1, cv=5, verbose=True, n_jobs=-1)` |
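The settings in the table can be assembled into a runnable sketch. It assumes that `best_c` denotes the best value found by the preceding grid search (read here from `best_params_`), omits `verbose=True`, drops the duplicated `100` from the random forest grid (a repeated value does not change the search), and leaves out `Id3Estimator`, which is not part of scikit-learn. Note also that scikit-learn's `DecisionTreeClassifier` implements an optimized version of CART rather than C4.5. The helper name `tuned_models`, the `n_jobs` parameter, and the synthetic data are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier


def tuned_models(X, y, n_jobs=-1):
    """Hypothetical helper: fit the Appendix B models (except ID3) on X, y."""
    models = {}

    # Random forest: tune n_estimators with 3-fold CV, then refit with the best value.
    rf_search = GridSearchCV(
        estimator=RandomForestClassifier(n_estimators=100),
        param_grid=[{"n_estimators": [1, 10, 100]}],
        cv=3, n_jobs=n_jobs)
    rf_search.fit(X, y)
    best_c = rf_search.best_params_["n_estimators"]
    models["random_forest"] = RandomForestClassifier(n_estimators=best_c).fit(X, y)

    # Decision tree and Gaussian Naive Bayes use default settings.
    models["decision_tree"] = DecisionTreeClassifier().fit(X, y)
    models["naive_bayes"] = GaussianNB().fit(X, y)

    # MLP: pick the solver with 5-fold CV, then refit with the same fixed settings.
    mlp_params = dict(alpha=1e-5, hidden_layer_sizes=(8, 8), random_state=1)
    mlp_search = GridSearchCV(
        estimator=MLPClassifier(solver="lbfgs", **mlp_params),
        param_grid=[{"solver": ["lbfgs", "sgd", "adam"]}],
        cv=5, n_jobs=n_jobs)
    mlp_search.fit(X, y)
    best_solver = mlp_search.best_params_["solver"]
    models["mlp"] = MLPClassifier(solver=best_solver, **mlp_params).fit(X, y)

    # k-NN: the table lists no separate refit step, so keep GridSearchCV's refitted best model.
    knn_search = GridSearchCV(
        estimator=KNeighborsClassifier(),
        param_grid=[{"n_neighbors": [1, 2, 3, 4, 5, 6]}],
        cv=5, n_jobs=n_jobs)
    models["knn"] = knn_search.fit(X, y).best_estimator_
    return models


# Usage on synthetic data (not the paper's datasets).
X, y = make_classification(n_samples=120, n_features=6, random_state=0)
models = tuned_models(X, y, n_jobs=1)
```

Passing `n_jobs=1`, as in the last line, keeps each search in a single process; the table's `n_jobs=-1` uses all available cores. The MLP searches may emit convergence warnings on small data, which do not stop the fit.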
Appendix C
Snapshot of dataset 3
Snapshot of dataset 5
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gupta, R., Singh, S.K. A Novel Transfer Learning Method for Code Smell Detection on Heterogeneous Data: A Feasibility Study. SN COMPUT. SCI. 4, 749 (2023). https://doi.org/10.1007/s42979-023-02157-6
DOI: https://doi.org/10.1007/s42979-023-02157-6