A Novel Transfer Learning Method for Code Smell Detection on Heterogeneous Data: A Feasibility Study

  • Original Research
  • Published in SN Computer Science

Abstract

Code smell detection has so far focused primarily on homogeneous data. In practice, however, data come from diverse sources, so the unseen target data on which code smells must be predicted may differ in feature-space representation from the source data whose code smells are already known. Moreover, the ability of transfer learning, a state-of-the-art machine learning technique, to carry knowledge of known code smells from source data to unseen heterogeneous target data has not been well explored. This paper examines the feasibility of transfer learning for predicting code smells on unseen heterogeneous target data. It proposes a novel detection method for heterogeneous data based on a modified form of domain invariant transfer kernel learning (DITKL), a transfer learning technique. Experiments applied the modified DITKL to six traditional machine learning models for the long method and temporary field code smells. The results show that modified DITKL with Naïve Bayes and the ID3 decision tree outperformed the other models for the long method smell, while modified DITKL with the Multilayer Perceptron and Naïve Bayes performed well for the temporary field smell. The proposed method is useful for detecting code smells in unseen heterogeneous data when a tool or expert knowledge cannot be applied because of the characteristics of that data, and it can also help in establishing benchmark data for code smells.
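
As a rough illustration of this setting, the sketch below (Python, scikit-learn) builds a Gaussian kernel over combined source and target instances, damps its eigen-spectrum (loosely echoing the eigen-damping factor ζ listed in Appendix A), and trains a traditional classifier, here Naïve Bayes, on the labelled source rows to predict smells for the unseen target rows. It is a hypothetical sketch under the assumption that both domains have already been mapped to a common metric space; it is not the paper's modified DITKL, and all data, names, and parameter values are placeholders.

# A minimal, hypothetical sketch of kernel-based transfer to a shifted target
# domain; it is NOT the paper's modified DITKL. It assumes source and target
# instances are already in a common metric space, and all data are synthetic.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.naive_bayes import GaussianNB

def damped_kernel(X_all, zeta=0.9, gamma=0.5):
    # Gaussian kernel over all (source + target) instances, with its
    # eigen-spectrum damped; zeta loosely mimics the eigen-damping factor.
    K = rbf_kernel(X_all, gamma=gamma)
    w, V = np.linalg.eigh(K)
    w = np.clip(w, 0.0, None) ** zeta
    return (V * w) @ V.T

rng = np.random.default_rng(0)
X_src = rng.normal(0.0, 1.0, size=(100, 5))   # labelled source instances (placeholder metrics)
y_src = (X_src[:, 0] > 0).astype(int)         # known code-smell labels (synthetic)
X_tgt = rng.normal(0.5, 1.2, size=(40, 5))    # unlabelled target instances, shifted distribution

K = damped_kernel(np.vstack([X_src, X_tgt]))
n_src = len(X_src)
clf = GaussianNB().fit(K[:n_src, :n_src], y_src)   # traditional model trained on source kernel rows
print(clf.predict(K[n_src:, :n_src]))              # smell predictions for unseen target instances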

Author information

Corresponding author

Correspondence to Ruchin Gupta.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Ethical Approval

The experimental studies in this paper use datasets that do not contain any sensitive or private information; the main aim of this research is conceptual and methodological advancement.

Informed Consent

Not applicable. This research does not involve any human subjects.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Advanced Computing and Data Sciences” guest edited by Mayank Singh, Vipin Tyagi and P.K. Gupta.

Appendices

Appendix A

List of Abbreviations

S. no.  Symbol     Description of the symbol
1       Ds         Source domain
2       DT         Target domain
3       Xʹ         Target feature space
4       X          Source feature space
5       P(Xʹ)      Marginal probability distribution of the target feature space
6       P(X)       Marginal probability distribution of the source feature space
7       TS         Source task
8       TT         Target task
9       Y          Source label set
10      Yʹ         Target label set
11      P(Y|X)     Conditional probability distribution of the source
12      P(Yʹ|Xʹ)   Conditional probability distribution of the target
13      xi         ith instance of X
14      f(x)       Predictive function
15      x          An instance of X
16      y          A label in Y
17      DITKL      Domain Invariant Transfer Kernel Learning
18      MDITKL     Modified Domain Invariant Transfer Kernel Learning
19      SVM        Support vector machine
20      k          Kernel
21      ζ          Eigen-damping factor
22      AUC        Area under the ROC curve
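
For reference, a domain pairs a feature space with its marginal probability distribution, and a task pairs a label set with a predictive function. Written with the symbols above, the standard transfer-learning formulation (a generic summary following the usual definitions in the transfer-learning literature, not a restatement of this paper's specific method) is:

% Standard transfer-learning notation, consistent with the symbol table above.
D_S = \{X,\ P(X)\}, \qquad T_S = \{Y,\ f(\cdot)\}, \qquad f(x) \approx P(y \mid x),
D_T = \{X',\ P(X')\}, \qquad T_T = \{Y',\ f'(\cdot)\}.

Heterogeneous transfer learning covers the case X \neq X' (and, in general, P(X) \neq P(X')), where the aim is to learn the target predictive function f'(\cdot) by reusing the labelled source data in D_S.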

Appendix B

Hyperparameters for traditional machine learning models

Modules and parameters used (Python scikit-learn), per algorithm:

  1. Random forest
     param_grid1 = [{'n_estimators': [1, 10, 100, 100]}]
     GridSearchCV(estimator=RandomForestClassifier(n_estimators=100), param_grid=param_grid1, cv=3, verbose=True, n_jobs=-1)
     RandomForestClassifier(best_c)

  2. Decision tree (C4.5)
     DecisionTreeClassifier()

  3. Decision tree (ID3)
     Id3Estimator(gain_ratio=True, prune=True)

  4. Naïve Bayes
     GaussianNB()

  5. Multilayer perceptron
     param_grid1 = [{'solver': ['lbfgs', 'sgd', 'adam']}]
     GridSearchCV(estimator=MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(8, 8), random_state=1), param_grid=param_grid1, cv=5, verbose=True, n_jobs=-1)
     MLPClassifier(solver=best_c, alpha=1e-5, hidden_layer_sizes=(8, 8), random_state=1)

  6. K-nearest neighbours (KNeighborsClassifier)
     param_grid1 = [{'n_neighbors': [1, 2, 3, 4, 5, 6]}]
     GridSearchCV(estimator=KNeighborsClassifier(), param_grid=param_grid1, cv=5, verbose=True, n_jobs=-1)
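
To show how the entries above are typically used end to end, the following is a small illustrative sketch in Python with scikit-learn: the training data are synthetic placeholders generated with make_classification, the grid mirrors the random forest row of the list, and best_c stands for the best hyperparameter value returned by the search. It does not reproduce the paper's datasets or results.

# Illustrative only: running one of the grid searches listed above.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Placeholder labelled source data standing in for a code-smell dataset.
X_train, y_train = make_classification(n_samples=200, n_features=10, random_state=1)

# Grid and search configuration mirroring the random forest entry above.
param_grid1 = [{'n_estimators': [1, 10, 100, 100]}]
search = GridSearchCV(estimator=RandomForestClassifier(n_estimators=100),
                      param_grid=param_grid1, cv=3, verbose=True, n_jobs=-1)
search.fit(X_train, y_train)

# Refit the chosen model with the best value found (the list's "best_c").
best_c = search.best_params_['n_estimators']
model = RandomForestClassifier(n_estimators=best_c).fit(X_train, y_train)
print(search.best_params_, model.score(X_train, y_train))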

Appendix C

Snapshot of dataset 3


Snapshot of dataset 5


Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Gupta, R., Singh, S.K. A Novel Transfer Learning Method for Code Smell Detection on Heterogeneous Data: A Feasibility Study. SN COMPUT. SCI. 4, 749 (2023). https://doi.org/10.1007/s42979-023-02157-6

