Abstract
Software fault prediction (SFP) is a quality assurance process that identifies whether modules are fault-prone (FP) or not-fault-prone (NFP), thereby reducing the cost and time spent on testing. Supervised machine learning techniques can identify FP modules, but they require fault information from previous versions of the software product. Such information, accumulated over the software life cycle, may be neither readily available nor reliable. Currently, clustering combined with expert opinion is a prudent choice for labeling modules when no fault information is available. However, this technique may not fully address important aspects such as the selection of experts, conflicts among expert opinions, and the diverse expertise of domain experts. In this paper, we propose a comprehensive framework named EkmEx that extends conventional fault prediction approaches and provides a mathematical foundation for these previously unaddressed aspects. EkmEx guides the selection of experts, furnishes an objective way to resolve verdict conflicts, and manages the diversity in expertise of domain experts. We performed expert-assisted module labeling with both EkmEx and conventional clustering on seven public NASA datasets. The empirical results show significant potential of the proposed framework in identifying FP modules across all seven datasets.
About this article
Cite this article
Rizwan, M., Nadeem, A., Sarwar, S. et al. EkmEx - an extended framework for labeling an unlabeled fault dataset. Multimed Tools Appl 81, 12141–12156 (2022). https://doi.org/10.1007/s11042-021-11441-7
DOI: https://doi.org/10.1007/s11042-021-11441-7