Impact of Feature Selection Methods on the Predictive Performance of Software Defect Prediction Models: An Extensive Empirical Study
Figure captions:
Figure 1. Experimental framework.
Figure 2. Box-plot representation of FS methods on NB and DT classifier models with respect to accuracy and AUC: (a) accuracy values of NB; (b) accuracy values of DT; (c) AUC values of NB; (d) AUC values of DT.
Figure 3. Scott–Knott rank test results of FS methods on NB and DT classifier models with respect to accuracy and AUC: (a) average accuracy values of NB; (b) average accuracy values of DT; (c) average AUC values of NB; (d) average AUC values of DT.
Figure 4. Scott–Knott rank test results of FS methods with NB and DT classifiers based on accuracy and AUC for the AEEEM dataset (panels as in Figure 3).
Figure 5. Scott–Knott rank test results of FS methods with NB and DT classifiers based on accuracy and AUC for the NASA dataset (panels as in Figure 3).
Figure 6. Scott–Knott rank test results of FS methods with NB and DT classifiers based on accuracy and AUC for the ReLink dataset (panels as in Figure 3).
Figure 7. Scott–Knott rank test results of FS methods with NB and DT classifiers based on accuracy and AUC for the PROMISE dataset (panels as in Figure 3).
Figure 8. Scott–Knott rank test results of FSS techniques with different search methods using NB and DT classifiers (panels as in Figure 3).
Figure 9. Scott–Knott rank test results of FFR techniques on NB and DT (panels as in Figure 3).
Figure 10. Double Scott–Knott rank test results of FFR methods on the studied datasets (panels as in Figure 3).
Figure 11. Scott–Knott rank test results of FSS techniques on NB and DT (panels as in Figure 3).
Figure 12. Double Scott–Knott rank test results of FSS methods on the studied datasets (panels as in Figure 3).
Figure 13. Scott–Knott rank test results of WFS techniques on NB and DT (panels as in Figure 3).
Figure 14. Double Scott–Knott rank test results of WFS techniques on NB and DT (panels as in Figure 3).
Abstract
1. Introduction
- An extensive benchmark study of the impact of 46 FS methods on two classifiers over 25 datasets in SDP; an empirical study of this magnitude is a key strength of this work.
- This study addresses the biases found in existing SDP studies in terms of limited FS methods, small datasets, and unsuitable prediction models. To the best of our knowledge, no prior study has addressed these biases collectively.
- This study establishes sets of good FS methods, rather than a single method, as potential and useful alternatives in real-life applications and other ML tasks.
2. Related Studies
3. Research Methods
3.1. Feature Selection Methods
3.1.1. Filter Feature Ranking (FFR) Methods
3.1.2. Filter-Feature Subset Selection (FSS) Methods
3.1.3. Wrapper-Based Feature Selection (WFS) Methods
3.2. Classification Algorithms
3.3. Software Defect Datasets
3.4. Evaluation Metrics
- Accuracy is the ratio of correctly classified instances to the total number of instances, usually expressed as a percentage.
- The area under the ROC curve (AUC) reflects the trade-off between the true positive rate and the false positive rate. It provides an aggregate measure of performance across all possible classification thresholds.
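As an illustration only (not taken from the paper), the following minimal sketch computes both metrics with scikit-learn; the label, prediction, and score vectors are hypothetical placeholders.

```python
from sklearn.metrics import accuracy_score, roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1]                # 1 = defective module, 0 = non-defective
y_pred  = [0, 1, 1, 1, 0, 0]                # hard class predictions
y_score = [0.2, 0.7, 0.9, 0.8, 0.3, 0.4]    # predicted probability of the defective class

accuracy = accuracy_score(y_true, y_pred)   # correctly classified / total instances
auc = roc_auc_score(y_true, y_score)        # area under the ROC curve (TPR vs. FPR trade-off)
print(f"Accuracy = {accuracy:.3f}, AUC = {auc:.3f}")
```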
4. Methodology
4.1. Experimental Framework
- Feature Selection Phase: Each of the FFR methods (see Table 2) is applied to the training partition of each original software defect dataset in Table 6. Specifically, CS, CO, CV, IG, GR, SU, PS, RFA, WRF, OR, and SVM (based on the Ranker search method), SSF (based on the k-Medoid sampling method), and TPP (based on the targeted projection pursuit search method) were used to evaluate and rank the features of each dataset according to each FFR method's underlying computational characteristics. The top ⌈log₂ N⌉ ranked features (where N is the number of features in each dataset) were then selected from the produced rank lists; this threshold is in accordance with existing empirical studies in SDP [25,27,28,30]. Consequently, software defect datasets with reduced features are produced. Further, 26 FSS methods (two FSS techniques, CFS and CNS, each with 13 search methods, as presented in Table 3) are applied to each of the original software defect datasets. Here, the software features of each dataset are evaluated by the respective FSS technique, and the search methods automatically select and generate a subset of important features by traversing the feature space of each dataset. Each WFS method was used with its respective classifier (see Table 5) and three different search methods: linear forward search (LFS), subset-size forward selection (SFS), and incremental wrapper subset selection (IWSS) (see Table 4). Just as in FFR, software defect datasets with reduced features are generated and passed to the prediction phase. In all cases, only the training partition of each dataset was pre-processed and used in the feature selection phase; this avoids the latent error made in some existing studies of pre-processing the whole dataset instead of only the training data [25,28]. Ghotra, et al. [25] pointed out that incorrect application of FS methods is one possible reason for inconsistencies across prior studies.
- Model Construction and Evaluation Phase: In this phase, the software defect datasets with reduced features from the feature selection phase were used to train two classifiers (NB and DT), which demonstrates the adequacy and importance of reduced software metrics in SDP. As aforementioned, the 10-fold CV technique was used to develop each model; its purpose is to mitigate bias and overfitting in the ensuing prediction models. In addition, the k-fold CV technique has been reported to mitigate the class imbalance problem, a prevalent data quality issue in machine learning [22,28]. The predictive performances of the resulting SDP models were evaluated based on accuracy, f-measure, and AUC. Due to the random nature of the search methods used by the FSS methods, each experiment involving FSS methods was repeated 10 times and the average values were obtained. A minimal illustrative sketch of this selection-and-evaluation pipeline is given below.
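The sketch below is illustrative only and does not reproduce the authors' WEKA-based setup: it assumes a scikit-learn pipeline, a synthetic dataset, a mutual-information ranker as a stand-in for an FFR method, and the top-⌈log₂ N⌉ threshold discussed above, with 10-fold cross-validation of NB and DT scored by accuracy and AUC.

```python
from math import ceil, log2

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a defect dataset (hypothetical; not one of the 25 studied datasets).
X, y = make_classification(n_samples=500, n_features=40, n_informative=8, random_state=1)

# Keep the top-ranked ceil(log2(N)) features, following the threshold convention above.
k = ceil(log2(X.shape[1]))

for name, clf in [("NB", GaussianNB()), ("DT", DecisionTreeClassifier(random_state=1))]:
    # Putting the ranker inside a Pipeline ensures it is fitted on the training folds only,
    # avoiding the pre-processing leakage discussed in the feature selection phase.
    model = Pipeline([
        ("rank", SelectKBest(mutual_info_classif, k=k)),  # info-gain-like filter ranker
        ("clf", clf),
    ])
    scores = cross_validate(model, X, y, cv=10, scoring=["accuracy", "roc_auc"])
    print(f"{name}: accuracy={scores['test_accuracy'].mean():.3f}, "
          f"AUC={scores['test_roc_auc'].mean():.3f}")
```

The per-fold averages printed here mirror how the reduced-feature models are compared in the experiments, although the actual study uses WEKA implementations of the classifiers and FS methods.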
4.2. Research Questions: Motivation and Methodological Approach
5. Experimental Results and Discussion
- FFR methods had a positive impact on the predictive performance of the prediction models (NB and DT) regardless of the dataset repository. However, there is no single best FFR method, as the top-performing FFR methods have different computational characteristics. We recommend the use of statistics-based (CS and CO), probability-based (IG, GR, SU, and PS), and classifier-based (SVM and OR) FFR methods in SDP.
- FSS methods also had a positive impact on the predictive performance of the prediction models (NB and DT). CFS recorded superior performance to CNS regardless of the implemented FSS search method and dataset repository. In addition, metaheuristic search methods had a superior effect on the FSS techniques compared with the conventional BFS method. We therefore recommend the use of CFS with metaheuristic search methods (AS, BS, BAT, CS, ES, FS, FLS, NSGA-II, PSOS, and WS) for FSS in SDP.
- WFS methods had a positive impact on the predictive performance of the prediction models. WFS methods were superior in performance to FSS and FFR methods, although the differences were not statistically significant. IWSS-based WFS methods ranked above SFS- and LFS-based WFS methods.
6. Threats to Validity
- External validity: External validity addresses the generalizability of the research experimental process. Since the quality of experimental results depends on the dataset used, 25 SDP datasets from four software repositories (AEEEM, NASA, ReLink, and PROMISE) were used in this benchmark study. These datasets were selected based on their nature and characteristics. Nonetheless, the empirical experimental process can be rerun on datasets with new features.
- Internal validity: Internal validity refers to the selection of classifiers and FS methods. According to Gao, et al. [9], the internal validity of SDP studies could be affected by the preference of classifiers and software tools. Two classifiers (NB and DT) and 46 FS methods with varying characteristics (search methods and selection techniques) were selected in this study. These classifiers have been widely used and have been reported to be effective in SDP [17,28,38,39]. However, more classifiers can be deployed based on the experimental process in future works.
- Construct validity: Construct validity concerns the performance evaluation metrics used to evaluate SDP models. Accuracy and AUC values were used in this benchmark study for performance evaluation. Our preference for accuracy and AUC is based on their extensive usage in existing SDP studies [2,30,38,49]. However, other available evaluation metrics can be used in future works.
- Conclusion validity: Conclusion validity addresses the statistical conclusions of a study. The Scott–Knott ESD rank test was used to statistically evaluate and validate the impact of FS methods in this study; it has been suggested and widely used in existing SDP studies [2,5,25,28,54].
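For intuition only, the following minimal sketch approximates the idea behind Scott–Knott ESD ranking; it is not the ScottKnottESD implementation used in this study. Methods sorted by mean score are recursively partitioned at the split that maximizes the between-group sum of squares, and a split is retained only when the two groups differ by a non-negligible Cohen's d. The method names and scores in the demo are hypothetical.

```python
import numpy as np

def _cohen_d(a, b):
    """Effect size between two groups of observations, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * np.var(a, ddof=1) + (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2))
    return abs(np.mean(a) - np.mean(b)) / pooled if pooled > 0 else 0.0

def _best_split(means):
    """Index splitting the sorted mean vector with maximal between-group sum of squares."""
    grand = np.mean(means)
    best_k, best_bss = 1, -1.0
    for k in range(1, len(means)):
        left, right = means[:k], means[k:]
        bss = len(left) * (np.mean(left) - grand) ** 2 + len(right) * (np.mean(right) - grand) ** 2
        if bss > best_bss:
            best_k, best_bss = k, bss
    return best_k

def scott_knott_esd_sketch(results, d_threshold=0.2):
    """results: {method: scores across datasets/runs} -> {method: rank}, where rank 1 is best."""
    methods = sorted(results, key=lambda m: np.mean(results[m]), reverse=True)

    def partition(group):
        if len(group) < 2:
            return [group]
        k = _best_split([np.mean(results[m]) for m in group])
        left = np.concatenate([results[m] for m in group[:k]])
        right = np.concatenate([results[m] for m in group[k:]])
        if _cohen_d(left, right) < d_threshold:   # negligible difference -> keep one group
            return [group]
        return partition(group[:k]) + partition(group[k:])

    return {m: rank for rank, grp in enumerate(partition(methods), start=1) for m in grp}

# Hypothetical demo: AUC of three made-up FS methods over five datasets.
demo = {
    "FFR11": [0.79, 0.81, 0.80, 0.82, 0.78],
    "FFR1":  [0.78, 0.80, 0.79, 0.81, 0.77],
    "FFR3":  [0.65, 0.66, 0.64, 0.67, 0.66],
}
print(scott_knott_esd_sketch(demo))   # lower rank number = statistically better group
```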
7. Conclusions and Future Works
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Balogun, A.O.; Bajeh, A.O.; Orie, V.A.; Yusuf-Asaju, A.W. Software Defect Prediction Using Ensemble Learning: An ANP Based Evaluation Method. FUOYE J. Eng. Technol. 2018, 3, 50–55.
- Kondo, M.; Bezemer, C.-P.; Kamei, Y.; Hassan, A.E.; Mizuno, O. The impact of feature reduction techniques on defect prediction models. Empir. Softw. Eng. 2019, 24, 1925–1963.
- Akintola, A.G.; Balogun, A.; Lafenwa-Balogun, F.B.; Mojeed, H.A. Comparative Analysis of Selected Heterogeneous Classifiers for Software Defects Prediction Using Filter-Based Feature Selection Methods. FUOYE J. Eng. Technol. 2018, 3, 134–137.
- Mabayoje, M.A.; Balogun, A.O.; Jibril, H.A.; Atoyebi, J.O.; Mojeed, H.A.; Adeyemo, V.E. Parameter tuning in KNN for software defect prediction: An empirical analysis. J. Teknol. dan Sist. Komput. 2019, 7, 121–126.
- Tantithamthavorn, C.; McIntosh, S.; Hassan, A.E.; Matsumoto, K. The Impact of Automated Parameter Optimization on Defect Prediction Models. IEEE Trans. Softw. Eng. 2019, 45, 683–711.
- Wohlin, C.; Runeson, P.; Höst, M.; Ohlsson, M.C.; Regnell, B.; Wesslén, A. Experimentation in Software Engineering; Springer Science and Business Media LLC: Berlin/Heidelberg, Germany, 2012.
- Mojeed, H.A.; Bajeh, A.O.; Balogun, A.O.; Adeleke, H.O. Memetic Approach for Multi-Objective Overtime Planning in Software Engineering Projects. J. Eng. Sci. Technol. 2019, 14, 3213–3233.
- Balogun, A.O.; Shuib, B.; Abdulkadir, S.J.; Sobri, A. A Hybrid Multi-Filter Wrapper Feature Selection Method for Software Defect Predictors. Int. J. Supply Chain Manag. 2019, 8, 916.
- Gao, K.; Khoshgoftaar, T.M.; Seliya, N. Predicting high-risk program modules by selecting the right software measurements. Softw. Qual. J. 2011, 20, 3–42.
- Gao, K.; Khoshgoftaar, T.M.; Wang, H.; Seliya, N. Choosing software metrics for defect prediction: An investigation on feature selection techniques. Softw. Pract. Exp. 2011, 41, 579–606.
- Bajeh, A.O.; Oluwatosin, O.-J.; Basri, S.; Akintola, A.G.; Balogun, A.O. Object-Oriented Measures as Testability Indicators: An Empirical Study. J. Eng. Sci. Technol. 2020, 15, 1092–1108.
- Anbu, M.; Mala, G.S.A. Feature selection using firefly algorithm in software defect prediction. Clust. Comput. 2017, 22, 10925–10934.
- Majd, A.; Vahidi-Asl, M.; Khalilian, A.; Poorsarvi-Tehrani, P.; Haghighi, H. SLDeep: Statement-level software defect prediction using deep-learning model on static code features. Expert Syst. Appl. 2020, 147, 113156.
- Catal, C.; Yildirim, S. Investigating the effect of dataset size, metrics sets, and feature selection techniques on software fault prediction problem. Inf. Sci. 2009, 179, 1040–1058.
- Hall, T.; Beecham, S.; Bowes, D.; Gray, D.; Counsell, S. A Systematic Literature Review on Fault Prediction Performance in Software Engineering. IEEE Trans. Softw. Eng. 2011, 38, 1276–1304.
- He, P.; Li, B.; Liu, X.; Chen, J.; Ma, Y. An empirical study on software defect prediction with a simplified metric set. Inf. Softw. Technol. 2015, 59, 170–190.
- Mabayoje, M.A.; Balogun, A.O.; Bajeh, A.O.; Musa, B.A. Software Defect Prediction: Effect of feature selection and ensemble methods. FUW Trends Sci. Technol. J. 2018, 3, 518–522.
- Al-Tashi, Q.; Kadir, S.J.A.; Rais, H.M.; Mirjalili, S.; Alhussian, H. Binary Optimization Using Hybrid Grey Wolf Optimization for Feature Selection. IEEE Access 2019, 7, 39496–39508.
- Al-Tashi, Q.; Rais, H.; Jadid, S. Feature Selection Method Based on Grey Wolf Optimization for Coronary Artery Disease Classification. In Proceedings of the 3rd International Conference of Reliable Information and Communication Technology (IRICT); Springer: Kuala Lumpur, Malaysia, 2018; pp. 257–266.
- Afzal, W.; Torkar, R. Towards benchmarking feature subset selection methods for software fault prediction. In Computational Intelligence and Quantitative Software Engineering; Springer: Berlin/Heidelberg, Germany, 2016; pp. 33–58.
- Mabayoje, M.A.; Balogun, A.O.; Bello, S.M.; Atoyebi, J.O.; Mojeed, H.A.; Ekundayo, A.H. Wrapper Feature Selection based Heterogeneous Classifiers for Software Defect Prediction. Adeleke Univ. J. Eng. Technol. 2019, 2, 1–11.
- Ibrahim, D.R.; Ghnemat, R.; Hudaib, A. Software Defect Prediction using Feature Selection and Random Forest Algorithm. In Proceedings of the 2017 International Conference on New Trends in Computing Sciences (ICTCS); IEEE: Piscataway, NJ, USA, 2017; pp. 252–257.
- Li, L.; Leung, H.K.N. Mining Static Code Metrics for a Robust Prediction of Software Defect-Proneness. In Proceedings of the 2011 International Symposium on Empirical Software Engineering and Measurement; IEEE: Piscataway, NJ, USA, 2011; pp. 207–214.
- Rodriguez, D.; Ruiz, R.; Cuadrado-Gallego, J.; Aguilar-Ruiz, J.S.; Garre, M. Attribute Selection in Software Engineering Datasets for Detecting Fault Modules. In Proceedings of the 33rd EUROMICRO Conference on Software Engineering and Advanced Applications (EUROMICRO 2007); IEEE: Piscataway, NJ, USA, 2007; pp. 418–423.
- Ghotra, B.; McIntosh, S.; Hassan, A.E. A Large-Scale Study of the Impact of Feature Selection Techniques on Defect Classification Models. In Proceedings of the 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR); IEEE: Piscataway, NJ, USA, 2017; pp. 146–157.
- Muthukumaran, K.; Rallapalli, A.; Murthy, N.L.B. Impact of Feature Selection Techniques on Bug Prediction Models. In Proceedings of the 8th India Software Engineering Conference (ISEC '15); ACM: Bengaluru, India, 2015; pp. 120–129.
- Rathore, S.S.; Gupta, A. A comparative study of feature-ranking and feature-subset selection techniques for improved fault prediction. In Proceedings of the 7th India Software Engineering Conference; ACM: Chennai, India, 2014; pp. 1–10.
- Xu, Z.; Liu, J.; Yang, Z.; An, G.; Jia, X. The Impact of Feature Selection on Defect Prediction Performance: An Empirical Comparison. In Proceedings of the 2016 IEEE 27th International Symposium on Software Reliability Engineering (ISSRE); IEEE: Ottawa, ON, Canada, 2016; pp. 309–320.
- Wang, H.; Khoshgoftaar, T.M.; Napolitano, A. An Empirical Study on the Stability of Feature Selection for Imbalanced Software Engineering Data. In Proceedings of the 2012 11th International Conference on Machine Learning and Applications; IEEE: Boca Raton, FL, USA, 2012; Volume 1, pp. 317–323.
- Balogun, A.O.; Basri, S.; Abdulkadir, S.J.; Hashim, A.S. Performance Analysis of Feature Selection Methods in Software Defect Prediction: A Search Method Approach. Appl. Sci. 2019, 9, 2764.
- Khoshgoftaar, T.M.; Gao, K.; Napolitano, A. An empirical study of feature ranking techniques for software quality prediction. Int. J. Softw. Eng. Knowl. Eng. 2012, 22, 161–183.
- Menzies, T.; Greenwald, J.; Frank, A. Data Mining Static Code Attributes to Learn Defect Predictors. IEEE Trans. Softw. Eng. 2007, 33, 2–13.
- Wang, H.; Khoshgoftaar, T.M.; van Hulse, J.; Gao, K. Metric selection for software defect prediction. Int. J. Softw. Eng. Knowl. Eng. 2011, 21, 237–257.
- Shivaji, S.; Whitehead, E.J.; Akella, R.; Kim, S. Reducing Features to Improve Code Change-Based Bug Prediction. IEEE Trans. Softw. Eng. 2012, 39, 552–569.
- Lee, S.-J.; Xu, Z.; Li, T.; Yang, Y. A novel bagging C4.5 algorithm based on wrapper feature selection for supporting wise clinical decision making. J. Biomed. Inform. 2018, 78, 144–155.
- Zemmal, N.; Azizi, N.; Sellami, M.; Zenakhra, D.; Cheriguene, S.; Dey, N.; Ashour, A.S. Robust feature selection algorithm based on transductive SVM wrapper and genetic algorithm: Application on computer-aided glaucoma classification. Int. J. Intell. Sys. Technol. Appl. 2018, 17, 310–346.
- Jolliffe, I.T.; Cadima, J. Principal component analysis: A review and recent developments. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2016, 374, 20150202.
- Balogun, A.O.; Basri, S.; Abdulkadir, S.J.; Adeyemo, V.E.; Imam, A.A.; Bajeh, A.O. Software Defect Prediction: Analysis of Class Imbalance and Performance Stability. J. Eng. Sci. Technol. 2019, 14, 3294–3308.
- Yu, Q.; Jiang, S.; Zhang, Y. The Performance Stability of Defect Prediction Models with Class Imbalance: An Empirical Study. IEICE Trans. Inf. Syst. 2017, 100, 265–272.
- Lessmann, S.; Baesens, B.; Mues, C.; Pietsch, S. Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings. IEEE Trans. Softw. Eng. 2008, 34, 485–496.
- Balogun, A.; Bajeh, A.; Mojeed, H.; Akintola, A. Software defect prediction: A multi-criteria decision-making approach. Niger. J. Technol. Res. 2020, 15, 35–42.
- Padhy, N.; Satapathy, S.; Singh, R. State of the Art Object-Oriented Metrics, and its Reusability: A Decade Review. In Smart Computing and Informatics; Springer: Berlin/Heidelberg, Germany, 2018; pp. 431–441.
- Fenton, N.; Bieman, J. Software Metrics: A Rigorous and Practical Approach; CRC Press: Boca Raton, FL, USA, 2014.
- Ali, M.; Huda, S.; Alyahya, S.; Yearwood, J.; Abawajy, J.H.; Al-Dossari, H. A parallel framework for software defect detection and metric selection on cloud computing. Clust. Comput. 2017, 20, 2267–2281.
- Prasad, M.; Florence, L.F.; Arya, A. A Study on Software Metrics based Software Defect Prediction using Data Mining and Machine Learning Techniques. Int. J. Database Theory Appl. 2015, 8, 179–190.
- Shepperd, M.; Song, Q.; Sun, Z.; Mair, C. Data Quality: Some Comments on the NASA Software Defect Datasets. IEEE Trans. Softw. Eng. 2013, 39, 1208–1215.
- Wu, R.; Zhang, H.; Kim, S.; Cheung, S.-C. Relink: Recovering links between bugs and changes. In Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering; ACM: Szeged, Hungary, 2011; pp. 15–25.
- Song, Q.; Guo, Y.; Shepperd, M. A Comprehensive Investigation of the Role of Imbalanced Learning for Software Defect Prediction. IEEE Trans. Softw. Eng. 2019, 45, 1253–1269.
- Nam, J.; Fu, W.; Kim, S.; Menzies, T.; Tan, L. Heterogeneous Defect Prediction. IEEE Trans. Softw. Eng. 2017, 44, 874–896.
- James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning; Springer Science and Business Media LLC: Berlin/Heidelberg, Germany, 2013; Volume 103.
- Kuhn, M.; Johnson, K. Applied Predictive Modeling; Springer: Berlin/Heidelberg, Germany, 2013.
- Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I.H. The WEKA data mining software. ACM SIGKDD Explor. Newsl. 2009, 11, 10–18.
- Rao, R.B.; Fung, G.M.; Rosales, R. On the Dangers of Cross-Validation. An Experimental Evaluation. In Proceedings of the 2008 SIAM International Conference on Data Mining; SIAM: Atlanta, GA, USA, 2008; pp. 588–596.
- Tantithamthavorn, C.; McIntosh, S.; Hassan, A.E.; Matsumoto, K. An Empirical Comparison of Model Validation Techniques for Defect Prediction Models. IEEE Trans. Softw. Eng. 2017, 43, 1–18.
S/No | Authors | FFR Methods | FSS Methods | Wrapper Methods | Other FS Methods | Datasets | Prediction Models | Findings
---|---|---|---|---|---|---|---|---
1 | Ghotra, et al. [25] | One filter rank method | Six filter subset methods (CFS and CNS with three search methods) | 12 wrapper methods | None | 18 datasets (NASA and PROMISE) | 21 classification algorithms | CFS with best-first search is the best
2 | Xu, et al. [28] | 14 filter rank methods | Two filter subset methods (CFS and CNS) | 12 wrapper methods | Cluster-based methods and PCA | 16 datasets (NASA and AEEEM) | One classification algorithm (RF) | FSS and wrapper methods achieve the best performance
3 | Shivaji, et al. [34] | Four filter rank methods | None | Two wrapper methods | None | 11 software projects | Two classification algorithms (NB and SVM) | 
4 | Gao, et al. [10] | Seven filter rank methods | Four filter subset methods (CFS and CNS with three search methods) | None | None | Private dataset | Five classification algorithms (NB, MLP, SVM, KNN, and LR) | No significant difference among the 10 methods
5 | Rodriguez, et al. [24] | None | Two filter subset methods (CFS and CNS with three search methods) | None | None | Five datasets | Two classification algorithms (DT and NB) | Wrapper achieves the best performance
6 | Rathore and Gupta [27] | Seven filter rank methods | Eight filter subset methods (CFS, ClassifierSubsetEval, FilterSubsetEval, WrapperSubsetEval with four search methods) | None | None | 14 datasets (PROMISE) | Two classification algorithms (RF, NB) | FFR is best
7 | Afzal and Torkar [20] | Two filter rank methods | CNS, CFS | One wrapper method | Genetic programming, evolutionary computation method | Five datasets | Two classification algorithms (DT, NB) | FSS is best
8 | Balogun, et al. [30] | Four filter rank methods | 14 filter subset methods (CFS and CNS with seven search methods) | None | None | NASA | Four classifiers (LR, NB, DT, KNN) | No significant difference, but FFR is more stable based on performance accuracy values (coefficient of variation)
9 | Kondo, et al. [2] | None | None | None | Eight feature reduction techniques | PROMISE, NASA, AEEEM | Supervised (DT, LR, NB, RF, LMT); unsupervised (SC, PAM, KM, FCM, NG) | NN-based methods (RBM, AE) are better
10 | Muthukumaran, et al. [26] | Seven filter rank methods | CFS | Two wrapper methods based on greedy search | None | NASA, AEEEM | NB, LR, RF | Wrapper methods based on greedy search performed best
Filter Feature Ranking Methods | Characteristics | Label Name | Reference
---|---|---|---
Chi-Squared Filter (CS) | Statistics-based | FFR1 | [10,25,27,28,30]
Correlation Filter (CO) | Statistics-based | FFR2 | 
Cross-Validation Filter (CV) | Statistics-based | FFR3 | 
Information Gain Filter (IG) | Probability-based | FFR4 | [10,25,27,28,30]
Gain Ratio Attribute Filter (GR) | Probability-based | FFR5 | 
Symmetrical Uncertainty Filter (SU) | Probability-based | FFR6 | 
Probability Significance Filter (PS) | Probability-based | FFR7 | 
Relief Feature Filter (RFA) | Instance-based | FFR8 | [25,26,28]
Weighted Relief Feature Filter (WRF) | Instance-based | FFR9 | 
One-Rule Filter (OR) | Classifier-based | FFR10 | [25,26,28]
SVM Filter (SVM) | Classifier-based | FFR11 | 
Simplified Silhouette Filter (SSF) | Cluster-based | FFR12 | 
Targeted Projection Pursuit Filter (TPP) | Projection-based | FFR13 | 
Filter-Feature Subset Selection Methods | Search Methods | Label Name
---|---|---
Correlation-based feature subset selection (CFS) | Ant Search (AS) | FSS1
CFS | Bee Search (BS) | FSS2
CFS | Bat Search (BAT) | FSS3
CFS | Cuckoo Search (CS) | FSS4
CFS | Elephant Search (ES) | FSS5
CFS | Firefly Search (FS) | FSS6
CFS | Flower Search (FLS) | FSS7
CFS | Genetic Search (GS) | FSS8
CFS | Non-Sorted Genetic Algorithm II (NSGA-II) | FSS9
CFS | PSO Search (PSOS) | FSS10
CFS | Rhinoceros Search (RS) | FSS11
CFS | Wolf Search (WS) | FSS12
CFS | Best First Search (BFS) | FSS13
Consistency feature subset selection (CNS) | Ant Search (AS) | FSS14
CNS | Bee Search (BS) | FSS15
CNS | Bat Search (BAT) | FSS16
CNS | Cuckoo Search (CS) | FSS17
CNS | Elephant Search (ES) | FSS18
CNS | Firefly Search (FS) | FSS19
CNS | Flower Search (FLS) | FSS20
CNS | Genetic Search (GS) | FSS21
CNS | Non-Sorted Genetic Algorithm II (NSGA-II) | FSS22
CNS | PSO Search (PSOS) | FSS23
CNS | Rhinoceros Search (RS) | FSS24
CNS | Wolf Search (WS) | FSS25
CNS | Best First Search (BFS) | FSS26
Wrapper-Based Feature Selection (WFS) Methods | Search Methods | Evaluation Criteria | Label Name
---|---|---|---
Wrapper-based feature selection based on naïve Bayes | Incremental wrapper subset selection (IWSS) | Accuracy and AUC | WFS1
Wrapper-based feature selection based on naïve Bayes | Subset-size forward selection (SFS) | Accuracy and AUC | WFS2
Wrapper-based feature selection based on naïve Bayes | Linear forward search (LFS) | Accuracy and AUC | WFS3
Wrapper-based feature selection based on decision tree | Incremental wrapper subset selection (IWSS) | Accuracy and AUC | WFS4
Wrapper-based feature selection based on decision tree | Subset-size forward selection (SFS) | Accuracy and AUC | WFS5
Wrapper-based feature selection based on decision tree | Linear forward search (LFS) | Accuracy and AUC | WFS6
Classification Algorithm | Classifier Type | Parameter Setting |
---|---|---|
Naïve Bayes (NB) | A probability-based classifier. | NumDecimalPlaces = 2; UseKernelEstimator = True |
Decision Tree (DT) | An information entropy-based classifier. | Confidence factor = 0.25; MinNumObj = 2 |
Datasets | # of Features | # of Modules | Repository |
---|---|---|---|
EQ | 62 | 324 | AEEEM |
JDT | 62 | 997 | |
ML | 62 | 1862 | |
PDE | 62 | 1497 | |
CM1 | 38 | 327 | NASA |
KC1 | 22 | 1162 | |
KC2 | 22 | 522 | |
KC3 | 40 | 194 | |
MW1 | 38 | 250 | |
PC1 | 38 | 679 | |
PC3 | 38 | 1077 | |
PC4 | 38 | 1287 | |
PC5 | 39 | 1711 | |
ANT | 22 | 292 | PROMISE |
CAMEL | 21 | 339 | |
JEDIT | 22 | 312 | |
REDAKTOR | 21 | 176 | 
TOMCAT | 22 | 852 | |
VELOCITY | 21 | 196 | |
XALAN | 22 | 797 | |
SAFE | 27 | 56 | ReLink |
ZXING | 27 | 399 | |
APACHE | 27 | 194 | |
ECLIPSE | 19 | 1065 | |
SWT | 18 | 1485 |
Rank | Naïve Bayes (Avg. Accuracy) | Decision Tree (Avg. Accuracy) | Naïve Bayes (Avg. AUC) | Decision Tree (Avg. AUC)
---|---|---|---|---
1 | FFR11 | FFR1, FFR2 | FFR1, FFR2, FFR6 | FFR1, FFR2, FFR7, FFR9
2 | FFR9, FFR2 | FFR4, FFR5, FFR6, FFR7, FFR10, FFR11 | FFR4, FFR6 | FFR6
3 | FFR8, FFR6, FFR5 | FFR8, FFR9 | FFR8, FFR9 | FFR11
4 | FFR13, FFR12, FFR10, FFR7, FFR4, FFR1 | FFR12, FFR13 | FFR5, FFR10, FFR11 | FFR4, FFR5
5 | FFR3 | FFR3 | FFR12 | FFR8, FFR10
6 | | | FFR13 | FFR12
7 | | | FFR3 | FFR13
8 | | | | FFR3
Rank | Naïve Bayes (Avg. Accuracy) | Decision Tree (Avg. Accuracy) | Naïve Bayes (Avg. AUC) | Decision Tree (Avg. AUC)
---|---|---|---|---
1 | FSS1, FSS2, FSS3, FSS4, FSS6, FSS7, FSS8, FSS9, FSS10, FSS11, FSS12 | FSS1, FSS3, FSS5, FSS6, FSS7, FSS8, FSS9, FSS10 | FSS1, FSS3, FSS4, FSS5, FSS6, FSS7, FSS8, FSS10, FSS11, FSS12 | FSS10
2 | FSS5, FSS13 | FSS2, FSS4, FSS11, FSS12, FSS13, FSS15, FSS16, FSS19 | FSS2, FSS9, FSS13, FSS18 | FSS2, FSS11, FSS12
3 | FSS26 | FSS17, FSS21, FSS22, FSS26 | FSS14, FSS20 | FSS3, FSS5, FSS7, FSS9
4 | FSS14, FSS15, FSS16, FSS17, FSS18, FSS19, FSS20, FSS21, FSS22, FSS23, FSS24, FSS25 | FSS14, FSS18, FSS20, FSS23, FSS24, FSS25 | FSS15, FSS16, FSS19, FSS21, FSS22, FSS23, FSS24, FSS25, FSS26 | FSS1, FSS4, FSS6, FSS13
5 | | | FSS17 | FSS8, FSS15, FSS24
6 | | | | FSS21, FSS22, FSS26, FSS20
7 | | | | FSS14, FSS16
8 | | | | FSS17, FSS18, FSS19
9 | | | | FSS23, FSS25
Rank | Naïve Bayes (Avg. Accuracy) | Decision Tree (Avg. Accuracy) | Naïve Bayes (Avg. AUC) | Decision Tree (Avg. AUC)
---|---|---|---|---
1 | WFS1 | WFS4 | WFS1, WFS2 | WFS4
2 | WFS2 | WFS6 | WFS3 | WFS6
3 | WFS3 | WFS5 | | WFS5
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).