

A decision tree logic based recommendation system to select software fault prediction techniques

Authors:

Santosh S. Rathore,

Sandeep Kumar

Computing, Volume 99, Issue 3

Pages 255 - 285

Published: 01 March 2017

Abstract

Identifying a reliable fault prediction technique is the key requirement for building an effective fault prediction model. It has been found that the performance of fault prediction techniques is highly dependent on the characteristics of the fault dataset. To mitigate this issue, researchers have evaluated and compared a plethora of fault prediction techniques by varying the context in terms of domain information, characteristics of input data, complexity, etc. However, the lack of an accepted benchmark makes it difficult to select a fault prediction technique for a particular prediction context. In this paper, we present a recommendation system that facilitates the selection of appropriate technique(s) for building a fault prediction model. First, we reviewed the literature to elicit the various characteristics of fault datasets and the appropriateness of machine learning and statistical techniques for the identified characteristics. Subsequently, we formalized our findings and built a recommendation system that helps in the selection of fault prediction techniques. We performed an initial appraisal of the presented system and found that the proposed recommendation system provides useful hints for selecting fault prediction techniques.
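The abstract describes the system only at a high level: characteristics of a fault dataset are elicited, and decision tree logic maps them to suitable techniques. The sketch below illustrates that general shape under explicit assumptions; it is not the authors' implementation. The characteristics profiled (missing values, share of faulty modules, metrics-to-modules ratio), the thresholds, and the leaf labels such as `IMBALANCE_ROBUST` are hypothetical placeholders, not the rules or recommendations reported in the paper. All names (`DatasetProfile`, `profile_dataset`, `recommend`, `EXAMPLE_TREE`) are invented for this example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple, Union


@dataclass(frozen=True)
class DatasetProfile:
    """An illustrative (hypothetical) subset of fault-dataset characteristics."""
    n_modules: int        # number of software modules (rows)
    n_metrics: int        # number of software metrics (columns)
    faulty_ratio: float   # fraction of modules labelled faulty
    missing_ratio: float  # fraction of metric values that are missing


def profile_dataset(rows: Sequence[Sequence[Optional[float]]],
                    labels: Sequence[int]) -> DatasetProfile:
    """Summarise a metrics table; None marks a missing value, label 1 marks a faulty module."""
    if not rows or len(rows) != len(labels):
        raise ValueError("expected one label per row and at least one row")
    total = sum(len(row) for row in rows)
    missing = sum(1 for row in rows for value in row if value is None)
    return DatasetProfile(
        n_modules=len(rows),
        n_metrics=len(rows[0]),
        faulty_ratio=sum(labels) / len(labels),
        missing_ratio=missing / total if total else 0.0,
    )


@dataclass(frozen=True)
class Leaf:
    techniques: Tuple[str, ...]  # hypothetical label for a family of candidate techniques


@dataclass(frozen=True)
class Node:
    question: str                           # readable form of the test, reused in explanations
    test: Callable[[DatasetProfile], bool]  # predicate evaluated on the dataset profile
    yes: Tree
    no: Tree


Tree = Union[Node, Leaf]


def recommend(tree: Tree, profile: DatasetProfile) -> Tuple[Tuple[str, ...], List[str]]:
    """Walk from the root to a leaf; return the leaf's labels and the decisions taken."""
    path: List[str] = []
    node = tree
    while isinstance(node, Node):
        answer = node.test(profile)
        path.append(f"{node.question} -> {'yes' if answer else 'no'}")
        node = node.yes if answer else node.no
    return node.techniques, path


# Hypothetical rules and thresholds, for illustration only (not taken from the paper).
EXAMPLE_TREE: Tree = Node(
    "missing values above 5%?",
    lambda p: p.missing_ratio > 0.05,
    yes=Leaf(("MISSING_VALUE_TOLERANT",)),
    no=Node(
        "faulty modules below 20%?",
        lambda p: p.faulty_ratio < 0.20,
        yes=Leaf(("IMBALANCE_ROBUST",)),
        no=Node(
            "more metrics than modules?",
            lambda p: p.n_metrics > p.n_modules,
            yes=Leaf(("HIGH_DIMENSIONAL",)),
            no=Leaf(("GENERAL_PURPOSE",)),
        ),
    ),
)

if __name__ == "__main__":
    rows = [[float(i), float(2 * i)] for i in range(10)]  # 10 modules, 2 metrics, no gaps
    labels = [0] * 9 + [1]                                 # only 1 of 10 modules is faulty
    techniques, path = recommend(EXAMPLE_TREE, profile_dataset(rows, labels))
    print(techniques)  # ('IMBALANCE_ROBUST',)
    print(path)        # ['missing values above 5%? -> no', 'faulty modules below 20%? -> yes']
```

Returning the decision path alongside the recommendation keeps each hint explainable: a user can see which dataset characteristic led to a given suggestion, which is one plausible reason to prefer explicit decision tree logic over an opaque scoring model in a recommendation system of this kind.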


Cited By

Cynthia S, Ripon S (2019) Predicting and Classifying Software Faults. In: Proceedings of the 7th International Conference on Computer and Communications Management, pp 143-147. Online publication date: 27-Jul-2019
https://dl.acm.org/doi/10.1145/3348445.3348453
Rathore S, Kumar S (2019) A study on software fault prediction techniques. Artificial Intelligence Review 51(2):255-327. Online publication date: 1-Feb-2019
https://dl.acm.org/doi/10.1007/s10462-017-9563-5
Wang X, Wu P, Liu G, Huang Q, Hu X, Xu H (2019) Learning performance prediction via convolutional GRU and explainable neural networks in e-learning environments. Computing 101(6):587-604. Online publication date: 1-Jun-2019
https://dl.acm.org/doi/10.1007/s00607-018-00699-9

Index Terms
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
    2. Machine learning approaches




Information

Published In


Computing Volume 99, Issue 3

March 2017

107 pages

ISSN:0010-485X


Copyright © 2017 Springer-Verlag Wien.

Publisher

Springer-Verlag

Berlin, Heidelberg



Qualifiers

Article



Bibliometrics & Citations

Total citations: 6
Total downloads: 0
Downloads (last 12 months): 0
Downloads (last 6 weeks): 0

Reflects downloads up to 05 Jan 2025


Citations

(2018) A review on the dynamics of social recommender systems. International Journal of Web Engineering and Technology 13(3):255-276. Online publication date: 1-Jan-2018
https://dl.acm.org/doi/10.5555/3292941.3292944
Rathore S, Kumar S (2017) Linear and non-linear heterogeneous ensemble methods to predict the number of faults in software systems. Knowledge-Based Systems 119(C):232-256. Online publication date: 1-Mar-2017
https://dl.acm.org/doi/10.1016/j.knosys.2016.12.017
Rathore S, Kumar S (2017) Towards an ensemble based system for predicting the number of software faults. Expert Systems with Applications: An International Journal 82(C):357-382. Online publication date: 1-Oct-2017
https://dl.acm.org/doi/10.1016/j.eswa.2017.04.014
