
A decision tree logic based recommendation system to select software fault prediction techniques

Published: 01 March 2017

Abstract

Identifying a reliable fault prediction technique is the key requirement for building an effective fault prediction model. It has been found that the performance of fault prediction techniques is highly dependent on the characteristics of the fault dataset. To mitigate this issue, researchers have evaluated and compared a plethora of fault prediction techniques by varying the context in terms of domain information, characteristics of input data, complexity, etc. However, the lack of an accepted benchmark makes it difficult to select a fault prediction technique for a particular prediction context. In this paper, we present a recommendation system that facilitates the selection of appropriate technique(s) to build a fault prediction model. First, we reviewed the literature to elicit the various characteristics of the fault dataset and the appropriateness of machine learning and statistical techniques for the identified characteristics. Subsequently, we formalized our findings and built a recommendation system that helps in the selection of fault prediction techniques. We performed an initial appraisal of the system and found that the proposed recommendation system provides useful hints for selecting fault prediction techniques.
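The idea of decision tree logic that maps dataset characteristics to candidate techniques can be sketched roughly as follows. This is a minimal illustration only: the `DatasetProfile` fields, the thresholds, and every recommendation at the leaves are hypothetical placeholders invented for this sketch, not rules or findings taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetProfile:
    """Hypothetical summary of a fault dataset; field names are illustrative."""
    n_modules: int            # number of modules (rows) in the dataset
    faulty_ratio: float       # fraction of modules labelled faulty
    has_missing_values: bool  # whether any metric values are missing
    has_noise: bool           # whether label or attribute noise is suspected


def recommend(profile: DatasetProfile) -> List[str]:
    """Walk a small hand-written decision tree and return candidate techniques.

    Every threshold and leaf below is a placeholder chosen for illustration,
    not a rule taken from the paper.
    """
    if profile.has_missing_values:
        if profile.has_noise:
            return ["decision tree", "random forest"]
        return ["decision tree", "naive Bayes"]
    if profile.faulty_ratio < 0.1:  # placeholder cut-off for "imbalanced"
        return ["random forest", "ensemble of classifiers"]
    if profile.n_modules < 100:     # placeholder cut-off for "small"
        return ["naive Bayes", "logistic regression"]
    return ["support vector machine", "neural network"]
```

A caller would describe a dataset once, for example `DatasetProfile(500, 0.05, False, False)`, and pass it to `recommend` to obtain a short list of techniques to try. Encoding the logic as nested conditions keeps each recommendation traceable to the characteristics that triggered it.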

Published In

Computing, Volume 99, Issue 3
March 2017
107 pages

Publisher

Springer-Verlag, Berlin, Heidelberg

Author Tags

1. 68N30 Mathematical aspects of software engineering (specification, verification, metrics, requirements, etc.)
2. Decision tree
3. Recommendation system
4. Software fault prediction
5. Software fault prediction techniques

Qualifiers

• Article

Cited By

• (2019) Predicting and Classifying Software Faults. Proceedings of the 7th International Conference on Computer and Communications Management, pp 143-147. doi:10.1145/3348445.3348453. Online publication date: 27-Jul-2019
• (2019) A study on software fault prediction techniques. Artificial Intelligence Review 51(2):255-327. doi:10.1007/s10462-017-9563-5. Online publication date: 1-Feb-2019
• (2019) Learning performance prediction via convolutional GRU and explainable neural networks in e-learning environments. Computing 101(6):587-604. doi:10.1007/s00607-018-00699-9. Online publication date: 1-Jun-2019
• (2018) A review on the dynamics of social recommender systems. International Journal of Web Engineering and Technology 13(3):255-276. doi:10.5555/3292941.3292944. Online publication date: 1-Jan-2018
• (2017) Linear and non-linear heterogeneous ensemble methods to predict the number of faults in software systems. Knowledge-Based Systems 119(C):232-256. doi:10.1016/j.knosys.2016.12.017. Online publication date: 1-Mar-2017
• (2017) Towards an ensemble based system for predicting the number of software faults. Expert Systems with Applications: An International Journal 82(C):357-382. doi:10.1016/j.eswa.2017.04.014. Online publication date: 1-Oct-2017
