Abstract
Presence or absence of defective modules in software is an indicator of the software's quality. Every company aspires to deliver good-quality software with a minimum number of defective modules. To achieve this goal, defect prediction models are used in different phases of the software lifecycle. These models have to deal with a large number of software metrics as input parameters. These metrics have correlation issues that affect a model's performance, and in some cases using all the metrics negatively impacts the models' performance. To reduce the size of the input space and resolve possible correlation issues in the input data, models reported in the literature use Principal Component Analysis (PCA) and Information Gain (IG) based dimension reduction. PCA reduces the dimensions but keeps the representation of all the input variables intact, so it is not suitable where representing all the metrics degrades a model's performance. To handle such situations, this paper advocates an IG based technique that reduces the size of the input space by dropping the irrelevant metrics. Afterwards, only the relevant metrics are used to develop a prediction model. This paper compares the PCA and IG based techniques for developing classification tree and fuzzy inference system based models. To study the impact of using IG, the percentage improvements in Recall, Accuracy and Misclassification Rate have been calculated for these models. The results show that IG improves the models' performance more often than PCA does.
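The IG based selection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it computes IG(Y; X) = H(Y) - sum over v of P(X = v) * H(Y | X = v) for each metric against the defective/non-defective label, then keeps only metrics whose gain exceeds a cut-off. It assumes metric values have already been discretized (the median split in the hypothetical helper `binarize` is one possible choice), and `min_gain` is a hypothetical threshold; the abstract does not state the paper's discretization scheme or cut-off.

```python
import math
import statistics
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for a discrete metric X."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature_values, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def binarize(values):
    """Hypothetical discretization: split a continuous metric at its median."""
    m = statistics.median(values)
    return ["high" if v >= m else "low" for v in values]


def select_by_information_gain(metrics, labels, min_gain=0.0):
    """Rank metrics by IG and keep those whose gain exceeds min_gain.

    metrics maps a metric name to its list of discretized values.
    min_gain is a hypothetical cut-off, not taken from the paper.
    """
    gains = {name: information_gain(vals, labels) for name, vals in metrics.items()}
    ranked = sorted(gains.items(), key=lambda kv: kv[1], reverse=True)
    selected = [name for name, g in ranked if g > min_gain]
    return selected, gains


if __name__ == "__main__":
    labels = [1, 1, 0, 0]  # 1 = defective module, 0 = non-defective
    metrics = {
        "loc": binarize([120, 95, 10, 20]),  # separates the classes perfectly
        "noise": ["a", "b", "a", "b"],       # carries no class information
    }
    selected, gains = select_by_information_gain(metrics, labels)
    print(selected, gains)
```

Unlike PCA, which builds new components from all metrics, this approach drops the zero-gain metric entirely, so only the retained metrics reach the classifier.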
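The evaluation measures named above can be computed from a confusion matrix, as in the sketch below. The percentage-improvement definition used here, 100 * (new - baseline) / baseline, with the sign flipped for Misclassification Rate because lower is better, is an assumption; the abstract does not give the paper's exact formula. The `Confusion` class and `percent_improvement` function are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # defective modules predicted defective
    fn: int  # defective modules predicted non-defective
    fp: int  # non-defective modules predicted defective
    tn: int  # non-defective modules predicted non-defective

    @property
    def total(self):
        return self.tp + self.fn + self.fp + self.tn

    def recall(self):
        """Fraction of truly defective modules that the model flags."""
        return self.tp / (self.tp + self.fn)

    def accuracy(self):
        """Fraction of all modules classified correctly."""
        return (self.tp + self.tn) / self.total

    def misclassification_rate(self):
        """Fraction of all modules classified incorrectly (1 - accuracy)."""
        return (self.fp + self.fn) / self.total


def percent_improvement(baseline, new, higher_is_better=True):
    """Assumed definition: relative change of `new` over `baseline`, in percent."""
    delta = new - baseline if higher_is_better else baseline - new
    return 100.0 * delta / baseline


if __name__ == "__main__":
    all_metrics = Confusion(tp=20, fn=20, fp=10, tn=50)   # e.g. model on all metrics
    ig_selected = Confusion(tp=30, fn=10, fp=10, tn=50)   # e.g. model on IG-selected metrics
    print(percent_improvement(all_metrics.recall(), ig_selected.recall()))
    print(percent_improvement(all_metrics.misclassification_rate(),
                              ig_selected.misclassification_rate(),
                              higher_is_better=False))
```

The confusion-matrix counts in the usage block are made-up illustrative numbers, not results from the paper.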
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Rana, Z.A., Awais, M.M., Shamail, S. (2014). Impact of Using Information Gain in Software Defect Prediction Models. In: Huang, DS., Bevilacqua, V., Premaratne, P. (eds) Intelligent Computing Theory. ICIC 2014. Lecture Notes in Computer Science, vol 8588. Springer, Cham. https://doi.org/10.1007/978-3-319-09333-8_69
DOI: https://doi.org/10.1007/978-3-319-09333-8_69
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09332-1
Online ISBN: 978-3-319-09333-8
eBook Packages: Computer Science, Computer Science (R0)