Abstract
This paper reports an empirical study that uses clustering techniques to derive segmented models from software engineering repositories, with the aim of improving the accuracy of effort estimates. In particular, we used two datasets obtained from the International Software Benchmarking Standards Group (ISBSG) repository and created clusters using the M5 algorithm, with each cluster associated with its own linear model. We then compared the accuracy of the resulting estimates with that of classical multivariate linear regression and least median of squares regression. Results show that clustering improves the accuracy of the estimates. Furthermore, these techniques help us understand the datasets better, giving project managers some advantages while keeping the estimation process reasonably simple.
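The segmentation idea above can be illustrated with a small sketch. It is a deliberately simplified, hypothetical version of M5: a single split on one predictor chosen by standard deviation reduction (the splitting criterion M5 uses), followed by one least-squares line per segment. The result is compared against one global least-squares line and an approximate least median of squares (LMS) fit. Full M5 grows a deeper tree over many attributes and then prunes and smooths it. The toy data, the function names, the pair-search approximation of LMS, and the use of MMRE as the accuracy measure are all assumptions for illustration, not the paper's experimental setup.

```python
from itertools import combinations
from statistics import median, pstdev


def fit_ols(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0  # constant predictor: fall back to the mean
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def fit_lms(xs, ys):
    """Approximate least median of squares (assumption: pair search only).
    Keeps the line through a pair of points with the smallest median squared residual."""
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        med = median((y - a - b * x) ** 2 for x, y in zip(xs, ys))
        if best is None or med < best[0]:
            best = (med, a, b)
    return best[1], best[2]


def best_split(xs, ys, min_leaf=2):
    """Threshold on x with the largest standard deviation reduction (SDR);
    returns None if no valid split exists."""
    pairs = sorted(zip(xs, ys))
    n, total_sd = len(pairs), pstdev(ys)
    best = None
    for i in range(min_leaf, n - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot separate identical x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        sdr = total_sd - len(left) / n * pstdev(left) - len(right) / n * pstdev(right)
        if best is None or sdr > best[0]:
            best = (sdr, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return None if best is None else best[1]


def fit_segmented(xs, ys):
    """Hypothetical depth-1 model tree: one split, one OLS line per segment."""
    t = best_split(xs, ys)
    if t is None:
        line = fit_ols(xs, ys)
        return t, line, line
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return t, fit_ols(*zip(*left)), fit_ols(*zip(*right))


def predict(model, x):
    """Route x to its segment and apply that segment's linear model."""
    t, left, right = model
    a, b = left if t is None or x <= t else right
    return a + b * x


def mmre(actual, predicted):
    """Mean magnitude of relative error, a common effort-estimation accuracy measure."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical data: project size (x) and effort (y) following two different regimes.
size = [10, 20, 30, 40, 100, 200, 300, 400]
effort = [50, 100, 150, 200, 1100, 1200, 1300, 1400]

seg = fit_segmented(size, effort)
a, b = fit_ols(size, effort)
c, d = fit_lms(size, effort)
print("segmented MMRE:", mmre(effort, [predict(seg, x) for x in size]))
print("global OLS MMRE:", mmre(effort, [a + b * x for x in size]))
print("LMS MMRE:", mmre(effort, [c + d * x for x in size]))
```

On this toy data the split falls between the two regimes (at a size of 70), so each segment's line fits its points exactly and the segmented MMRE is 0, while no single line, whether least squares or LMS, can fit both regimes. Real repository data such as ISBSG is far noisier, which is why M5 adds pruning and smoothing on top of this basic split-and-fit step.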
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rodríguez, D., Cuadrado, J.J., Sicilia, M.A., Ruiz, R. (2006). Segmentation of Software Engineering Datasets Using the M5 Algorithm. In: Alexandrov, V.N., van Albada, G.D., Sloot, P.M.A., Dongarra, J. (eds) Computational Science – ICCS 2006. ICCS 2006. Lecture Notes in Computer Science, vol 3994. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11758549_106
DOI: https://doi.org/10.1007/11758549_106
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34385-1
Online ISBN: 978-3-540-34386-8
eBook Packages: Computer Science, Computer Science (R0)