Segmentation of Software Engineering Datasets Using the M5 Algorithm

D. Rodríguez²⁰,
J. J. Cuadrado²¹,
M. A. Sicilia²¹ &
R. Ruiz²²

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3994)

Included in the following conference series:

International Conference on Computational Science

Abstract

This paper reports an empirical study that uses clustering techniques to derive segmented models from software engineering repositories, with the aim of improving the accuracy of estimates. We took two datasets from the International Software Benchmarking Standards Group (ISBSG) repository and created clusters using the M5 algorithm, associating each cluster with a linear model. We then compared the accuracy of the resulting estimates with that of classical multivariate linear regression and least median of squares regression. The results show that clustering improves the accuracy of the estimates. Furthermore, these techniques help in understanding the datasets better and offer advantages to project managers while keeping the estimation process within reasonable complexity.
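
The idea of segmenting a dataset and attaching one linear model to each segment can be sketched as follows. This is an illustration only, not the authors' code and not the full M5 algorithm: it performs a single split on one attribute using standard deviation reduction (the split criterion M5 uses), with no pruning or smoothing. The project data are invented for the example (they are not ISBSG records), and the helper names (`fit_line`, `best_split`, `mean_abs_error`) and the choice of mean absolute error as the accuracy measure are assumptions.

```python
# Illustrative sketch only: synthetic data, one split, one predictor.
# Not the authors' code and not the full M5 algorithm (no pruning or smoothing).
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def best_split(xs, ys, min_leaf=3):
    """Threshold on x that maximises standard deviation reduction (SDR)."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total_sd = pstdev(ys)
    best_sdr, best_thr = 0.0, None
    for i in range(min_leaf, n - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal attribute values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        sdr = (total_sd
               - (len(left) / n) * pstdev(left)
               - (len(right) / n) * pstdev(right))
        if sdr > best_sdr:
            best_sdr = sdr
            best_thr = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_thr


def mean_abs_error(predict, xs, ys):
    """Mean absolute error of predict(x) against y (assumed accuracy measure)."""
    return mean(abs(predict(x) - y) for x, y in zip(xs, ys))


# Hypothetical projects: size in function points, effort in person-hours.
# Small projects follow one trend, large projects a steeper one.
size = [50, 80, 100, 120, 150, 180, 400, 450, 500, 550, 600, 650]
effort = [10 * s + 200 if s < 300 else 25 * s - 2000 for s in size]

threshold = best_split(size, effort)

# Baseline: one linear model for the whole dataset.
a, b = fit_line(size, effort)
global_mae = mean_abs_error(lambda x: a + b * x, size, effort)

# Segmented: one linear model on each side of the split.
low = [(x, y) for x, y in zip(size, effort) if x <= threshold]
high = [(x, y) for x, y in zip(size, effort) if x > threshold]
a_lo, b_lo = fit_line(*zip(*low))
a_hi, b_hi = fit_line(*zip(*high))


def segmented(x):
    """Predict with the linear model of the segment that x falls into."""
    return a_lo + b_lo * x if x <= threshold else a_hi + b_hi * x


segmented_mae = mean_abs_error(segmented, size, effort)
print(f"split at size = {threshold}")
print(f"global MAE = {global_mae:.1f}, segmented MAE = {segmented_mae:.1f}")
```

On this toy data the two per-segment models recover the two underlying trends, while the single global line cannot. Real repositories are noisier and have many attributes, so the gain there has to be measured empirically, as the study does.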

Author information

Authors and Affiliations

The University of Reading, Reading, RG6 6AY, UK
D. Rodríguez
The University of Alcalá, 28805, Alcalá de Henares (Madrid), Spain
J. J. Cuadrado & M. A. Sicilia
The University of Seville, 41012, Avda Reina Mercedes s/n., Sevilla, Spain
R. Ruiz

Editor information

Editors and Affiliations

Advanced Computing and Emerging Technologies Centre, The School of Systems Engineering, University of Reading, RG6 6AY, Reading, United Kingdom
Vassil N. Alexandrov
Department of Mathematics and Computer Science, University of Amsterdam, Kruislaan 403, 1098 SJ Amsterdam, The Netherlands
Geert Dick van Albada
Faculty of Sciences, Section of Computational Science, University of Amsterdam, Kruislaan 403, 1098 SJ Amsterdam, The Netherlands
Peter M. A. Sloot
Computer Science Department, University of Tennessee, 37996-3450, Knoxville, TN, USA
Jack Dongarra

About this paper

Cite this paper

Rodríguez, D., Cuadrado, J.J., Sicilia, M.A., Ruiz, R. (2006). Segmentation of Software Engineering Datasets Using the M5 Algorithm. In: Alexandrov, V.N., van Albada, G.D., Sloot, P.M.A., Dongarra, J. (eds) Computational Science – ICCS 2006. ICCS 2006. Lecture Notes in Computer Science, vol 3994. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11758549_106

DOI: https://doi.org/10.1007/11758549_106
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34385-1
Online ISBN: 978-3-540-34386-8
eBook Packages: Computer Science, Computer Science (R0)
