Research article, ESEM Conference Proceedings
DOI: 10.1145/2372251.2372268

Discretization methods for NBC in effort estimation: an empirical comparison based on ISBSG projects

Published: 19 September 2012

Abstract

Background: Bayesian networks have been applied in many fields, including effort estimation in software engineering. Even though there are Bayesian inference algorithms that can handle continuous variables, performance tends to be better when these variables are discretized than when they are assumed to follow a specific distribution. On the other hand, the choice of discretization method and the number of discretized intervals may lead to significantly different estimation results. However, discretization issues are seldom mentioned in software engineering effort estimation models.
Aim: This paper seeks to show that discretization issues matter for prediction accuracy when building a Naive Bayes Classifier (NBC) for estimating software effort.
Method: For this purpose, an NBC model has been developed for software effort estimation based on ISBSG projects, applying different discretization schemes (equal width intervals, equal frequency intervals, and k-means clustering) and using different numbers of intervals.
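The three discretization schemes named in the Method can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the quantile-based centroid initialization, the midpoint cut points for k-means, and the sample effort values are all assumptions made for this sketch.

```python
from bisect import bisect_right


def equal_width_edges(values, k):
    """Return k-1 interior cut points splitting [min, max] into k equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_edges(values, k):
    """Return k-1 interior cut points so each bin holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    # Cut at the value sitting at each i/k position of the sorted sample.
    return [ordered[(i * n) // k] for i in range(1, k)]


def kmeans_edges(values, k, iterations=100):
    """Return k-1 cut points from 1-D k-means (Lloyd's algorithm).

    Centroids start at evenly spaced sample quantiles (an assumption of this
    sketch); cut points are midpoints between adjacent final centroids.
    """
    ordered = sorted(values)
    n = len(ordered)
    centroids = [ordered[((2 * i + 1) * n) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in ordered:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    centroids.sort()
    return [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]


def assign_bins(values, edges):
    """Map each value to a bin index in 0..len(edges)."""
    return [bisect_right(edges, v) for v in values]


# Hypothetical effort values in person-hours, not ISBSG data.
effort = [120, 150, 200, 260, 300, 900, 1100, 1500, 4000, 9000]
for name, make_edges in [("equal width", equal_width_edges),
                         ("equal frequency", equal_frequency_edges),
                         ("k-means", kmeans_edges)]:
    edges = make_edges(effort, 3)
    print(name, edges, assign_bins(effort, edges))
```

On skewed data like this sample, equal width intervals place most projects in the first bin, equal frequency intervals balance the counts, and k-means places its cut points in the gaps between groups of similar values.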
Results: For the NBC model built, k-means clustering improves on the estimation accuracy of equal frequency discretization only with respect to Pred(0.25), even though clustering better reflects the original distribution.
Conclusions: Further experimentation should determine the potential of clustering methods already highlighted in other fields.
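
Pred(0.25), the accuracy criterion cited in the Results, is conventionally defined in the effort estimation literature as the proportion of projects whose magnitude of relative error (MRE) is at most 0.25. The sketch below computes it under that conventional definition. The abstract does not say how the NBC's predicted interval is turned into a point estimate, so the code assumes point estimates are already available; the function names and sample figures are hypothetical.

```python
def mre(actual, estimated):
    """Magnitude of relative error for one project."""
    return abs(actual - estimated) / actual


def pred(actuals, estimates, level=0.25):
    """Fraction of projects whose MRE is at most `level`."""
    hits = sum(1 for a, e in zip(actuals, estimates) if mre(a, e) <= level)
    return hits / len(actuals)


# Hypothetical actual and estimated efforts (person-hours), not ISBSG data.
actual = [100, 200, 400, 800]
estimated = [110, 300, 390, 500]
print(pred(actual, estimated))  # two of the four estimates fall within 25%
```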



Published In

ESEM '12: Proceedings of the ACM-IEEE International Symposium on Empirical Software Engineering and Measurement
September 2012
338 pages
ISBN: 9781450310567
DOI: 10.1145/2372251
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. ISBSG
  2. bayesian networks
  3. discretization methods
  4. effort estimation
  5. naive bayes classifier
  6. software projects

Qualifiers

  • Research-article

Conference

ESEM '12
Acceptance Rates

Overall Acceptance Rate 130 of 594 submissions, 22%

Cited By

  • (2022) Toward Improving the Efficiency of Software Development Effort Estimation via Clustering Analysis. IEEE Access 10, 83249-83264. DOI: 10.1109/ACCESS.2022.3185393. Online publication date: 2022
  • (2021) A hybrid model for prediction of software effort based on team size. IET Software 15:6, 365-375. DOI: 10.1049/sfw2.12048. Online publication date: 4-Dec-2021
  • (2016) Software effort estimation based on the optimal Bayesian belief network. Applied Soft Computing 49:C, 968-980. DOI: 10.1016/j.asoc.2016.08.004. Online publication date: 1-Dec-2016
  • (2013) An Extended Assessment of Data-Driven Bayesian Networks in Software Effort Prediction. Proceedings of the 2013 27th Brazilian Symposium on Software Engineering, 157-166. DOI: 10.1109/SBES.2013.17. Online publication date: 1-Oct-2013
