DOI: 10.1145/2372251.2372267
research-article

Handling categorical variables in effort estimation

Published: 19 September 2012

Abstract

Background: Accurate effort estimation is the basis of software development project management. The linear regression model is one of the most widely used methods for this purpose. A dataset used to build a model often includes categorical variables, such as the programming language. Categorical variables are usually handled with one of two methods: stratification or transformation into dummy variables. Both methods improve accuracy but have shortcomings. Two other handling methods, the interaction and the hierarchical linear model (HLM), might be able to compensate for these shortcomings. However, these two methods have not been examined in this research area. Aim: To give useful suggestions for handling categorical variables with stratification, dummy variables, the interaction, or HLM when building an estimation model. Method: We built estimation models with the four handling methods on the ISBSG, NASA, and Desharnais datasets and compared the accuracy of the methods with each other. Results: The most effective method differed between datasets, and the difference was statistically significant for both mean balanced relative error (MBRE) and mean magnitude of relative error (MMRE). The interaction and HLM were effective in a certain case. Conclusions: Stratification and dummy variables should at least be tried in order to obtain an accurate model. In addition, we suggest that applying the interaction and HLM be considered when building the estimation model.
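
The abstract names the two accuracy criteria and two of the handling methods without defining them. As a minimal sketch, assuming the definitions commonly used in the effort estimation literature (MRE = |actual - predicted| / actual, BRE = |actual - predicted| / min(actual, predicted)), the Python below computes MMRE and MBRE and illustrates dummy coding and stratification of a categorical column. The function names, the choice of reference level, and the sample values are hypothetical and are not taken from the paper.

```python
from statistics import mean


def mmre(actual, predicted):
    """Mean magnitude of relative error: mean of |a - p| / a."""
    return mean(abs(a - p) / a for a, p in zip(actual, predicted))


def mbre(actual, predicted):
    """Mean balanced relative error: mean of |a - p| / min(a, p)."""
    return mean(abs(a - p) / min(a, p) for a, p in zip(actual, predicted))


def dummy_encode(values, reference):
    """Encode a categorical column as 0/1 dummy columns.

    The reference level gets no column of its own (all zeros), which
    avoids perfect collinearity with the regression intercept.
    """
    levels = sorted(set(values) - {reference})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


def stratify(rows, key):
    """Split rows into one subset per category value (stratification)."""
    strata = {}
    for row in rows:
        strata.setdefault(row[key], []).append(row)
    return strata


# Hypothetical effort values (e.g. person-hours), not data from the paper.
actual = [100.0, 200.0, 400.0]
predicted = [50.0, 400.0, 400.0]
print(mmre(actual, predicted))  # 0.5
print(mbre(actual, predicted))  # about 0.667

languages = ["Java", "C", "COBOL", "C"]
print(dummy_encode(languages, reference="C"))
# (['COBOL', 'Java'], [[0, 1], [0, 0], [1, 0], [0, 0]])

projects = [{"lang": lang, "id": i} for i, lang in enumerate(languages)]
print(sorted(stratify(projects, "lang")))  # ['C', 'COBOL', 'Java']
```

In this toy example the first project is underestimated by half and the second is overestimated by a factor of two. MMRE scores them 0.5 and 1.0, while MBRE scores both 1.0, which is why MBRE is described as balanced. With dummy coding a single model is fitted on all projects with the extra 0/1 columns, whereas with stratification a separate model would be fitted on each subset.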


Published In

ESEM '12: Proceedings of the ACM-IEEE international symposium on Empirical software engineering and measurement
September 2012
338 pages
ISBN: 9781450310567
DOI: 10.1145/2372251
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dummy variable
  2. hierarchical linear model
  3. interaction
  4. mixed effects
  5. model-based effort estimation
  6. stratification

Qualifiers

  • Research-article

Conference

ESEM '12
Acceptance Rates

Overall Acceptance Rate 130 of 594 submissions, 22%

Article Metrics

  • Downloads (last 12 months): 22
  • Downloads (last 6 weeks): 2
Reflects downloads up to 09 Jan 2025

Cited By

  • (2023) A Dummy-Variable Model for Humidity-Influenced DC Film Capacitors Lifetime Estimation. IEEE Journal of Emerging and Selected Topics in Power Electronics 11:1 (1056-1070). DOI: 10.1109/JESTPE.2022.3208065. Online publication date: Feb-2023
  • (2022) Identifying Terrestrial Landscape Character Types in China. Land 11:7 (1014). DOI: 10.3390/land11071014. Online publication date: 4-Jul-2022
  • (2021) Distribution of Chinese traditional villages and influencing factors for regionalization. Ciência Rural 51:7. DOI: 10.1590/0103-8478cr20200124. Online publication date: 2021
  • (2020) Selecting best predictors from large software repositories for highly accurate software effort estimation. Journal of Software: Evolution and Process 32:10. DOI: 10.1002/smr.2271. Online publication date: 4-Oct-2020
  • (2013) How to treat timing information for software effort estimation? Proceedings of the 2013 International Conference on Software and System Process (10-19). DOI: 10.1145/2486046.2486051. Online publication date: 18-May-2013
