DOI: 10.1145/1294948.1294953
Article

Improving defect prediction using temporal features and non linear models

Published: 03 September 2007

Abstract

Predicting the defects in the next release of a large software system is a very valuable asset for a project manager planning her resources. In this paper we argue that temporal features (or aspects) of the data are central to prediction performance. We also argue that the use of non-linear models, as opposed to traditional regression, is necessary to uncover some of the hidden interrelationships between the features and the defects and, in some cases, to maintain the accuracy of the prediction.
Using data obtained from the CVS and Bugzilla repositories of the Eclipse project, we extract a number of temporal features, such as the number of revisions and the number of reported issues within the last three months. We then use these data to predict both the location of defects (i.e., the classes in which defects will occur) and the number of reported bugs in the next month of the project. To that end, we compare standard tree-based induction algorithms with traditional regression.
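
As a rough illustration of what a temporal feature of this kind could look like, the following Python sketch counts revisions and reported issues per file within a fixed window before a reference date. The function name `temporal_counts`, the `(file_path, date)` event format, the 90-day window standing in for "three months", and the sample data are all assumptions made for this example; they are not taken from the paper.

```python
from collections import Counter
from datetime import date, timedelta


def temporal_counts(events, reference_date, window_days=90):
    """Count events per file in the window ending at reference_date.

    events: iterable of (file_path, event_date) pairs (hypothetical format).
    The window is half-open: (reference_date - window_days, reference_date].
    """
    start = reference_date - timedelta(days=window_days)
    counts = Counter()
    for path, when in events:
        if start < when <= reference_date:
            counts[path] += 1
    return counts


# Hypothetical revision log: (file, commit date).
revisions = [
    ("ui/Editor.java", date(2007, 1, 10)),
    ("ui/Editor.java", date(2007, 2, 20)),
    ("ui/Editor.java", date(2006, 6, 1)),      # falls outside the window
    ("core/Parser.java", date(2007, 3, 1)),
]
# Hypothetical issue reports: (file, report date).
issues = [
    ("ui/Editor.java", date(2007, 3, 15)),
    ("core/Parser.java", date(2006, 11, 30)),  # falls outside the window
]

ref = date(2007, 3, 31)
rev_3m = temporal_counts(revisions, ref)
iss_3m = temporal_counts(issues, ref)

# One feature row per file; a Counter returns 0 for files with no events.
features = {
    path: {"revisions_3m": rev_3m[path], "issues_3m": iss_3m[path]}
    for path in sorted(set(rev_3m) | set(iss_3m))
}
print(features)
```

Rows like these could then be fed to any learner; the choice of window length and of which events to count is a modelling decision that this sketch does not settle.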
Our non-linear models uncover the hidden relationships between features and defects, and present them in an easy-to-understand form. Results also show that, using the temporal features, our prediction model can predict whether a source file will have a defect with an accuracy of 99% (area under ROC curve 0.9251) and the number of defects with a mean absolute error of 0.019 (Spearman's correlation of 0.96).
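
For readers unfamiliar with the evaluation measures named above, here is a minimal Python sketch of how mean absolute error, Spearman's rank correlation, and the area under the ROC curve can be computed from scratch. It is not the authors' evaluation code: the helper names and the toy predictions are invented for illustration, and the AUC uses the standard rank-sum (Mann-Whitney) formulation.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def roc_auc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    ranks = average_ranks(scores)
    pos = [r for r, label in zip(ranks, labels) if label == 1]
    n_pos = len(pos)
    n_neg = len(labels) - n_pos
    return (sum(pos) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Hypothetical defect counts per file and a model's predictions.
actual = [0, 0, 1, 3, 0, 2]
predicted = [0.1, 0.0, 0.8, 2.5, 0.9, 2.6]
has_defect = [1 if d > 0 else 0 for d in actual]

mae = mean_absolute_error(actual, predicted)
rho = spearman(actual, predicted)
auc = roc_auc(has_defect, predicted)
print(mae, rho, auc)
```

Accuracy alone can look high when few files are defective, which is one reason a threshold-independent measure such as the area under the ROC curve is often reported alongside it.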

Published In

IWPSE '07: Ninth international workshop on Principles of software evolution: in conjunction with the 6th ESEC/FSE joint meeting
September 2007
122 pages
ISBN: 9781595937223
DOI: 10.1145/1294948
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. decision tree learner
  2. defect prediction
  3. mining software repository

Qualifiers

  • Article

Conference

ESEC/FSE07

Cited By

  • (2024) Carving out Control Code: Automated Identification of Control Software in Autopilot Systems. ACM Transactions on Cyber-Physical Systems. DOI: 10.1145/3678259. Online publication date: 17-Jul-2024
  • (2023) Studying the effectiveness of deep active learning in software defect prediction. International Journal of Computers and Applications 45:7-8 (534-552). DOI: 10.1080/1206212X.2023.2252117. Online publication date: 5-Sep-2023
  • (2023) DBDNN-Estimator: A Cross-Project Number of Fault Estimation Technique. SN Computer Science 5:1. DOI: 10.1007/s42979-023-02364-1. Online publication date: 20-Nov-2023
  • (2022) Understanding and Predicting Docker Build Duration: An Empirical Study of Containerized Workflow of OSS Projects. Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (1-13). DOI: 10.1145/3551349.3556940. Online publication date: 10-Oct-2022
  • (2021) An Empirical Examination of the Impact of Bias on Just-in-time Defect Prediction. Proceedings of the 15th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) (1-12). DOI: 10.1145/3475716.3475791. Online publication date: 11-Oct-2021
  • (2021) Comparative analysis of software fault prediction using various categories of classifiers. International Journal of System Assurance Engineering and Management 12:3 (520-535). DOI: 10.1007/s13198-021-01110-1. Online publication date: 10-May-2021
  • (2020) Predicting the number of defects in a new software version. PLOS ONE 15:3 (e0229131). DOI: 10.1371/journal.pone.0229131. Online publication date: 18-Mar-2020
  • (2020) Planning for untangling. Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering (801-811). DOI: 10.1145/3377811.3380344. Online publication date: 27-Jun-2020
  • (2020) On the relationship between design discussions and design quality: a case study of Apache projects. Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (543-555). DOI: 10.1145/3368089.3409707. Online publication date: 8-Nov-2020
  • (2020) How Well Do Change Sequences Predict Defects? Sequence Learning from Software Changes. IEEE Transactions on Software Engineering 46:11 (1155-1175). DOI: 10.1109/TSE.2018.2876256. Online publication date: 1-Nov-2020
