Article

Improving defect prediction using temporal features and non linear models

Authors:

Abraham Bernstein,

Jayalath Ekanayake,

Martin Pinzger

IWPSE '07: Ninth international workshop on Principles of software evolution: in conjunction with the 6th ESEC/FSE joint meeting

Pages 11 - 18

https://doi.org/10.1145/1294948.1294953

Published: 03 September 2007

Abstract

Predicting the defects in the next release of a large software system is very valuable to the project manager when planning her resources. In this paper we argue that temporal features (or aspects) of the data are central to prediction performance. We also argue that the use of non-linear models, as opposed to traditional regression, is necessary to uncover some of the hidden interrelationships between the features and the defects and, in some cases, to maintain the accuracy of the prediction.

Using data obtained from the CVS and Bugzilla repositories of the Eclipse project, we extract a number of temporal features, such as the number of revisions and the number of reported issues within the last three months. We then use these data to predict both the location of defects (i.e., the classes in which defects will occur) and the number of reported bugs in the next month of the project. To that end, we compare standard tree-based induction algorithms with traditional regression.
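As a rough illustration of what a temporal feature of this kind might look like, the following Python sketch counts revisions and reported issues per file in a trailing window. The function names, the 90-day approximation of "three months", the file names, and the toy dates are all assumptions made for illustration; they are not the authors' extraction pipeline or data.

```python
from datetime import date, timedelta


def count_in_window(event_dates, as_of, days=90):
    """Count events dated in the `days` days before `as_of` (as_of itself excluded)."""
    start = as_of - timedelta(days=days)
    return sum(1 for d in event_dates if start <= d < as_of)


def temporal_features(revisions, issues, as_of):
    """Build one feature row per file from revision and issue dates.

    `revisions` and `issues` map a file name to a list of dates.
    """
    files = sorted(set(revisions) | set(issues))
    return {
        name: {
            "revisions_3m": count_in_window(revisions.get(name, []), as_of),
            "issues_3m": count_in_window(issues.get(name, []), as_of),
        }
        for name in files
    }


# Hypothetical toy history, not data from the paper.
revisions = {
    "Parser.java": [date(2007, 1, 10), date(2007, 2, 20), date(2006, 6, 1)],
    "Lexer.java": [date(2006, 5, 5)],
}
issues = {"Parser.java": [date(2007, 2, 25)]}

features = temporal_features(revisions, issues, as_of=date(2007, 3, 1))
for name, row in features.items():
    print(name, row)
```

Rows like these could then be fed to any learner, tree-based or regression, with the defect count of the following month as the target.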

Our non-linear models uncover the hidden relationships between features and defects and present them in an easy-to-understand form. The results also show that, using the temporal features, our prediction model can predict whether a source file will have a defect with an accuracy of 99% (area under the ROC curve 0.9251) and the number of defects with a mean absolute error of 0.019 (Spearman's correlation of 0.96).
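For readers unfamiliar with the two regression measures quoted above, the following is a minimal Python sketch of mean absolute error and Spearman's rank correlation (computed as the Pearson correlation of average ranks, so ties are handled). The helper names and the toy numbers are assumptions for illustration only; they do not reproduce the paper's evaluation or its results.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def average_ranks(values):
    """Rank values starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes both inputs have non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    norm_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (norm_x * norm_y)


def spearman(actual, predicted):
    """Spearman's rank correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(actual), average_ranks(predicted))


# Hypothetical defect counts per file, not data from the paper.
actual = [0, 1, 2, 4]
predicted = [0.5, 1.0, 1.5, 4.0]

print("MAE:", mean_absolute_error(actual, predicted))
print("Spearman:", spearman(actual, predicted))
```

The two measures answer different questions: mean absolute error reflects how far the predicted counts are from the actual counts, while Spearman's correlation only reflects whether files are ranked in the right order.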

Information & Contributors

Information

Published In

IWPSE '07: Ninth international workshop on Principles of software evolution: in conjunction with the 6th ESEC/FSE joint meeting

September 2007

122 pages

ISBN:9781595937223

DOI:10.1145/1294948

Program Chairs:
Massimiliano Di Penta, RCOST, Università degli Studi del Sannio, Italy
Michele Lanza, University of Lugano, Switzerland

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

ACM: Association for Computing Machinery
SIGSOFT: ACM Special Interest Group on Software Engineering
CEPIS: The Council of European Professional Informatics Societies

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ESEC/FSE07

ESEC/FSE07: Joint 11th European Software Engineering Conference 2007

September 3 - 4, 2007

Dubrovnik, Croatia

Bibliometrics

Total Citations: 58
Total Downloads: 703
Downloads (Last 12 months): 5
Downloads (Last 6 weeks): 0

Reflects downloads up to 03 Mar 2025

Cited By

Balasubramaniam B, Ahmed I, Bagheri H, Bradley J (2024). Carving Out Control Code: Automated Identification of Control Software in Autopilot Systems. ACM Transactions on Cyber-Physical Systems 8:4 (1-20). Online publication date: 11-Nov-2024.
https://dl.acm.org/doi/10.1145/3678259
Feyzi F, Daneshdoost A (2023). Studying the effectiveness of deep active learning in software defect prediction. International Journal of Computers and Applications 45:7-8 (534-552). Online publication date: 5-Sep-2023.
https://doi.org/10.1080/1206212X.2023.2252117
Pandey S, Tripathi A (2023). DBDNN-Estimator: A Cross-Project Number of Fault Estimation Technique. SN Computer Science 5:1. Online publication date: 20-Nov-2023.
https://doi.org/10.1007/s42979-023-02364-1
Wu Y, Zhang Y, Xu K, Wang T, Wang H (2022). Understanding and Predicting Docker Build Duration: An Empirical Study of Containerized Workflow of OSS Projects. Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (1-13). Online publication date: 10-Oct-2022.
https://dl.acm.org/doi/10.1145/3551349.3556940
Gesi J, Li J, Ahmed I, Lanubile F (2021). An Empirical Examination of the Impact of Bias on Just-in-time Defect Prediction. Proceedings of the 15th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) (1-12). Online publication date: 11-Oct-2021.
https://dl.acm.org/doi/10.1145/3475716.3475791
Kaur I, Kaur A (2021). Comparative analysis of software fault prediction using various categories of classifiers. International Journal of System Assurance Engineering and Management 12:3 (520-535). Online publication date: 10-May-2021.
https://doi.org/10.1007/s13198-021-01110-1
Felix E, Lee S (2020). Predicting the number of defects in a new software version. PLOS ONE 15:3 (e0229131). Online publication date: 18-Mar-2020.
https://doi.org/10.1371/journal.pone.0229131
Brindescu C, Ahmed I, Leano R, Sarma A, Rothermel G, Bae D (2020). Planning for untangling. Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering (801-811). Online publication date: 27-Jun-2020.
https://dl.acm.org/doi/10.1145/3377811.3380344
Mannan U, Ahmed I, Jensen C, Sarma A, Devanbu P, Cohen M, Zimmermann T (2020). On the relationship between design discussions and design quality: a case study of Apache projects. Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (543-555). Online publication date: 8-Nov-2020.
https://dl.acm.org/doi/10.1145/3368089.3409707
Wen M, Wu R, Cheung S (2020). How Well Do Change Sequences Predict Defects? Sequence Learning from Software Changes. IEEE Transactions on Software Engineering 46:11 (1155-1175). Online publication date: 1-Nov-2020.
https://doi.org/10.1109/TSE.2018.2876256
