Fault Localization with Code Coverage Representation Learning

Authors:

Tien N. Nguyen

ICSE '21: Proceedings of the 43rd International Conference on Software Engineering

Pages 661 - 673

https://doi.org/10.1109/ICSE43902.2021.00067

Published: 05 November 2021

Abstract

In this paper, we propose DEEPRL4FL, a deep learning fault localization (FL) approach that locates buggy code at the statement and method levels by treating FL as an image pattern recognition problem. DEEPRL4FL does so via novel code coverage representation learning (RL) and data dependency RL for program statements. These two types of RL on the dynamic information in a code coverage matrix are combined with code representation learning on the static information of the usually suspicious source code. This combination is inspired by crime scene investigation, in which investigators analyze the crime scene (failed test cases and statements) and related persons (statements with dependencies) and, at the same time, examine the usual suspects who have committed a similar crime in the past (similar buggy code in the training data).

For the code coverage information, DEEPRL4FL first orders the test cases and marks error-exhibiting code statements, expecting that a model can recognize the patterns that discriminate between faulty and non-faulty statements/methods. For dependencies among statements, the suspiciousness of a statement is assessed by taking into account its data dependencies on other statements in execution and data flows, in addition to the statement itself. Finally, the vector representations for the code coverage matrix, the data dependencies among statements, and the source code are combined and used as the input of a classifier built from a Convolutional Neural Network to detect buggy statements/methods. Our empirical evaluation shows that DEEPRL4FL improves the top-1 results over the state-of-the-art statement-level FL baselines by 173.1% to 491.7%. It also improves the top-1 results over the existing method-level FL baselines by 15.0% to 206.3%.
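As a rough illustration of the coverage preprocessing described above, the following Python sketch builds a small code coverage matrix, orders the test cases, and marks error-exhibiting statements. It is not the authors' implementation. The layout (one row per test, one column per statement), the ordering rule (failing tests first), the marker value 2, and the names `RunRecord` and `build_marked_matrix` are all assumptions made for this example; the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunRecord:
    """One executed test case (hypothetical record type for this sketch)."""
    name: str
    covered: List[int]                # covered[j] == 1 if statement j was executed
    passed: bool
    error_stmt: Optional[int] = None  # statement index where the failure surfaced


def build_marked_matrix(runs: List[RunRecord]) -> List[List[int]]:
    """Return a coverage matrix with ordered rows and marked error statements.

    Assumed conventions (not taken from the paper):
      * failing tests are placed before passing tests;
      * the error-exhibiting statement of a failing test is encoded as 2.
    """
    # sorted() is stable, so tests keep their relative order within each group.
    ordered = sorted(runs, key=lambda r: r.passed)  # False (failed) sorts first
    matrix = []
    for run in ordered:
        row = list(run.covered)  # copy so the input record is not mutated
        if not run.passed and run.error_stmt is not None:
            row[run.error_stmt] = 2
        matrix.append(row)
    return matrix


runs = [
    RunRecord("t1", [1, 1, 0], passed=True),
    RunRecord("t2", [1, 0, 1], passed=False, error_stmt=2),
    RunRecord("t3", [0, 1, 1], passed=True),
]
matrix = build_marked_matrix(runs)
for row in matrix:
    print(row)
```

A matrix of this shape can be treated like a small image, which is what makes a convolutional classifier a natural fit for the pattern recognition framing in the abstract.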

Published In

ICSE '21: Proceedings of the 43rd International Conference on Software Engineering

May 2021

1768 pages

ISBN: 9781450390859

Publisher

IEEE Press

Qualifiers

Research-article
Research
Refereed limited

Conference

ICSE '21

Sponsor:

SIGSOFT

ICSE '21: 43rd International Conference on Software Engineering

May 22 - 30, 2021

Madrid, Spain

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%
