research-article

BugBuilder: An Automated Approach to Building Bug Repository

Published: 01 April 2023

Abstract

Bug-related research, e.g., fault localization, program repair, and software testing, relies heavily on high-quality, large-scale software bug repositories. The importance of such repositories is twofold. On the one hand, real-world bugs and their associated patches may inspire novel approaches for finding, locating, and repairing software bugs. On the other hand, real-world bugs and their patches are indispensable for rigorous and meaningful evaluation of approaches to software testing, fault localization, and program repair. For this reason, a number of software bug repositories, e.g., iBUGS and Defects4J, have been constructed recently by mining version control systems and bug tracking systems. However, fully automated construction of bug repositories by simply taking bug-fixing commits from version control systems often yields inaccurate patches that contain many bug-irrelevant changes. Although experts or developers can be asked to manually exclude the bug-irrelevant changes (as the authors of Defects4J did), such extensive human intervention makes it difficult to build large-scale bug repositories. In this paper, we propose an automatic approach, called BugBuilder, to construct bug repositories from version control systems. Unlike existing approaches, it automatically extracts complete and concise bug-fixing patches and excludes bug-irrelevant changes. It first detects and excludes software refactorings involved in bug-fixing commits. BugBuilder then enumerates all subsets of the remaining changes and discards invalid subsets by compilation and software testing. If exactly one subset survives the validation, it is taken as the complete and concise bug-fixing patch for the associated bug. If multiple subsets survive, BugBuilder employs a sequence of heuristics to select the most likely one.
Evaluation results on 809 real-world bug-fixing commits in Defects4J suggest that BugBuilder successfully extracted complete and concise bug-fixing patches from 43% of the bug-fixing commits, and that its precision (99%) was even higher than that of human experts. We also built a bug repository, called GrowingBugs, with the proposed approach. The resulting repository serves as evidence of the usefulness of the proposed approach, as well as a publicly available benchmark for bug-related research.
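
The pipeline described in the abstract (exclude refactorings, enumerate subsets of the remaining changes, validate each subset, then select) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `Change`, `candidate_patches`, `extract_patch`, `is_refactoring`, `is_valid`, and `pick` are hypothetical; the validator stands in for "compile and run the tests", and the tie-breaking rule used in the toy run (prefer the smallest subset) is only a placeholder, since the abstract does not say which heuristics BugBuilder applies. The sketch also assumes that the empty subset is not a candidate.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List, Optional, Sequence

Change = str  # hypothetical: an identifier for one change in a bug-fixing commit
Patch = FrozenSet[Change]


def candidate_patches(changes: Sequence[Change]) -> Iterable[Patch]:
    """Yield every non-empty subset of the changes, smallest subsets first."""
    for size in range(1, len(changes) + 1):
        for subset in combinations(changes, size):
            yield frozenset(subset)


def extract_patch(
    changes: Sequence[Change],
    is_refactoring: Callable[[Change], bool],
    is_valid: Callable[[Patch], bool],
    pick: Callable[[List[Patch]], Optional[Patch]],
) -> Optional[Patch]:
    """Return the selected candidate patch, or None if no subset validates."""
    # Step 1: drop changes identified as refactorings.
    remaining = [c for c in changes if not is_refactoring(c)]
    # Step 2: enumerate subsets and keep those that pass validation
    # (a stand-in for compilation plus test execution).
    survivors = [s for s in candidate_patches(remaining) if is_valid(s)]
    # Step 3: a unique survivor is taken as the patch.
    if len(survivors) == 1:
        return survivors[0]
    if not survivors:
        return None
    # Step 4: several survivors, so defer to a caller-supplied heuristic.
    return pick(survivors)


# Toy run with made-up change identifiers.
commit = ["rename_var", "fix_null_check", "add_logging"]
patch = extract_patch(
    commit,
    is_refactoring=lambda c: c.startswith("rename"),
    is_valid=lambda s: "fix_null_check" in s,       # toy validator
    pick=lambda survivors: min(survivors, key=len),  # placeholder heuristic
)
print(sorted(patch))  # ['fix_null_check']
```

A set of n remaining changes has 2^n - 1 non-empty subsets, so exhaustive enumeration of this kind is only practical when few changes remain after refactorings have been excluded.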

Cited By

  • (2024) JLeaks: A Featured Resource Leak Repository Collected From Hundreds of Open-Source Java Projects. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, pp. 1-13. doi:10.1145/3597503.3639162. Online publication date: 20-May-2024
  • (2023) An Automated Approach to Extracting Local Variables. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 313-325. doi:10.1145/3611643.3616261. Online publication date: 30-Nov-2023
  • (2022) Do bugs lead to unnaturalness of source code? Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 1085-1096. doi:10.1145/3540250.3549149. Online publication date: 7-Nov-2022

Published In

IEEE Transactions on Software Engineering  Volume 49, Issue 4
April 2023
1635 pages

Publisher

IEEE Press