DOI: 10.1145/3387904.3389263
research-article

Duplicate Bug Report Detection Using Dual-Channel Convolutional Neural Networks

Published: 12 September 2020

Abstract

Developers rely on bug reports to fix bugs. Bug reports are usually stored and managed in bug tracking systems. Because reporters have different writing habits, different reporters may describe the same bug in different ways, so a bug tracking system often contains many duplicate bug reports. Automatically detecting these duplicates would save a large amount of bug-analysis effort. Prior studies have found that deep-learning techniques are effective for duplicate bug report detection. Inspired by recent Natural Language Processing (NLP) research, in this paper we propose a duplicate bug report detection approach based on Dual-Channel Convolutional Neural Networks (DC-CNN). We present a novel bug report pair representation, the dual-channel matrix, formed by concatenating the two single-channel matrices that represent the individual bug reports. These pair representations are fed to a CNN model to capture the correlated semantic relationships between the two reports, and our approach then uses the resulting association features to classify whether the pair is a duplicate. We evaluate our approach on three large datasets from three open-source projects (OpenOffice, Eclipse, and NetBeans) and on a larger combined dataset; classification accuracy reaches 0.9429, 0.9685, 0.9534, and 0.9552, respectively. This performance exceeds that of two state-of-the-art approaches that also use deep-learning techniques. The results indicate that our dual-channel matrix representation is effective for duplicate bug report detection.
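The dual-channel idea can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation; it rests on several assumptions. The toy vocabulary, the random vectors standing in for trained word embeddings, the sizes `EMBED_DIM` and `MAX_LEN`, the whitespace tokenizer, and the single convolution layer with max pooling and a logistic output unit are all hypothetical choices. Each report becomes a `MAX_LEN x EMBED_DIM` matrix, and the two matrices are concatenated along a new channel axis (our reading of "dual-channel matrix"). Every filter spans both channels, so each activation combines words from the two reports at the same positions.

```python
import math
import random

# Illustrative sizes (hypothetical, not the paper's hyperparameters).
EMBED_DIM = 4  # width of each word vector
MAX_LEN = 6    # token rows kept per bug report


def single_channel(text, table, max_len=MAX_LEN, dim=EMBED_DIM):
    """Map one bug report to a max_len x dim matrix of word vectors.

    Assumed preprocessing: lowercase + whitespace split; unknown words become
    zero vectors; the matrix is truncated or zero-padded to max_len rows.
    """
    tokens = text.lower().split()[:max_len]
    rows = [list(table.get(tok, [0.0] * dim)) for tok in tokens]
    rows += [[0.0] * dim for _ in range(max_len - len(rows))]
    return rows


def dual_channel(m1, m2):
    """Stack two single-channel matrices: x[i][j] == [m1[i][j], m2[i][j]]."""
    return [[[a, b] for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def conv_relu_maxpool(x, kernel, bias=0.0):
    """Slide one filter of height len(kernel) down the rows of x.

    kernel[k][j][c] weights row k, embedding column j, channel c, so each
    activation mixes the two reports' words at the same positions. Returns
    the max over positions of the ReLU activations.
    """
    h = len(kernel)
    activations = []
    for i in range(len(x) - h + 1):
        s = bias
        for k in range(h):
            for j, cell in enumerate(x[i + k]):
                for c, value in enumerate(cell):
                    s += value * kernel[k][j][c]
        activations.append(max(0.0, s))
    return max(activations)


def duplicate_probability(x, kernels, weights, bias=0.0):
    """One pooled feature per filter, then a logistic output unit (assumed)."""
    features = [conv_relu_maxpool(x, kern) for kern in kernels]
    logit = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
vocab = "crash when opening file editor freezes on save".split()
# Random stand-ins for trained word embeddings (hypothetical).
table = {w: [rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for w in vocab}

x = dual_channel(single_channel("Crash when opening file", table),
                 single_channel("Editor freezes on save", table))

# Three untrained filters, each of shape (2 rows, EMBED_DIM, 2 channels).
kernels = [[[[rng.uniform(-1, 1) for _ in range(2)] for _ in range(EMBED_DIM)]
            for _ in range(2)] for _ in range(3)]
weights = [rng.uniform(-1, 1) for _ in kernels]
p = duplicate_probability(x, kernels, weights)

print(len(x), len(x[0]), len(x[0][0]))  # 6 4 2
print(f"untrained duplicate probability: {p:.3f}")
```

Because the filters and output weights here are random, `p` carries no meaning. In a real classifier, these weights would be learned from labeled duplicate and non-duplicate pairs.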

    Information

    Published In

    ICPC '20: Proceedings of the 28th International Conference on Program Comprehension
    July 2020
    481 pages
    ISBN: 9781450379588
    DOI: 10.1145/3387904
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Convolutional Neural Networks
    2. Dual-Channel
    3. Duplicate Bug Report Detection
    4. Software Maintenance
    5. Software Quality Assurance

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICPC '20

    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 94
    • Downloads (last 6 weeks): 19
    Reflects downloads up to 12 Dec 2024.

    Cited By

    • (2024) Predicting Attrition among Software Professionals: Antecedents and Consequences of Burnout and Engagement. ACM Transactions on Software Engineering and Methodology 33, 8, 1-45. DOI: 10.1145/3691629. Online publication date: 2-Sep-2024.
    • (2024) Mobile Bug Report Reproduction via Global Search on the App UI Model. Proceedings of the ACM on Software Engineering 1, FSE, 2656-2676. DOI: 10.1145/3660824. Online publication date: 12-Jul-2024.
    • (2024) Refining GPT-3 Embeddings with a Siamese Structure for Technical Post Duplicate Detection. In 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 114-125. DOI: 10.1109/SANER60148.2024.00019. Online publication date: 12-Mar-2024.
    • (2024) BugBlitz-AI: An Intelligent QA Assistant. In 2024 IEEE 15th International Conference on Software Engineering and Service Science (ICSESS), 57-63. DOI: 10.1109/ICSESS62520.2024.10719045. Online publication date: 13-Sep-2024.
    • (2024) Duplicate Bug Report Detection Using Named Entity Recognition. Knowledge-Based Systems 284, 111258. DOI: 10.1016/j.knosys.2023.111258. Online publication date: Jan-2024.
    • (2024) PR-DupliChecker: Detecting Duplicate Pull Requests in Fork-Based Workflows. International Journal of System Assurance Engineering and Management 15, 7, 3538-3550. DOI: 10.1007/s13198-024-02361-4. Online publication date: 19-Jun-2024.
    • (2024) When Debugging Encounters Artificial Intelligence: State of the Art and Open Challenges. Science China Information Sciences 67, 4. DOI: 10.1007/s11432-022-3803-9. Online publication date: 21-Feb-2024.
    • (2024) Investigating Freshmen Students’ Coding Standards Challenges Using NLP Techniques. In Information, Communication and Computing Technology, 3-15. DOI: 10.1007/978-3-031-72483-1_1. Online publication date: 16-Oct-2024.
    • (2024) Issue Links Retrieval for New Issues in Issue Tracking Systems. In Natural Language Processing and Information Systems, 126-138. DOI: 10.1007/978-3-031-70242-6_13. Online publication date: 20-Sep-2024.
    • (2023) A Survey on Bug Deduplication and Triage Methods from Multiple Points of View. Applied Sciences 13, 15, 8788. DOI: 10.3390/app13158788. Online publication date: 29-Jul-2023.