research-article

Unsupervised deep bug report summarization

Authors:

Ge Li

ICPC '18: Proceedings of the 26th Conference on Program Comprehension

Pages 144 - 155

https://doi.org/10.1145/3196321.3196326

Published: 28 May 2018

Abstract

Bug report summarization is an effective way to reduce the considerable time spent wading through numerous bug reports. Although some supervised and unsupervised algorithms have been proposed for this task, their performance is still limited by the particular characteristics of bug reports, including the evaluation behaviours in bug reports, the diverse sentences in software language and natural language, and the domain-specific predefined fields. In this study, we conduct the first exploration of deep learning networks for bug report summarization. Our approach, called DeepSum, is a novel stepped auto-encoder network with evaluation enhancement and predefined fields enhancement modules, which successfully integrates the bug report characteristics into a deep neural network. DeepSum is unsupervised, so it significantly reduces the effort of labeling huge training sets. Extensive experiments show that DeepSum outperforms the comparative algorithms by up to 13.2% and 9.2% in terms of F-score and Rouge-n metrics respectively over the public datasets, and achieves state-of-the-art performance. Our work shows promising prospects for deep learning to summarize millions of bug reports.
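The abstract reports results in terms of F-score and Rouge-n but does not spell out how they are computed. The following is a minimal sketch of the conventional definitions, assuming an extractive setting in which a summary is a set of selected sentence ids compared against annotator-chosen ids, and assuming simple whitespace tokenization. The function names `f_score`, `ngrams`, and `rouge_n` are illustrative and not taken from the paper, whose exact evaluation protocol may differ.

```python
from collections import Counter


def f_score(selected, gold):
    """Return (precision, recall, F1) of selected sentence ids against gold ids."""
    selected, gold = set(selected), set(gold)
    overlap = len(selected & gold)
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(selected)
    recall = overlap / len(gold)
    return precision, recall, 2 * precision * recall / (precision + recall)


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """ROUGE-n recall: clipped n-gram matches divided by n-grams in the reference."""
    cand = ngrams(candidate.lower().split(), n)  # assumption: whitespace tokens
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, cand[gram]) for gram, count in ref.items())
    return matched / total
```

For example, `f_score([1, 2, 3], [2, 3, 4, 5])` compares three selected sentences with four gold sentences, and `rouge_n(summary_text, reference_text, n=2)` scores bigram overlap between two summary strings.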

Cited By

Chen X, Hu X, Huang Y, Jiang H, Ji W, Jiang Y, Jiang Y, Liu B, Liu H, Li X, Lian X, Meng G, Peng X, Sun H, Shi L, Wang B, Wang C, Wang J, Wang T, Xuan J, Xia X, Yang Y, Yang Y, Zhang L, Zhou Y, Zhang L. (2024). Deep learning-based software engineering: progress, challenges, and opportunities. Science China Information Sciences 68:1. Online publication date: 24-Dec-2024.
https://doi.org/10.1007/s11432-023-4127-5
Meng Q, Zou W, Cai B, Zhang J. (2024). KeyTitle: towards better bug report title generation by keywords planning. Software Quality Journal 32:4 (1655-1682). Online publication date: 13-Sep-2024.
https://doi.org/10.1007/s11219-024-09695-z
Assi M, Hassan S, Georgiou S, Zou Y. (2023). Predicting the Change Impact of Resolving Defects by Leveraging the Topics of Issue Reports in Open Source Software Systems. ACM Transactions on Software Engineering and Methodology 32:6 (1-34). Online publication date: 30-Sep-2023.
https://dl.acm.org/doi/10.1145/3593802

Index Terms

Unsupervised deep bug report summarization
1. Software and its engineering
  1. Software creation and management
    1. Software post-development issues
      1. Maintaining software

Information & Contributors

Information

Published In

ICPC '18: Proceedings of the 26th Conference on Program Comprehension

May 2018

423 pages

ISBN:9781450357142

DOI:10.1145/3196321

General Chair:
Foutse Khomh, École Polytechnique de Montréal, Canada

Program Chairs:
Chanchal K. Roy, University of Saskatchewan, Canada
Janet Siegmund, University of Passau, Germany

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering
IEEE-CS: Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China

Conference

ICSE '18

Sponsor:

SIGSOFT
IEEE-CS

ICSE '18: 40th International Conference on Software Engineering

May 28 - 29, 2018

Gothenburg, Sweden

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 39
Total Downloads: 409
Downloads (Last 12 months): 30
Downloads (Last 6 weeks): 2

Reflects downloads up to 04 Jan 2025

Citations

Cited By

Liu J, He S, Chen Z, Li L, Kang Y, Zhang X, He P, Zhang H, Lin Q, Xu Z, Rajmohan S, Zhang D, Lyu M, Grundy J, Pollock L, Penta M. (2023). Incident-Aware Duplicate Ticket Aggregation for Cloud Systems. Proceedings of the 45th International Conference on Software Engineering (2299-2311). Online publication date: 14-May-2023.
https://dl.acm.org/doi/10.1109/ICSE48619.2023.00193
Kou B, Chen M, Zhang T, Grundy J, Pollock L, Penta M. (2023). Automated Summarization of Stack Overflow Posts. Proceedings of the 45th International Conference on Software Engineering (1853-1865). Online publication date: 14-May-2023.
https://dl.acm.org/doi/10.1109/ICSE48619.2023.00158
Fang S, Zhang T, Tan Y, Jiang H, Xia X, Sun X, Grundy J, Pollock L, Penta M. (2023). RepresentThemAll: A Universal Learning Representation of Bug Reports. Proceedings of the 45th International Conference on Software Engineering (602-614). Online publication date: 14-May-2023.
https://dl.acm.org/doi/10.1109/ICSE48619.2023.00060
Agarwal T, Thakur N. (2023). Deep Learning in Requirement Engineering: A Statistical Justification. 2023 International Conference on New Frontiers in Communication, Automation, Management and Security (ICCAMS) (1-8). Online publication date: 27-Oct-2023.
https://doi.org/10.1109/ICCAMS60113.2023.10525759
Li H, Yan M, Sun W, Liu X, Wu Y. (2023). A first look at bug report templates on GitHub. Journal of Systems and Software 202:C. Online publication date: 1-Aug-2023.
https://dl.acm.org/doi/10.1016/j.jss.2023.111709
Koh Y, Kang S, Lee S. (2022). Deep Learning-Based Bug Report Summarization Using Sentence Significance Factors. Applied Sciences 12:12 (5854). Online publication date: 8-Jun-2022.
https://doi.org/10.3390/app12125854
Mishra S, Harshavardhan K, Mitra S, Saha S, Bhattacharyya P, Wagner M. (2022). Bug report summarization using multi-view multi-objective optimization framework. Proceedings of the Genetic and Evolutionary Computation Conference (1245-1253). Online publication date: 8-Jul-2022.
https://dl.acm.org/doi/10.1145/3512290.3528843