research-article

Development emails content analyzer: intention mining in developer discussions

Authors:

Andrea Di Sorbo,

Sebastiano Panichella,

Corrado A. Visaggio,

Massimiliano Di Penta,

Gerardo Canfora,

Harald C. Gall

ASE '15: Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering

Pages 12 - 23

https://doi.org/10.1109/ASE.2015.12

Published: 09 November 2015

Abstract

Written development communication (e.g., mailing lists, issue trackers) constitutes a valuable source of information for building recommenders for software engineers, for example recommenders that suggest experts or re-document existing source code. In this paper we propose a novel, semi-supervised approach named DECA (Development Emails Content Analyzer) that uses Natural Language Parsing to classify the content of development emails according to their purpose (e.g., feature request, opinion asking, problem discovery, solution proposal, information giving), identifying email elements that can be used for specific tasks. A study based on data from Qt and Ubuntu highlights the high precision (90%) and recall (70%) of DECA in classifying email content, outperforming traditional machine learning strategies. Moreover, we successfully used DECA for re-documenting source code of Eclipse and Lucene, improving the recall, while keeping high precision, of a previous approach based on ad-hoc heuristics.
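
To make the reported figures concrete, the following is a minimal sketch of how precision, recall, and their harmonic mean (F-measure) are conventionally computed for one category of classified sentences. It is not DECA's implementation: the function names, the toy `gold` and `predicted` label lists, and the choice to combine the abstract's aggregate 90% and 70% into a single F-measure are illustrative assumptions, and the paper may report its own combined score differently.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1); 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(gold, predicted, label):
    """Per-category precision, recall, and F1 for two parallel label lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)  # correctly assigned
    fp = sum(1 for g, p in pairs if g != label and p == label)  # wrongly assigned
    fn = sum(1 for g, p in pairs if g == label and p != label)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_measure(precision, recall)


# Hypothetical toy data: one manually assigned and one predicted category per sentence.
gold = ["feature request", "problem discovery", "solution proposal",
        "problem discovery", "information giving"]
predicted = ["feature request", "problem discovery", "problem discovery",
             "information giving", "information giving"]

print(precision_recall_f1(gold, predicted, "problem discovery"))

# F-measure implied by the aggregate figures quoted in the abstract.
print(f"{f_measure(0.90, 0.70):.4f}")  # prints 0.7875
```

Under these assumptions, a precision of 90% and a recall of 70% correspond to an F-measure of about 0.79. In other words, the approach seldom mislabels a sentence but still misses roughly three in ten relevant ones.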

Published In

ASE '15: Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering

November 2015

935 pages

ISBN:9781509000241

General Chair: Myra Cohen (University of Nebraska-Lincoln)

Program Chairs: Lars Grunske (University of Stuttgart, Germany), Michael Whalen (University of Minnesota)

Sponsors

In-Cooperation

IEEE CS

Publisher

IEEE Press

Qualifiers

Research-article

Conference

ASE '15

ASE '15: ACM/IEEE International Conference on Automated Software Engineering

November 9 - 15, 2015

Lincoln, Nebraska

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%

Bibliometrics

Total Citations: 33
Total Downloads: 62
Downloads (Last 12 months): 2
Downloads (Last 6 weeks): 0

Reflects downloads up to 05 Mar 2025

Cited By

Zhao J, Yang Z, Zhang L, Lian X, Yang D, Tan X, Filkov V, Ray B, Zhou M (2024). DRMiner: Extracting Latent Design Rationale from Jira Issue Logs. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pp. 468-480. Online publication date: 27-Oct-2024.
https://dl.acm.org/doi/10.1145/3691620.3695019
Abedini Y, Heydarnoori A (2024). Can GitHub Issues Help in App Review Classifications? ACM Transactions on Software Engineering and Methodology 33:8, pp. 1-42. Online publication date: 18-Jul-2024.
https://dl.acm.org/doi/10.1145/3678170
Shang X, Zhang S, Zhang Y, Guo S, Li Y, Chen R, Li H, Li X, Jiang H (2024). Analyzing and Detecting Information Types of Developer Live Chat Threads. ACM Transactions on Software Engineering and Methodology 33:5, pp. 1-32. Online publication date: 4-Jun-2024.
https://dl.acm.org/doi/10.1145/3643677
Jiang H, Shi L, Che M, Zhang Y, Wang Q (2024). Bringing Open Source Communication and Development Together: A Cross-Platform Study on Gitter and GitHub. IEEE Transactions on Software Engineering 50:11, pp. 2807-2826. Online publication date: 1-Nov-2024.
https://dl.acm.org/doi/10.1109/TSE.2024.3410292
Panichella S, Di Sorbo A (2023). Summary of the 2nd Natural Language-based Software Engineering Workshop (NLBSE 2023). ACM SIGSOFT Software Engineering Notes 48:4, pp. 60-63. Online publication date: 17-Oct-2023.
https://dl.acm.org/doi/10.1145/3617946.3617957
Krüger J, Li Y, Zhu C, Chechik M, Berger T, Rubin J, Chandra S, Blincoe K, Tonella P (2023). A Vision on Intentions in Software Engineering. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 2117-2121. Online publication date: 30-Nov-2023.
https://dl.acm.org/doi/10.1145/3611643.3613087
Di Sorbo A, Panichella S (2023). Summary of the 1st Natural Language-based Software Engineering Workshop (NLBSE 2022). ACM SIGSOFT Software Engineering Notes 48:1, pp. 101-104. Online publication date: 17-Jan-2023.
https://dl.acm.org/doi/10.1145/3573074.3573101
Di Sorbo A, Zampetti F, Visaggio A, Di Penta M, Panichella S (2023). Automated Identification and Qualitative Characterization of Safety Concerns Reported in UAV Software Platforms. ACM Transactions on Software Engineering and Methodology 32:3, pp. 1-37. Online publication date: 26-Apr-2023.
https://dl.acm.org/doi/10.1145/3564821
Sworna Z, Islam C, Babar M (2023). APIRO: A Framework for Automated Security Tools API Recommendation. ACM Transactions on Software Engineering and Methodology 32:1, pp. 1-42. Online publication date: 13-Feb-2023.
https://dl.acm.org/doi/10.1145/3512768
Shi L, Mu F, Zhang Y, Yang Y, Chen J, Chen X, Jiang H, Jiang Z, Wang Q, Dwyer M, Damian D, Zeller A (2022). BugListener. Proceedings of the 44th International Conference on Software Engineering, pp. 299-311. Online publication date: 21-May-2022.
https://dl.acm.org/doi/10.1145/3510003.3510108
