Research article
DOI: 10.1145/2961111.2962584

Towards Effectively Test Report Classification to Assist Crowdsourced Testing

Published: 08 September 2016

Abstract

Context: Automatic classification of crowdsourced test reports is important because of their tremendous volume and the large proportion of noise they contain. Most existing approaches to this problem focus on examining the performance of different machine learning or information retrieval techniques, and most are evaluated on open-source datasets. However, our observations reveal that these approaches yield poor and unstable performance on real industrial crowdsourced testing data. We further analyze the underlying cause and find that industrial data exhibit significant local bias, which degrades existing approaches.
Goal: We aim to design an approach that overcomes the local bias in industrial data and automatically identifies true faults among the large number of crowdsourced reports.
Method: We propose a cluster-based classification approach, which first clusters similar reports together and then builds classifiers on the most similar clusters using an ensemble method (a brief illustrative sketch follows this abstract).
Results: Evaluation is conducted on 15,095 test reports from 35 industrial projects on the largest crowdsourced testing platform in China, and the results are promising, with 0.89 precision and 0.97 recall on average. In addition, our approach improves the existing baselines by 17%-63% in average precision and 15%-61% in average recall.
Conclusions: The results imply that our approach can effectively discriminate true faults from large amounts of crowdsourced reports, which can reduce the effort required for manual inspection and facilitate project management in crowdsourced testing. To the best of our knowledge, this is the first work to address the test report classification problem in real industrial crowdsourced testing practice.
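
The abstract above only outlines the method, so the following is a minimal sketch of a cluster-then-classify pipeline of that general shape, assuming scikit-learn with TF-IDF text features, k-means clustering, one logistic-regression classifier per cluster, and a majority vote over the k nearest clusters. The function names (train_cluster_classifiers, predict_label), the feature representation, the distance measure, and the ensemble rule are illustrative assumptions, not details taken from the paper.

# A minimal sketch, not the paper's implementation: cluster training reports with
# k-means over TF-IDF features, fit one classifier per cluster, and classify a new
# report by majority vote among the classifiers of its k nearest clusters.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.dummy import DummyClassifier

def train_cluster_classifiers(reports, labels, n_clusters=5):
    """Cluster the training reports, then fit one classifier per cluster."""
    vectorizer = TfidfVectorizer(max_features=2000)
    X = vectorizer.fit_transform(reports)          # TF-IDF features (assumed)
    y = np.asarray(labels)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    classifiers = {}
    for c in range(n_clusters):
        idx = np.where(kmeans.labels_ == c)[0]
        y_c = y[idx]
        # Fall back to a constant predictor if the cluster holds a single class.
        if len(np.unique(y_c)) < 2:
            clf = DummyClassifier(strategy="most_frequent")
        else:
            clf = LogisticRegression(max_iter=1000)
        clf.fit(X[idx], y_c)
        classifiers[c] = clf
    return vectorizer, kmeans, classifiers

def predict_label(report, vectorizer, kmeans, classifiers, k=3):
    """Vote among the classifiers of the k clusters nearest to the new report."""
    x = vectorizer.transform([report])
    distances = kmeans.transform(x)[0]             # distance to every centroid
    nearest = np.argsort(distances)[:k]            # indices of the k closest clusters
    votes = [classifiers[c].predict(x)[0] for c in nearest]
    values, counts = np.unique(votes, return_counts=True)
    return values[np.argmax(counts)]               # simple majority vote

In use, such a pipeline would be fit on manually labeled historical reports and predict_label applied to each incoming report; comparing the predictions against manual labels would yield precision and recall figures of the kind reported in the abstract.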

Published In

ESEM '16: Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement
September 2016
457 pages
ISBN: 9781450344272
DOI: 10.1145/2961111
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Cluster
  2. Crowdsourced testing
  3. Report classification

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ESEM '16

Acceptance Rates

ESEM '16 Paper Acceptance Rate: 27 of 122 submissions (22%)
Overall Acceptance Rate: 130 of 594 submissions (22%)

Article Metrics

  • Downloads (last 12 months): 20
  • Downloads (last 6 weeks): 0

Reflects downloads up to 20 Dec 2024

Cited By

  • (2025) Redefining crowdsourced test report prioritization: An innovative approach with large language model. Information and Software Technology, 179:107629. DOI: 10.1016/j.infsof.2024.107629. Online publication date: Mar-2025.
  • (2024) Semi-supervised Crowdsourced Test Report Clustering via Screenshot-Text Binding Rules. Proceedings of the ACM on Software Engineering, 1(FSE):1540-1563. DOI: 10.1145/3660776. Online publication date: 12-Jul-2024.
  • (2023) Mobile App Crowdsourced Test Report Consistency Detection via Deep Image-and-Text Fusion Understanding. IEEE Transactions on Software Engineering, 1-20. DOI: 10.1109/TSE.2023.3285787. Online publication date: 2023.
  • (2023) Mobile crowdsourced test report prioritization based on text and image understanding. Journal of Software: Evolution and Process. DOI: 10.1002/smr.2541. Online publication date: 9-Feb-2023.
  • (2022) Context- and Fairness-Aware In-Process Crowdworker Recommendation. ACM Transactions on Software Engineering and Methodology, 31(3):1-31. DOI: 10.1145/3487571. Online publication date: 7-Mar-2022.
  • (2022) Identifying High-impact Bug Reports with Imbalance Distribution by Instance Fuzzy Entropy. International Journal of Software Engineering and Knowledge Engineering, 32(09):1389-1417. DOI: 10.1142/S021819402250053X. Online publication date: 28-Sep-2022.
  • (2022) Context-Aware Personalized Crowdtesting Task Recommendation. IEEE Transactions on Software Engineering, 48(8):3131-3144. DOI: 10.1109/TSE.2021.3081171. Online publication date: 1-Aug-2022.
  • (2022) Multifaceted Hierarchical Report Identification for Non-Functional Bugs in Deep Learning Frameworks. 2022 29th Asia-Pacific Software Engineering Conference (APSEC), 289-298. DOI: 10.1109/APSEC57359.2022.00041. Online publication date: Dec-2022.
  • (2022) Estimate the Precision of Defects Based on Reports Duplication in Crowdsourced Testing. IEEE Access, 10:130415-130423. DOI: 10.1109/ACCESS.2022.3227930. Online publication date: 2022.
  • (2022) Advanced Crowdsourced Test Report Prioritization Based on Adaptive Strategy. IEEE Access, 10:53522-53532. DOI: 10.1109/ACCESS.2022.3176086. Online publication date: 2022.
