Research article
DOI: 10.1145/2961111.2962584

Towards Effectively Test Report Classification to Assist Crowdsourced Testing

Published: 08 September 2016

Abstract

Context: Automatic classification of crowdsourced test reports is important because of their tremendous volume and the large proportion of noise they contain. Most existing approaches to this problem focus on examining the performance of different machine learning or information retrieval techniques, and most are evaluated on open-source datasets. However, our observations reveal that these approaches yield poor and unstable performance on real industrial crowdsourced testing data. We further analyze the underlying cause and find that industrial data exhibit significant local bias, which degrades existing approaches.
Goal: We aim to design an approach that overcomes the local bias in industrial data and automatically identifies true faults among the large number of crowdsourced reports.
Method: We propose a cluster-based classification approach, which first clusters similar reports together and then builds classifiers on the most similar clusters using an ensemble method (a brief illustrative sketch follows this abstract).
Results: Evaluation is conducted on 15,095 test reports from 35 industrial projects on the largest crowdsourced testing platform in China, and the results are promising, with 0.89 precision and 0.97 recall on average. In addition, our approach improves the existing baselines by 17%-63% in average precision and 15%-61% in average recall.
Conclusions: The results imply that our approach can effectively discriminate true faults from large amounts of crowdsourced reports, which can reduce the effort required for manual inspection and facilitate project management in crowdsourced testing. To the best of our knowledge, this is the first work to address the test report classification problem in real industrial crowdsourced testing practice.
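
The abstract above only outlines the method, so the following is a minimal sketch of a cluster-then-classify pipeline of that general shape, assuming scikit-learn with TF-IDF text features, k-means clustering, one logistic-regression classifier per cluster, and a majority vote over the k nearest clusters. The function names (train_cluster_classifiers, predict_label), the feature representation, the distance measure, and the ensemble rule are illustrative assumptions, not details taken from the paper.

# A minimal sketch, not the paper's implementation: cluster training reports with
# k-means over TF-IDF features, fit one classifier per cluster, and classify a new
# report by majority vote among the classifiers of its k nearest clusters.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.dummy import DummyClassifier

def train_cluster_classifiers(reports, labels, n_clusters=5):
    """Cluster the training reports, then fit one classifier per cluster."""
    vectorizer = TfidfVectorizer(max_features=2000)
    X = vectorizer.fit_transform(reports)          # TF-IDF features (assumed)
    y = np.asarray(labels)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    classifiers = {}
    for c in range(n_clusters):
        idx = np.where(kmeans.labels_ == c)[0]
        y_c = y[idx]
        # Fall back to a constant predictor if the cluster holds a single class.
        if len(np.unique(y_c)) < 2:
            clf = DummyClassifier(strategy="most_frequent")
        else:
            clf = LogisticRegression(max_iter=1000)
        clf.fit(X[idx], y_c)
        classifiers[c] = clf
    return vectorizer, kmeans, classifiers

def predict_label(report, vectorizer, kmeans, classifiers, k=3):
    """Vote among the classifiers of the k clusters nearest to the new report."""
    x = vectorizer.transform([report])
    distances = kmeans.transform(x)[0]             # distance to every centroid
    nearest = np.argsort(distances)[:k]            # indices of the k closest clusters
    votes = [classifiers[c].predict(x)[0] for c in nearest]
    values, counts = np.unique(votes, return_counts=True)
    return values[np.argmax(counts)]               # simple majority vote

In use, such a pipeline would be fit on manually labeled historical reports and predict_label applied to each incoming report; comparing the predictions against manual labels would yield precision and recall figures of the kind reported in the abstract.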

Published In

ESEM '16: Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement
September 2016
457 pages
ISBN: 9781450344272
DOI: 10.1145/2961111
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Cluster
  2. Crowdsourced testing
  3. Report classification

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ESEM '16

Acceptance Rates

ESEM '16 Paper Acceptance Rate: 27 of 122 submissions (22%)
Overall Acceptance Rate: 130 of 594 submissions (22%)

Article Metrics

  • Downloads (last 12 months): 20
  • Downloads (last 6 weeks): 0

Reflects downloads up to 20 Dec 2024

Cited By

  • (2025) Redefining crowdsourced test report prioritization: An innovative approach with large language model. Information and Software Technology, 179:107629. DOI: 10.1016/j.infsof.2024.107629. Online publication date: Mar-2025.
  • (2024) Semi-supervised Crowdsourced Test Report Clustering via Screenshot-Text Binding Rules. Proceedings of the ACM on Software Engineering, 1(FSE):1540-1563. DOI: 10.1145/3660776. Online publication date: 12-Jul-2024.
  • (2023) Mobile App Crowdsourced Test Report Consistency Detection via Deep Image-and-Text Fusion Understanding. IEEE Transactions on Software Engineering, 1-20. DOI: 10.1109/TSE.2023.3285787. Online publication date: 2023.
  • (2023) Mobile crowdsourced test report prioritization based on text and image understanding. Journal of Software: Evolution and Process. DOI: 10.1002/smr.2541. Online publication date: 9-Feb-2023.
  • (2022) Context- and Fairness-Aware In-Process Crowdworker Recommendation. ACM Transactions on Software Engineering and Methodology, 31(3):1-31. DOI: 10.1145/3487571. Online publication date: 7-Mar-2022.
  • (2022) Identifying High-impact Bug Reports with Imbalance Distribution by Instance Fuzzy Entropy. International Journal of Software Engineering and Knowledge Engineering, 32(09):1389-1417. DOI: 10.1142/S021819402250053X. Online publication date: 28-Sep-2022.
  • (2022) Context-Aware Personalized Crowdtesting Task Recommendation. IEEE Transactions on Software Engineering, 48(8):3131-3144. DOI: 10.1109/TSE.2021.3081171. Online publication date: 1-Aug-2022.
  • (2022) Multifaceted Hierarchical Report Identification for Non-Functional Bugs in Deep Learning Frameworks. 2022 29th Asia-Pacific Software Engineering Conference (APSEC), 289-298. DOI: 10.1109/APSEC57359.2022.00041. Online publication date: Dec-2022.
  • (2022) Estimate the Precision of Defects Based on Reports Duplication in Crowdsourced Testing. IEEE Access, 10:130415-130423. DOI: 10.1109/ACCESS.2022.3227930. Online publication date: 2022.
  • (2022) Advanced Crowdsourced Test Report Prioritization Based on Adaptive Strategy. IEEE Access, 10:53522-53532. DOI: 10.1109/ACCESS.2022.3176086. Online publication date: 2022.
