DOI: 10.1145/3416505.3423561
research-article

TraceSim: a method for calculating stack trace similarity

Published: 13 November 2020

Abstract

Many contemporary software products have subsystems for automatic crash reporting. However, it is well known that the same bug can produce slightly different reports. To manage this problem, reports are usually grouped, often manually by developers. Manual triaging becomes infeasible for products with large user bases, which has motivated many approaches to automating the task. Because a large volume of reports must be processed correctly, triaging quality matters: even a relatively small improvement can play a significant role in the overall accuracy of report bucketing. Most existing studies use a stack trace similarity metric based on either information retrieval techniques or string matching methods. Nevertheless, the quality of triaging is still insufficient.
In this paper, we describe TraceSim, a novel approach to this problem that combines TF-IDF, Levenshtein distance, and machine learning to construct a similarity metric. Our metric has been implemented inside an industrial-grade report triaging system. Evaluation on a manually labeled dataset shows significantly better results than the baseline approaches.
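To make the idea of combining TF-IDF with Levenshtein distance more concrete, the following is a minimal sketch of one way such a metric could be built. It is not the TraceSim formula, which the abstract does not specify: the function names, the smoothed IDF variant, the substitution cost, and the normalization are all assumptions chosen for illustration, and the machine-learning step is omitted entirely.

```python
import math
from collections import Counter


def idf_weights(traces):
    """Smoothed inverse document frequency per frame (assumed variant)."""
    n = len(traces)
    # Count each frame once per trace, regardless of how often it repeats.
    df = Counter(frame for trace in traces for frame in set(trace))
    return {frame: math.log(1 + n / count) for frame, count in df.items()}


def weighted_edit_distance(a, b, weight):
    """Levenshtein distance over frames; each edit costs the frame's weight."""
    # prev[j] is the cost of turning an empty prefix of a into b[:j].
    prev = [0.0]
    for frame in b:
        prev.append(prev[-1] + weight(frame))
    for fa in a:
        cur = [prev[0] + weight(fa)]
        for j, fb in enumerate(b, start=1):
            # Assumption: a substitution costs as much as delete plus insert.
            sub = 0.0 if fa == fb else weight(fa) + weight(fb)
            cur.append(min(prev[j] + weight(fa),     # delete fa
                           cur[j - 1] + weight(fb),  # insert fb
                           prev[j - 1] + sub))       # match or substitute
        prev = cur
    return prev[-1]


def similarity(a, b, idf, default=1.0):
    """Map the distance to [0, 1]; 1.0 means the traces are identical."""
    def weight(frame):
        # Frames unseen in the corpus get a hypothetical default weight.
        return idf.get(frame, default)

    total = sum(map(weight, a)) + sum(map(weight, b))
    if total == 0:
        return 1.0
    return 1.0 - weighted_edit_distance(a, b, weight) / total


# Hypothetical corpus: each trace is a list of frame names.
traces = [
    ["main", "run", "parse"],
    ["main", "run", "render"],
    ["main", "init"],
]
idf = idf_weights(traces)
print(similarity(traces[0], traces[1], idf))
print(similarity(traces[0], traces[2], idf))
```

In this sketch, frames that appear in many traces (such as a shared entry point) receive low weights, so mismatches on rare frames dominate the distance. A production metric would also need to decide how frame position affects the weight and how to tune any parameters, which is presumably where the learning component mentioned in the abstract comes in.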



Published In

MaLTeSQuE 2020: Proceedings of the 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation
November 2020
36 pages
ISBN: 9781450381246
DOI: 10.1145/3416505

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Automatic Crash Reporting
  2. Automatic Problem Reporting Tools
  3. Crash Report Deduplication
  4. Crash Reports
  5. Crash Stack
  6. Deduplication
  7. Duplicate Bug Report
  8. Duplicate Crash Report
  9. Information Retrieval
  10. Software Engineering
  11. Software Repositories
  12. Stack Trace

Qualifiers

  • Research-article

Conference

ESEC/FSE '20

Article Metrics

  • Downloads (last 12 months): 33
  • Downloads (last 6 weeks): 2

Reflects downloads up to 11 Dec 2024.

Cited By

  • (2024) Foliage: Nourishing Evolving Software by Characterizing and Clustering Field Bugs. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 1325-1337. DOI: 10.1145/3650212.3680363. Online publication date: 11-Sep-2024
  • (2024) DeepLSH: Deep Locality-Sensitive Hash Learning for Fast and Efficient Near-Duplicate Crash Report Detection. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, pp. 1-12. DOI: 10.1145/3597503.3639146. Online publication date: 20-May-2024
  • (2024) CrashTranslator: Automatically Reproducing Mobile Application Crashes Directly from Stack Trace. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, pp. 1-13. DOI: 10.1145/3597503.3623298. Online publication date: 20-May-2024
  • (2024) CrashChecker: A Fusion Method for Clustering Duplicate Crash Failures in SAP HANA Delivery. 2024 IEEE 35th International Symposium on Software Reliability Engineering Workshops (ISSREW), pp. 37-42. DOI: 10.1109/ISSREW63542.2024.00044. Online publication date: 28-Oct-2024
  • (2022) DeepCrash: deep metric learning for crash bucketing based on stack trace. Proceedings of the 6th International Workshop on Machine Learning Techniques for Software Quality Evaluation, pp. 29-34. DOI: 10.1145/3549034.3561179. Online publication date: 7-Nov-2022
  • (2022) SniP. Proceedings of the 19th International Conference on Mining Software Repositories, pp. 408-412. DOI: 10.1145/3524842.3528499. Online publication date: 23-May-2022
  • (2022) DeepAnalyze. Proceedings of the 44th International Conference on Software Engineering, pp. 549-560. DOI: 10.1145/3510003.3512759. Online publication date: 21-May-2022
  • (2022) Separating the Wheat from the Chaff: Using Indexing and Sub-Sequence Mining Techniques to Identify Related Crashes During Bug Triage. 2022 IEEE 22nd International Conference on Software Quality, Reliability and Security (QRS), pp. 31-42. DOI: 10.1109/QRS57517.2022.00014. Online publication date: Dec-2022
  • (2022) Abaci-finder: Linux kernel crash classification through stack trace similarity learning. Journal of Parallel and Distributed Computing. DOI: 10.1016/j.jpdc.2022.06.003. Online publication date: Jun-2022
  • (2021) S3M: Siamese Stack (Trace) Similarity Measure. 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp. 266-270. DOI: 10.1109/MSR52588.2021.00038. Online publication date: May-2021
